Theoretical Foundations of Data-Driven Stochastic Modelling with Financial Market Applications

Date: November 13, 2024

The proposed project aims to lay the theoretical foundations of mathematical data-driven modelling in contexts where the modelled processes are assumed to follow stochastic dynamics. First, we describe the data-driven modelling techniques and methodologies we plan to integrate within broader computational modelling techniques. Second, we define the mathematical frameworks into which these techniques are to be integrated, by properly specifying the random nature of the modelled processes. Lastly, we present the case for financial markets, and sustainable finance in particular, as the primary domain of application of the techniques described.

A growing body of literature concerns the application of machine learning techniques to mathematical modelling. There is an increasing need to bridge the gap between tested and validated modelling frameworks and the machine learning techniques and approaches being integrated into frameworks that assume analogous process dynamics. This proposal aims to add to the literature on modelling contexts where the modelled processes are assumed to follow stochastic dynamics. All modelling approaches considered will be analysed from a robust statistical perspective, ensuring that results are tested and assumptions scrutinized.

Machine learning techniques, which in the literature may be grouped under umbrella terms such as Statistical Learning[1], Probabilistic Machine Learning[2], Deep Learning[3], Transformers[4], neural ODEs and SDEs[5], Neural Operators, PINN[6], KAN[7] and PIKAN[8], have gained considerable attention recently owing to their use in a variety of innovative technological applications, from self-driving cars to large language models (LLMs). These techniques and methodologies have been primarily defined and developed from a computer science perspective; however, they may prove to be valuable tools beyond computer science. To this end, there is a need to properly define in what specific contexts, and under what modelling assumptions and considerations, they may add marginal value.

The primary context of interest in this proposal is modelling environments where phenomena are assumed to be random processes. Random processes, also known as stochastic processes, are typically defined as a collection of random variables on a common probability space, where the index usually denotes a time component[9]. The word "stochastic" stems from a Greek word meaning "to aim" or "to guess" and was initially used to mean "to conjecture"; in this context it pertains to intrinsic randomness. Each random variable in the collection takes values in a designated mathematical set, called the state space. The change in the value of the process between two index values is called an increment. Furthermore, a single outcome of a random process is called a realization.
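To make the terms above concrete, the following is a minimal sketch rather than part of the proposed methodology. It simulates a simple symmetric random walk, chosen here purely for illustration. The function names, the step distribution, and the seed are assumptions introduced for this example. The returned list is one realization, the integers form the state space, and the differences between consecutive values are the increments.

```python
import random


def simulate_random_walk(n_steps, seed=None):
    """Return one realization (sample path) of a simple symmetric random walk.

    Illustrative assumption: X_0 = 0 and each step is +1 or -1 with equal
    probability, so the state space is the set of integers.
    """
    rng = random.Random(seed)
    path = [0]
    for _ in range(n_steps):
        step = rng.choice((-1, 1))
        path.append(path[-1] + step)
    return path


def increments(path):
    """Return the increments X_{t+1} - X_t between consecutive index values."""
    return [later - earlier for earlier, later in zip(path, path[1:])]


# One realization over the index set {0, 1, ..., 10} and its increments.
realization = simulate_random_walk(n_steps=10, seed=0)
steps = increments(realization)
print(realization)
print(steps)
```

Calling `simulate_random_walk` again with a different seed yields a different realization of the same process, which is the distinction between the process (the collection of random variables) and a single observed outcome.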

Stochastic processes are typically classified based on characteristics such as state space, index set, or dependency relations among random variables. Standard categories include random walks, martingales, Markov processes, Lévy processes, Gaussian processes, random fields, renewal processes, and branching processes[10]. The prevalence of stochastic processes is evident in their applications across numerous fields, including biology, neuroscience, chemistry, physics, information theory, telecommunications, signal processing, and finance. In particular, financial markets are a domain where the principles of stochastic theory find extensive application.

This project seeks to bridge the gap between differing modelling frameworks, with a primary emphasis on mathematical finance, where asset prices are typically modelled as stochastic processes governed by Stochastic (Partial) (Backward) Differential Equations (S(P)(B)DEs)[11]. By integrating computational data-driven modelling techniques, we aim to enhance the accuracy and robustness of financial modelling approaches. For this purpose, the integration of machine learning techniques may take the form of Deep Hedging[12], Deep Calibration, Generative Models for Forecasting Time-Series, Kernel Methods and Clustering for Portfolio Selection, and Limit Order Book Modelling in the context of the Continuous Double Auction[13]. Classical modelling frameworks for Time-Series Analysis, Model Calibration, Hedging and Forecasting will also be considered; they will serve as a basis for theoretical considerations and as a performance benchmark. Within finance, the proposal seeks to contribute primarily to the field of sustainable finance, and in particular to electricity dynamics modelling[14].
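As an illustration of an asset price governed by a stochastic differential equation, the sketch below simulates geometric Brownian motion, dS_t = mu S_t dt + sigma S_t dW_t, the price model underlying the Black-Scholes framework cited in [11]. It uses the Euler-Maruyama discretisation, which is chosen here for simplicity and is not stated in the proposal. The function name and all parameter values are hypothetical, and this is a sketch under those assumptions rather than a model the project commits to.

```python
import math
import random


def simulate_gbm_euler(s0, mu, sigma, horizon, n_steps, seed=None):
    """Simulate one path of dS = mu*S*dt + sigma*S*dW by Euler-Maruyama.

    s0: initial price; mu: drift; sigma: volatility;
    horizon: time horizon in years; n_steps: number of time steps.
    All values passed below are illustrative assumptions.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    path = [s0]
    for _ in range(n_steps):
        # Brownian increment over one step: normal with mean 0, variance dt.
        dw = rng.gauss(0.0, math.sqrt(dt))
        s = path[-1]
        path.append(s + mu * s * dt + sigma * s * dw)
    return path


# One year of daily steps (252 trading days is a common convention).
path = simulate_gbm_euler(s0=100.0, mu=0.05, sigma=0.2,
                          horizon=1.0, n_steps=252, seed=42)
print(len(path), path[0])
```

A classical benchmark of this kind is what a data-driven alternative, for example a neural SDE, would be compared against. Note that the Euler-Maruyama scheme is only an approximation and, unlike the exact solution of geometric Brownian motion, does not guarantee positive prices for large step sizes.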

References

  1. Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: data mining, inference, and prediction. Springer Science & Business Media.
  2. Murphy, K. P. (2022). Probabilistic Machine Learning: An Introduction. MIT Press.
  3. LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
  4. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30, 5998-6008.
  5. Kidger, P., Foster, J., Li, X., Oberhauser, H., & Lyons, T. (2021). Neural SDEs as Infinite-Dimensional GANs. arXiv preprint arXiv:2102.03657.
  6. Raissi, M., Perdikaris, P., & Karniadakis, G. E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378, 686-707.
  7. Liu, Z., Wang, Y., Vaidya, S., Ruehle, F., Halverson, J., Soljačić, M., Hou, T. Y., & Tegmark, M. (2024). KAN: Kolmogorov–Arnold Networks. arXiv preprint arXiv:2404.19756v1.
  8. Patra, S., Panda, S., Parida, B. K., Arya, M., Jacobs, K., Bondar, D. I., & Sen, A. (2024). Physics Informed Kolmogorov-Arnold Neural Networks for Dynamical Analysis via Efficient-KAN and WAV-KAN. arXiv preprint arXiv:2407.18373.
  9. Øksendal, B. (2003). Stochastic differential equations. In Stochastic Differential Equations (pp. 65-84). Springer, Berlin, Heidelberg.
  10. Karatzas, I., & Shreve, S. (1998). Brownian motion and stochastic calculus (Vol. 113). Springer Science & Business Media.
  11. Black, F., & Scholes, M. (1973). The pricing of options and corporate liabilities. Journal of Political Economy, 81(3), 637-654.
  12. Buehler, H., Gonon, L., Teichmann, J., & Wood, B. (2019). Deep hedging. Quantitative Finance, 19(8), 1271-1291.
  13. Cont, R., Stoikov, S., & Talreja, R. (2010). A stochastic model for order book dynamics. Operations Research, 58(3), 549-563.
  14. Weron, R. (2014). Electricity price forecasting: A review of the state-of-the-art with a look into the future. International Journal of Forecasting, 30(4), 1030-1081.