Periodic Reporting for period 1 - DataABM (Data-Driven Agent-Based Models of Investors with Machine Learning)
Reporting period: 2023-09-01 to 2025-08-31
The availability of financial market data makes finance a natural application area for rapidly developing machine learning and artificial intelligence methods. However, applying models to data naively can be problematic, because the results are difficult to verify and interpret. At the same time, novel machine learning tools are improving predictive power in finance beyond that of older models.
The main objective of the project is to explore the two-way benefits of combining agent-based modeling and machine learning in financial computing. On the one hand, agent-based models can provide synthetic data with known ground truths, which can be used to verify machine learning models. On the other hand, generative artificial intelligence models can provide more realistic agents that closely imitate real investors.
The above-mentioned environment was used to build a simulation of heterogeneous agents, which was then used to test clustering methods that the literature applies to real data to identify different types of investors. We have shown both the limitations and the robustness of the existing methods and proposed new features to improve the results.
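The idea of testing a clustering method against simulated ground truth can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the project's actual setup: the two agent types (trend followers and contrarians), the two features, and the plain k-means routine are all hypothetical stand-ins.

```python
import random

def simulate_agents(n_per_type=50, n_steps=200, seed=0):
    """Simulate two hypothetical investor types and return (features, labels).

    Label 0 trades with the previous return, label 1 trades against it.
    """
    rng = random.Random(seed)
    returns = [rng.gauss(0, 1) for _ in range(n_steps)]
    features, labels = [], []
    for label, sign in ((0, 1.0), (1, -1.0)):
        for _ in range(n_per_type):
            # The trade at step t reacts to the return at step t - 1, plus noise.
            trades = [sign * returns[t - 1] + rng.gauss(0, 1)
                      for t in range(1, n_steps)]
            # Feature 1: average of trade times lagged return.
            reaction = sum(tr * returns[i] for i, tr in enumerate(trades)) / len(trades)
            # Feature 2: average absolute trade size.
            activity = sum(abs(tr) for tr in trades) / len(trades)
            features.append((reaction, activity))
            labels.append(label)
    return features, labels

def k_means(points, k=2, n_iter=20, seed=0):
    """Plain Lloyd's algorithm; returns the cluster index of each point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignment = [0] * len(points)
    for _ in range(n_iter):
        for i, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            assignment[i] = dists.index(min(dists))
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return assignment

def agreement(labels, assignment):
    """Share of agents grouped correctly, up to swapping the two cluster names."""
    match = sum(l == a for l, a in zip(labels, assignment)) / len(labels)
    return max(match, 1 - match)

features, labels = simulate_agents()
print("agreement with ground truth:", agreement(labels, k_means(features)))
```

Because the simulation assigns each agent its type, the clustering output can be scored directly, which is not possible with real investor data.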
We have built a completely novel agent-based model focused on analysing the spread of information across investors. This is the first limit order book model that isolates the interaction between agents from other market effects. This allowed us to describe the effect of different types of interaction network and to show how scale-free networks reproduce the statistical properties of price dynamics known as stylised facts.
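To make the notion of information spreading over a scale-free interaction network concrete, the following is a minimal sketch. It assumes a preferential-attachment construction and a simple probabilistic contagion rule; both are illustrative choices, and the sketch contains no limit order book or price dynamics, so it does not represent the project's model.

```python
import random

def scale_free_network(n, m, rng):
    """Preferential attachment: each new node links to m distinct existing
    nodes, chosen with probability proportional to their current degree."""
    neighbours = {i: set() for i in range(n)}
    # Start from a fully connected core of m + 1 nodes (each has degree m).
    for i in range(m + 1):
        for j in range(i):
            neighbours[i].add(j)
            neighbours[j].add(i)
    # Each node appears in `stubs` once per link it has.
    stubs = [i for i in range(m + 1) for _ in range(m)]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            neighbours[new].add(t)
            neighbours[t].add(new)
            stubs += [new, t]
    return neighbours

def spread_information(neighbours, seed_node, p, n_steps, rng):
    """Each step, every informed agent informs each uninformed neighbour
    with probability p. Returns the number of informed agents per step."""
    informed = {seed_node}
    history = [1]
    for _ in range(n_steps):
        newly = set()
        for node in informed:
            for nb in neighbours[node]:
                if nb not in informed and rng.random() < p:
                    newly.add(nb)
        informed |= newly
        history.append(len(informed))
    return history

rng = random.Random(1)
network = scale_free_network(n=500, m=2, rng=rng)
degrees = [len(v) for v in network.values()]
print("mean degree:", sum(degrees) / len(degrees), "max degree:", max(degrees))
print("informed agents per step:", spread_information(network, 0, 0.3, 10, rng))
```

A few highly connected agents dominate such a network, so information that reaches one of them can spread to a large share of the population within a few steps.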
Finally, we trained a generative artificial intelligence model on synthetic data at the level of individual investors and examined its properties. Because the ground truths of the model used in the simulations are known, we were able to verify the validity of the conditional distributions of investors' actions predicted by the generative model. This is the first study of this type in the financial computing literature.
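The verification step described above can be sketched as follows. The states, actions, and probabilities in `TRUE_POLICY` are hypothetical, and a simple frequency estimator stands in for the generative model; the point is only that a learned conditional distribution can be compared with the one that generated the data.

```python
import random

ACTIONS = ("buy", "sell", "hold")
# Hypothetical ground truth: P(action | state) for a simulated investor.
TRUE_POLICY = {"up": (0.6, 0.1, 0.3), "down": (0.2, 0.5, 0.3)}

def simulate(n, rng):
    """Draw n (state, action) pairs from the ground-truth policy."""
    data = []
    for _ in range(n):
        state = rng.choice(("up", "down"))
        action = rng.choices(ACTIONS, weights=TRUE_POLICY[state])[0]
        data.append((state, action))
    return data

def fit_conditional(data):
    """Stand-in for a trained model: empirical P(action | state)."""
    counts = {s: {a: 0 for a in ACTIONS} for s in TRUE_POLICY}
    for state, action in data:
        counts[state][action] += 1
    return {s: tuple(c[a] / sum(c.values()) for a in ACTIONS)
            for s, c in counts.items()}

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

rng = random.Random(0)
model = fit_conditional(simulate(20000, rng))
for state in TRUE_POLICY:
    distance = total_variation(model[state], TRUE_POLICY[state])
    print(state, "total variation distance:", round(distance, 4))
```

A distance close to zero in every state indicates that the learned conditional distributions match the ground truth; with real data there is no `TRUE_POLICY` to compare against, which is why synthetic data is useful here.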
Three things would be needed to achieve the full potential of these ideas. First, further research is needed. This includes optimising the existing environment for the use of large generative models as agents, and extended work on the different architectures used to build and train such models. Second, access is needed to large datasets of individual investor level data. Such data would need to contain detailed information about activity in the market and to have high granularity, especially in terms of time precision. Finally, collaboration with either regulators or interested commercial clients would be needed to fully understand their needs and expectations.