CORDIS - EU research results

Data-Driven Agent-Based Models of Investors with Machine Learning

Periodic Reporting for period 1 - DataABM (Data-Driven Agent-Based Models of Investors with Machine Learning)

Reporting period: 2023-09-01 to 2025-08-31

Financial markets are an exciting research subject with a direct impact on the world around us. They also provide enormous amounts of highly detailed data. Such data is often subject to noise, errors, or missing information, but its quality is still much better than in most other non-experimental research fields. Nevertheless, a large amount of data cannot easily make up for limited experimental capabilities. One way of dealing with this problem is agent-based modeling, a bottom-up simulation approach in which simple programmable agents are used to mimic financial markets. Agent-based modeling is simple and flexible, and it allows all sorts of different settings to be tested. It also became more scalable once sufficient computing power was available. There is, however, a question of whether simple rule-based agents are realistic models of true investors.

Access to data from financial markets makes them a suitable application for rapidly developing methods of machine learning and artificial intelligence. However, applying models to data without care can be tricky, and verifying and interpreting the obtained results is a challenge. At the same time, novel machine learning tools are improving predictive power in finance beyond that of older models.

The main objective of the project is to explore the two-way benefits of combining agent-based modeling and machine learning in financial computing. On the one hand, agent-based models can provide synthetic data with ground truths, which can be used to verify machine learning models. On the other hand, generative artificial intelligence models can provide more realistic agents that closely imitate true investors.
The first achievement, which was crucial for the other objectives, is the simulation environment for agent-based models of the limit order book. The environment consists of a full double-auction mechanism for placing, modifying, and matching orders, and an engine that controls the communication between the order book and the agents. Agents can be implemented in a simple and flexible manner, and can be used to represent other mechanisms, not only individual investors. This makes the environment versatile and allows for far-reaching extensions. We have already implemented multiple types of agents and simulation scenarios, as well as simple examples and tests for new users.
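
To illustrate what a double-auction matching mechanism involves, the following is a minimal sketch and not the project's environment. It assumes price-time priority and execution at the resting order's price, and it omits order modification and cancellation. The names `Order`, `LimitOrderBook`, and `submit` are hypothetical and chosen only for this example.

```python
import heapq
from dataclasses import dataclass
from itertools import count


@dataclass
class Order:
    agent_id: str
    side: str       # "buy" or "sell"
    price: float
    quantity: int


class LimitOrderBook:
    """Toy continuous double auction with price-time priority (assumed rule)."""

    def __init__(self):
        self._bids = []      # heap entries: (-price, arrival number, order)
        self._asks = []      # heap entries: (price, arrival number, order)
        self._seq = count()  # arrival counter, breaks ties between equal prices
        self.trades = []     # (buyer id, seller id, price, quantity)

    def submit(self, order):
        """Match an incoming order against the opposite side, then rest the remainder."""
        is_buy = order.side == "buy"
        own, opposite = (self._bids, self._asks) if is_buy else (self._asks, self._bids)
        while order.quantity > 0 and opposite:
            resting = opposite[0][2]
            crosses = resting.price <= order.price if is_buy else resting.price >= order.price
            if not crosses:
                break
            traded = min(order.quantity, resting.quantity)
            buyer, seller = (order, resting) if is_buy else (resting, order)
            # Assumption: the trade executes at the resting order's price.
            self.trades.append((buyer.agent_id, seller.agent_id, resting.price, traded))
            order.quantity -= traded
            resting.quantity -= traded
            if resting.quantity == 0:
                heapq.heappop(opposite)
        if order.quantity > 0:
            key = -order.price if is_buy else order.price
            heapq.heappush(own, (key, next(self._seq), order))

    def best_bid(self):
        return self._bids[0][2].price if self._bids else None

    def best_ask(self):
        return self._asks[0][2].price if self._asks else None


book = LimitOrderBook()
book.submit(Order("a1", "sell", 101.0, 5))
book.submit(Order("a2", "sell", 100.0, 5))
book.submit(Order("a3", "buy", 100.5, 7))  # fills 5 against a2, rests 2 at 100.5
print(book.trades, book.best_bid(), book.best_ask())
```

In a setup like the one described above, an engine would call something like `submit` on behalf of each agent. Keeping the book separate from the agents is what would let an agent stand in for a mechanism other than an individual investor.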

The above-mentioned environment was used to build a simulation of heterogeneous agents, which was then used to test clustering methods from the literature that identify different types of investors based on real data. We have shown the limitations and robustness of the existing methods and proposed new features to improve the results.
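
Because the simulated agents have known types, a clustering result can be scored directly against the ground truth. The sketch below uses the Rand index for this purpose. That metric is an assumption made for illustration; the source does not state which evaluation measure was used, and the agent type names in the example are hypothetical.

```python
from itertools import combinations


def rand_index(true_labels, predicted_labels):
    """Fraction of item pairs on which two labelings agree.

    A pair counts as an agreement when both labelings put the two items in
    the same group, or both put them in different groups.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("labelings must have the same length")
    pairs = list(combinations(range(len(true_labels)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agreements = sum(
        (true_labels[i] == true_labels[j]) == (predicted_labels[i] == predicted_labels[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical agent types from a simulation, and cluster ids from some method.
true_types = ["noise", "noise", "momentum", "momentum"]
cluster_ids = [1, 1, 0, 0]
print(rand_index(true_types, cluster_ids))  # 1.0: same grouping, different names
```

The score depends only on how items are grouped, not on what the clusters are called, which is useful when a clustering method returns arbitrary cluster ids.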

We have built a completely novel agent-based model focused on analysing the spread of information across investors. This is the first limit order book model that isolates the interaction between the agents from other market effects. This way, we were able to describe the effect of different types of interaction networks, and showed how scale-free networks reproduce the statistical properties of price dynamics known as stylised facts.
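
As an illustration of what a scale-free interaction network looks like, the sketch below grows a graph by preferential attachment in the style of Barabási and Albert. This construction is an assumption: the source does not say how its networks were generated. The function names and parameters are hypothetical.

```python
import random


def preferential_attachment_graph(n_nodes, m_links, seed=None):
    """Grow an undirected graph in which new nodes prefer well-connected nodes."""
    if not 1 <= m_links < n_nodes:
        raise ValueError("need 1 <= m_links < n_nodes")
    rng = random.Random(seed)
    edges = set()
    endpoints = []  # each node appears once per incident edge
    core = m_links + 1
    # Start from a small fully connected core.
    for i in range(core):
        for j in range(i + 1, core):
            edges.add((i, j))
            endpoints += [i, j]
    for new in range(core, n_nodes):
        targets = set()
        # Sampling from `endpoints` picks a node with probability
        # proportional to its current degree.
        while len(targets) < m_links:
            targets.add(rng.choice(endpoints))
        for target in targets:
            edges.add((target, new))
            endpoints += [target, new]
    return edges


def degrees(edges, n_nodes):
    """Return the degree of every node as a list indexed by node id."""
    result = [0] * n_nodes
    for a, b in edges:
        result[a] += 1
        result[b] += 1
    return result


edges = preferential_attachment_graph(n_nodes=200, m_links=2, seed=1)
deg = degrees(edges, 200)
print(max(deg), sorted(deg)[len(deg) // 2])  # largest hub degree vs. median degree
```

In an information-spreading model, each node would be an investor and each edge a channel through which information can pass. A few highly connected hubs alongside many weakly connected nodes is the feature that distinguishes this topology from a regular or uniformly random network.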

Finally, we trained a generative artificial intelligence model on synthetic individual-investor-level data and examined its properties. Having the ground truths from the model used in the simulations, we were able to verify the validity of the model's predicted conditional distributions describing investors' actions. This is the first study of this type in the financial computing literature.
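
The verification idea can be sketched as follows: when the data-generating policy is known, a learned conditional distribution over actions can be compared with it state by state. In this sketch a simple frequency estimator stands in for the generative model, the policy table is invented for illustration, and total variation distance is an assumed choice of metric. None of these details come from the project itself.

```python
import random
from collections import Counter, defaultdict

ACTIONS = ("buy", "sell", "hold")

# Hypothetical ground-truth policy: P(action | market state).
TRUE_POLICY = {
    "price_up": {"buy": 0.6, "sell": 0.1, "hold": 0.3},
    "price_down": {"buy": 0.2, "sell": 0.5, "hold": 0.3},
}


def simulate(n_samples, seed=0):
    """Draw (state, action) pairs from the known policy."""
    rng = random.Random(seed)
    states = list(TRUE_POLICY)
    data = []
    for _ in range(n_samples):
        state = rng.choice(states)
        weights = [TRUE_POLICY[state][a] for a in ACTIONS]
        action = rng.choices(ACTIONS, weights=weights)[0]
        data.append((state, action))
    return data


def fit_conditional(data):
    """Stand-in for a trained model: empirical P(action | state)."""
    counts = defaultdict(Counter)
    for state, action in data:
        counts[state][action] += 1
    return {
        state: {a: c[a] / sum(c.values()) for a in ACTIONS}
        for state, c in counts.items()
    }


def total_variation(p, q):
    """Total variation distance between two distributions over ACTIONS."""
    return 0.5 * sum(abs(p[a] - q[a]) for a in ACTIONS)


model = fit_conditional(simulate(20_000))
for state, truth in TRUE_POLICY.items():
    print(state, round(total_variation(truth, model[state]), 4))
```

A distance near zero for every state suggests the learned distributions match the generating process. With real data this check is not possible, because the true policy is unknown, which is why synthetic data with ground truths is useful here.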
The final results include: (i) a simulation environment for agent-based models of the limit order book, (ii) an experimental setting for clustering investors based on their behaviour, (iii) a novel agent-based model of interacting investors, and (iv) a generative model capable of training on individual-investor-level data. All these achievements have potential for further development. In particular, combining the simulation environment with agents controlled by generative models trained on real, individual-investor-level data could be an impactful direction for both practitioners and regulators. It could allow for complex experiments and for stress-testing different market scenarios. This in turn could be used to inform regulatory decisions, as well as to build tools for fraud detection or optimal market execution. A simplified scheme of such a framework is shown in the attached plot.

Achieving the full potential of these ideas requires two things. First, further research is needed. This includes optimising the existing environment for the use of large generative models as agents, and extended work on the different architectures used to build and train such models. Second, access is needed to large datasets of individual-investor-level data. Such data would need to contain detailed information about activity in the market and to have high granularity, especially in terms of time precision. In addition, a collaboration with either the regulatory environment or interested commercial clients would be needed to fully understand their needs and expectations.
model.png