CORDIS - EU research results

Econometric Machine Learning for better Heterogeneity Representation

Periodic Reporting for period 1 - Econ-ML (Econometric Machine Learning for better Heterogeneity Representation)

Reporting period: 2022-10-01 to 2024-09-30

The project "Econometric Machine Learning for Better Heterogeneity Representation (Econ-ML)" was motivated by the need for accurate, scalable, and interpretable behavioural models to understand and predict individual decision-making processes. This is particularly important in transportation and mobility, where understanding travellers’ behaviours is essential for designing sustainable and efficient systems. Traditional econometric approaches, such as Discrete Choice Models (DCMs), have long been used to model decision-making but face limitations when handling the complexity of large-scale data and capturing nuanced behavioural heterogeneity. Recent advances in machine learning (ML) offer promising solutions by enabling flexible, non-linear modelling and generative capabilities. However, these techniques often lack the interpretability and theoretical grounding that econometric models provide.
The project aimed to bridge this gap by combining the strengths of machine learning with the robust theoretical foundation of econometrics. Specifically, it sought to develop hybrid modelling frameworks that integrate ML techniques, such as Variational Autoencoders (VAEs), into econometric models like Latent Class Choice Models (LCCMs) and Mixed Logit Models. These hybrid approaches were designed to enhance traditional behavioural choice models by improving out-of-sample generalisation, generating synthetic data, imputing missing data, and providing a more accurate representation of heterogeneity, while maintaining interpretability consistent with economic theory.
By applying these advanced methodologies to real-world transport data, the project aimed to generate new insights into travellers’ behaviours and contribute to the broader field of transport modelling. The project’s goal was to improve the quality and scalability of decision-support tools for policymakers and transport planners. The developed models can also be applied beyond transportation, in fields such as marketing, finance, economics, healthcare, and environmental economics, where understanding and predicting human behaviour are equally critical.
The project involved several interconnected activities, which collectively advanced the state of discrete choice modelling.

- Hybrid Model Development: Two hybrid machine learning and discrete choice models were conceptualised and implemented: a Variational Autoencoder Latent Class Choice Model (VAE-LCCM), which integrates deep generative modelling with class-based segmentation, and a Variational Autoencoder Mixed Logit Model (VAE-MXL), which combines machine learning's generative power with the flexibility of Mixed Logit models. Results showed that both models can enhance traditional behavioural choice models by generating synthetic data, imputing missing data, and providing a more accurate representation of heterogeneity. Furthermore, they can improve the goodness-of-fit and out-of-sample generalisation of traditional discrete choice models while maintaining their behavioural and economic interpretability.

- Data Integration and Analysis: A large-scale smart card dataset was matched and integrated with national travel survey data from Denmark to analyse and quantify reporting errors in travel surveys. This integration showcased the complementary nature of the two data sources, particularly in the context of large-scale public transport networks. It also provided valuable insights into improving the quality and reliability of travel survey data, which is essential for enhancing the performance of both econometric and machine learning models and for supporting more effective and informed decision-making in transport planning.
Another study compared traditional choice set generation methods with those derived from empirical smart card data in multimodal public transport networks. The findings showed the importance of using smart card data in multimodal public transport route choice models, which are important tools for policy simulation and planning.

- Computational Efforts: The project used Bayesian inference techniques and variational methods to overcome computational challenges associated with large-scale modelling.

- Dissemination: The project outputs were shared through peer-reviewed publications, international conference presentations, and seminars, reaching both academic and practitioner audiences.
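
To make the latent class structure referred to in the list above more concrete, the following is a minimal sketch of a standard (non-VAE) Latent Class Choice Model in Python with NumPy. It is not the project's VAE-LCCM implementation. The function name `lccm_choice_probs`, the parameter arrays `beta` and `gamma`, and all numerical values are hypothetical and chosen purely for illustration. The sketch assumes a softmax class-membership model and a multinomial logit choice model within each class, and it mixes the class-specific choice probabilities using the membership probabilities.

```python
import numpy as np


def softmax(v, axis=-1):
    """Numerically stable softmax along the given axis."""
    v = v - v.max(axis=axis, keepdims=True)
    e = np.exp(v)
    return e / e.sum(axis=axis, keepdims=True)


def lccm_choice_probs(x_alt, z_ind, beta, gamma):
    """Marginal choice probabilities of a textbook latent class choice model.

    x_alt : (J, P) attributes of the J alternatives
    z_ind : (Q,)   characteristics of one individual
    beta  : (K, P) class-specific taste parameters (hypothetical values)
    gamma : (K, Q) class-membership parameters (hypothetical values)
    """
    class_probs = softmax(gamma @ z_ind)            # (K,) membership probabilities
    utilities = beta @ x_alt.T                      # (K, J) utility per class and alternative
    cond_probs = softmax(utilities, axis=1)         # (K, J) logit probabilities within each class
    return class_probs @ cond_probs                 # (J,) mixture over the latent classes


# Illustrative data: three alternatives described by (cost, travel time).
x_alt = np.array([[4.0, 20.0],
                  [2.0, 35.0],
                  [3.0, 25.0]])
# One individual: a constant term and an income indicator (assumed features).
z_ind = np.array([1.0, 0.0])
# Two latent classes: one mainly cost-sensitive, one mainly time-sensitive.
beta = np.array([[-0.8, -0.02],
                 [-0.1, -0.10]])
gamma = np.array([[0.0, 0.0],
                  [0.5, -1.0]])

probs = lccm_choice_probs(x_alt, z_ind, beta, gamma)
print(probs, probs.sum())
```

In this sketch, heterogeneity enters through the class-specific rows of `beta`: each class has its own cost and time sensitivities, and the individual's observed choice probabilities are a weighted average across classes. A hybrid model of the kind described above would replace or augment parts of this structure with a learned generative component, which is not shown here.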

The project delivered several results that advance the field of discrete choice modelling. By integrating the generative capabilities of machine learning with the interpretability of econometric models, the developed hybrid models can impute missing data, generate synthetic socio-economic profiles, and improve heterogeneity representation. These features can support decision-making in transport planning by offering improved models for simulating policy scenarios and their effects on diverse demographic segments. They can also strengthen evidence-based policymaking and enhance the precision of demand forecasts. Moreover, the models are scalable and adaptable to other fields, such as marketing, finance, economics, healthcare, and environmental economics.