High-Dimensional Inference for Panel and Network Data

Periodic Reporting for period 2 - PANEDA (High-Dimensional Inference for Panel and Network Data)

Reporting period: 2021-01-01 to 2022-06-30

The classic data structures discussed extensively in the Statistics and Econometrics literature are cross-sectional, time-series, and panel data (also called longitudinal data). The basic statistical methods that we use to analyze those data structures were developed in the 20th century, driven by the availability of corresponding datasets. For example, the systematic collection of stock market data started in the early 20th century, accurate national accounts became increasingly available in the 1930s, microeconomic survey data have been collected systematically since the 1940s, and large longitudinal surveys of households were started in the 1960s.

The trend towards increased data availability in the Social Sciences has accelerated in the past decades: Better computing and storage capabilities make it possible to record and manage much larger datasets and to access them more easily. We all create digital data footprints on a daily basis, from bank transactions to social network data. New machine learning methods allow researchers to quantify everything from satellite images to legal documents, thus creating structured information out of unstructured raw data. And of course, scientists and policymakers have made many deliberate efforts to collect larger and better datasets.

Those modern datasets often have a more complicated internal structure that cannot be accurately classified simply as cross-sectional data, time-series data, or panel data. In particular, those modern datasets often have a network structure, and the precision of statistical inference is often crucially linked to the structure of the underlying network. The main goal of this research project is to develop robust inference methods for such modern panel and network datasets. This requires establishing a mathematical representation of the network that allows formalizing the connection between the network and the precision of statistical inference. In addition, new bias correction and robust standard error estimation methods will be developed that account for the sparsity structure of the data. The new statistical methods developed in this project will help to analyze modern datasets in the social sciences more robustly and more reliably.
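
As a purely illustrative sketch of what a mathematical representation of a network can look like, the Python snippet below builds the graph Laplacian of a small undirected network and computes its second-smallest eigenvalue (the algebraic connectivity), a standard summary of how well connected a graph is. This is a generic textbook construction chosen for illustration; it is an assumption, not necessarily the representation developed in this project, and the function names are hypothetical.

```python
import numpy as np


def laplacian(n_nodes, edges):
    """Unnormalized graph Laplacian L = D - A of an undirected graph.

    Assumes `edges` lists each undirected edge once, with no self-loops.
    """
    L = np.zeros((n_nodes, n_nodes))
    for i, j in edges:
        L[i, i] += 1.0  # degree of node i
        L[j, j] += 1.0  # degree of node j
        L[i, j] -= 1.0  # adjacency entries enter with a minus sign
        L[j, i] -= 1.0
    return L


def algebraic_connectivity(n_nodes, edges):
    """Second-smallest Laplacian eigenvalue; it is zero iff the graph is disconnected."""
    eigenvalues = np.linalg.eigvalsh(laplacian(n_nodes, edges))  # ascending order
    return float(eigenvalues[1])


# Illustrative toy networks on four nodes (not project data).
path_edges = [(0, 1), (1, 2), (2, 3)]  # sparse chain
complete_edges = [(i, j) for i in range(4) for j in range(i + 1, 4)]  # all pairs linked

lambda_path = algebraic_connectivity(4, path_edges)
lambda_complete = algebraic_connectivity(4, complete_edges)
print(lambda_path, lambda_complete)
```

The sparse chain yields a much smaller value than the fully connected graph, which gives a simple numerical handle on how weakly or strongly the units in a dataset are linked to one another.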
As part of this ERC project, five papers have already been published in leading Economics field journals, and one further paper has recently been accepted for publication. Three further working papers have been completed and submitted to journals. A software package to help applied researchers implement the new statistical methods developed in this project has been written and disseminated, and further software packages of a similar kind are in preparation. Some of our novel inference methods are therefore already available and ready to use for applied economists and other social scientists. Various presentations at scientific conferences and seminars have been given to disseminate the results of this ERC project further, including invited keynote presentations at the 2020 virtual conference "Modelling with Big Data and Machine Learning: Measuring Economic Instability" at the Bank of England, and the 2021 "International Panel Data Conference".
The working paper "Moment Conditions for Dynamic Panel Logit Models with Fixed Effects" by Bo Honore and Martin Weidner has received a particularly enthusiastic reception from the econometrics community, and a revision of this paper has been requested by the Review of Economic Studies, one of the top 5 general interest journals in Economics. The results of this paper were unexpected: We found novel moment conditions, which were not expected to exist, in a class of dynamic panel data models. Those moment conditions are useful for inference in this class of models, and they point the way towards finding equally novel moment conditions in other models.

For the future of this ERC project, significant scientific progress is expected from combining ideas from the paper "Fixed-effect regressions on network data" by Koen Jochmans and Martin Weidner (Econometrica 2019) with the methods of the papers "Minimizing Sensitivity to Model Misspecification" and "Posterior Average Effects" by Stephane Bonhomme and Martin Weidner (accepted for publication in Quantitative Economics and the Journal of Business & Economic Statistics, respectively). The latter two papers are about robustness to model misspecification, but mismeasurement of nuisance parameters can also be interpreted as a type of model misspecification. Combined with the first paper, this interpretation leads to a completely novel approach to tackling the high-dimensional inference problems in the sparse network models that this ERC project is aimed at. This is work in progress, but it is one of the most exciting and promising research directions for achieving the goals of this ERC project in the remaining years of the grant.