
Inference in Microeconometric Models

Periodic Reporting for period 4 - MiMo (Inference in Microeconometric Models)

Reporting period: 2020-09-01 to 2021-04-30

Unobserved differences between economic agents are an important driver behind the differences in their economic outcomes such as schooling decisions, wages, and employment durations. Allowing for such unobserved heterogeneity in economic modelling equips the specification with an additional dimension of realism but presents major challenges for econometric practice. Hence, reconciling heterogeneity in the data with econometric models is an issue of utmost importance.
The aim of this project is to develop inference methods for models with unobserved heterogeneity, with a focus on longitudinal (panel) data and network data.
A first block deals with inference in linear and nonlinear models and enhances the performance of statistical hypothesis tests. The literature often focuses on point estimation. However, it is statistical inference that accounts for uncertainty in the data and forms the basis for testing economic restrictions.
A second block makes progress on the estimation of models for network data. The importance of social and economic connections is well established, but few formal results are available. We exploit the fact that network data can be seen as a type of panel data to derive such results.
A third block uses panel data to non-parametrically estimate dynamic discrete-choice models with unobserved type heterogeneity and/or latent state variables. Such results are useful because dynamic discrete-choice models are a workhorse tool in labour economics and industrial organization.
The performance of the tools will be assessed theoretically and via simulation, and they will be applied to various empirical problems.
Since the start of the project we have made progress on all fronts.

We have derived simple and powerful tests for serial correlation in short panel data. These tests are implemented in Stata. We have also obtained inference procedures for the distribution of heterogeneous parameters in (possibly nonlinear) panel data problems. This is useful because, in such settings, marginal effects are themselves heterogeneous parameters. Next, we have developed a simple yet general approach to inference in regression models with many controls. A leading case where this matters is models for grouped data (such as panel data), where group-specific variables are included almost by default to control for latent heterogeneity at the group level. Finally, we have investigated the link between two seemingly independent approaches to inference in binary-choice models and have established that the conditions underlying them are, in fact, equivalent.

For data with a network structure we have obtained the following results. First, we have developed a simple yet quite general inference procedure for models of dyadic data, such as friendship networks or bilateral trade networks, based on a modified likelihood function. Next, we have provided a specific estimator for a model of network formation itself. This estimator is useful because it can deal with sparse networks, such as a friendship network, where most agents interact with only a few others. We have also written Stata code for an attractive estimator of constant-elasticity models, such as the gravity equation in international trade with both importer and exporter effects. Finally, we have derived conditions on the network structure under which conventional inference procedures are valid. These conditions concern, roughly, measures of sparsity. Our results show that in many fixed-effect models for linked data sets, such as student-teacher data or employer-employee data, the usual standard errors are highly unreliable.

Finally, for models with discrete unobserved heterogeneity, we have developed improved estimation and inference procedures based on multivariate data. Unlike in our previous findings, some of the measurements are now allowed to have very limited support, possibly binary.
Inference in grouped data is plagued by bias introduced by the presence of many group-specific parameters. We provide improvements both through the development of new point estimators and through the construction of alternative standard errors. One substantial improvement on the state of the art in panel data is that we aim to estimate the entire distribution of marginal effects, whereas current results are limited to averages. We are currently working on relaxing some of the conditions underlying our results. Our new standard errors for regression models with many control variables can be applied in more general situations than existing alternatives. One example is a model with many dummy variables, which typically leads to a highly unbalanced regressor design.

Although there are many economic applications of network data, there is very little theoretical work on how to perform valid inference in such settings, especially when the network is rather sparse. Our results give formal sufficient conditions for conventional inference to be valid in the linear regression model and provide easy-to-verify diagnostics. In a large student-teacher data set, for example, our results show that standard inference procedures dramatically overestimate the importance of teacher value-added to student achievement. We have also provided simple estimators for dyadic data and for a model of network formation, making it possible to estimate models for which no estimator with attractive statistical properties was previously available. We are currently looking into how to perform inference in a class of nonlinear models for sparse network data.