
Parsimony and operator methods for treatment of endogeneity and multiple sources of unobserved heterogeneity

Periodic Report Summary 3 - POEMH (Parsimony and operator methods for treatment of endogeneity and multiple sources of unobserved heterogeneity)

The general purpose of this ERC-funded research is to study two classes of large-dimensional models: nonparametric and high-dimensional. The main motivation for large-dimensional models is that economic theory often does not characterize functional forms, the distributions of unobservables and their number, or the actual variables that have a direct effect on an outcome variable. Rather than using a model that would be simplistic, we allow for more flexible models in which the underlying relation is indeed simple, but its specific form is not known in advance.
An outcome of interest is usually modeled as depending on some observed and unobserved factors. These unobservables are often modeled, for convenience, as if a single variable were missing. However, this has implications which are often undesirable. We have studied in detail the inclusion of multiple unobservables in treatment effect models and shown that treatment effect parameters can also be recovered in this case while allowing for so-called nonmonotonic selection into treatment. This means that a variable having an effect on selection into treatment can shift some individuals from nontreatment to treatment and others from treatment to nontreatment. Importantly, it is not necessary to restrict the distribution of these unobservables. These unobservables can be cost factors in a model where individuals choose their level of education partly because they have an information set which allows them to forecast that investing in education will be beneficial for them in the future. However, this literature too often relies on the requirement that the explanatory variables vary sufficiently. With Christophe Gaillac, we are able to allow for explanatory variables (e.g. cost shifters) which have much less variation, and we obtain estimators which are optimal and adapt to the unknown distribution of the unobservables.
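As an illustration only, nonmonotonic selection with multiple unobservables can be sketched with a selection equation whose coefficients are themselves unobserved and vary across individuals. The notation below ($D$, $Z$, $\Gamma$, $\Theta$, $Y_0$, $Y_1$) is assumed for this sketch and is not taken from the report.

```latex
% Illustrative sketch; all symbols are assumptions, not the report's notation.
% D: treatment indicator, Z: observed selection variables (e.g. cost shifters),
% (\Gamma, \Theta): unobserved individual-specific heterogeneity.
\[
  D = \mathbf{1}\{\Gamma^{\top} Z - \Theta > 0\},
  \qquad
  Y = D\,Y_1 + (1 - D)\,Y_0 .
\]
% If a component \Gamma_k is positive for some individuals and negative for
% others, an increase in Z_k moves some individuals into treatment and
% others out of it: selection is nonmonotonic in Z_k.
% The usual single-unobservable (monotonic) case corresponds to \Gamma
% being the same for all individuals, so that only \Theta is random.
```

In this sketch, leaving the joint distribution of $(\Gamma, \Theta)$ unrestricted corresponds to the statement above that the distribution of the unobservables need not be restricted.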
A topic of huge importance in statistics over the last 15 years has been estimation in models with many explanatory variables, possibly many more than the sample size, when many of them, whose identity is unknown, do not have a direct effect (sparsity). With my coauthor Alexandre Tsybakov, we have been the first to extend this literature to many endogenous regressors, namely regressors which are dependent on the unobservable error term but for which one has at one's disposal so-called instrumental variables, a classical subject of study in econometrics. We obtained confidence sets for the whole high-dimensional vector which adapt to the unknown number of regressors that actually have a direct effect, allow for some instrumental variables to have a direct effect (hence fewer moments than parameters), and allow the instrumental variables to be arbitrarily correlated with the included endogenous variables (including weakly) and to be numerous. With my coauthor Christiern Rose, we have studied panel data models of networks with endogenous and exogenous variables and unobserved variables which are the same for all individuals. This could be viewed as a system of many equations with many endogenous variables. In this research we allow for cross-equation restrictions, structured sparsity, and high-dimensional unobservables.
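For concreteness, the sparse setting with endogenous regressors described above can be written, in its simplest textbook form, as a linear model with moment conditions from instruments. The symbols ($y_i$, $x_i$, $z_i$, $\beta$, $u_i$, $p$, $n$, $s$) are assumed for this sketch and are not the report's notation.

```latex
% Illustrative sketch; all symbols are assumptions, not the report's notation.
% y_i: outcome, x_i \in \mathbb{R}^p: regressors (some endogenous),
% z_i: instrumental variables, u_i: unobserved error, i = 1, ..., n.
\[
  y_i = x_i^{\top}\beta + u_i,
  \qquad
  \mathbb{E}[z_i u_i] = 0,
  \qquad
  \beta \in \mathbb{R}^{p}, \quad p \text{ possibly larger than } n .
\]
% Endogeneity: \mathbb{E}[x_{ij} u_i] \neq 0 for some regressors j.
% Sparsity: only s of the p coefficients are nonzero, with s and the
% identity of the nonzero coefficients unknown:
\[
  s = \bigl|\{\, j : \beta_j \neq 0 \,\}\bigr| \ll p .
\]
```

In this notation, an instrument with a direct effect is a component of $z_i$ whose moment condition fails, which is why allowing for such instruments leaves fewer valid moments than parameters.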