A coherent approach to analysing heterogeneity in network data

Project information

NETWORK

Grant agreement ID: 101044319

DOI

10.3030/101044319

EC signature date: 12 July 2022

Start date: 1 January 2023

End date: 31 December 2027

Funded under

European Research Council (ERC)

Total cost

€ 966 000,00

EU contribution

€ 966 000,00

Coordinated by

FONDATION JEAN JACQUES LAFFONT, TOULOUSE SCIENCES ECONOMIQUES
France

Periodic Reporting for period 1 - NETWORK (A coherent approach to analysing heterogeneity in network data)

Reporting period: 2023-01-01 to 2025-06-30

The interaction between agents is central to economic activity. Such interaction naturally gives rise to network data, and substantial effort is being devoted to the development of econometric and statistical methods to analyze them. One fundamental issue is how to deal with (unobserved) heterogeneity among the agents in the network, that is, how differences across agents affect when and how they interact. Moreover, it is often of great interest to document the degree of heterogeneity, evaluate its impact, and uncover the existence and nature of any complementarities between the agents. Tools to perform such decompositions with proven theoretical guarantees would be an important and useful part of the applied economist's toolkit. In the current literature they are, however, in limited supply.
The aim of NETWORK is to provide a coherent approach to the formulation of models for network interaction in the presence of unobserved heterogeneity, to present identification results, and to propose computationally convenient estimators based on them.

In a first paper we have established new identification and estimation results for a generalized version of the stochastic block model. Here, nodes belong to one of a finite number of latent communities, and the placement of edges between them, as well as any weights assigned to those edges, depends on the communities to which the nodes belong. The identification argument is constructive, and we present a computationally attractive nonparametric estimator based on it. Limit theory is derived under asymptotics where we observe a growing number of independent networks of a fixed size. We also report the results of a series of numerical experiments.
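To make the sampling scheme concrete, the following minimal sketch simulates many independent small networks from a basic (unweighted, undirected) stochastic block model. It only illustrates the data-generating process described above; it is not the estimator of the paper. The function name `simulate_sbm_networks` and all parameter names and values are assumptions chosen for illustration.

```python
import numpy as np

def simulate_sbm_networks(n_networks, n_nodes, community_probs, edge_probs, seed=0):
    """Draw independent undirected networks from a stochastic block model.

    Hypothetical helper for illustration only.
    community_probs: probability of each latent community.
    edge_probs: symmetric matrix of edge probabilities between communities.
    """
    rng = np.random.default_rng(seed)
    community_probs = np.asarray(community_probs, dtype=float)
    edge_probs = np.asarray(edge_probs, dtype=float)
    n_communities = len(community_probs)
    # Latent community label of every node in every network.
    labels = rng.choice(n_communities, size=(n_networks, n_nodes), p=community_probs)
    # Edge probability of each node pair, given the two community labels.
    pair_probs = edge_probs[labels[:, :, None], labels[:, None, :]]
    draws = rng.random((n_networks, n_nodes, n_nodes)) < pair_probs
    # Keep the strict upper triangle and mirror it: no self-loops, symmetric edges.
    upper = np.triu(draws, k=1)
    adjacency = upper | upper.transpose(0, 2, 1)
    return labels, adjacency.astype(int)

# Example: 500 independent networks with 4 nodes each and two communities.
labels, adjacency = simulate_sbm_networks(
    500, 4, [0.5, 0.5], [[0.8, 0.1], [0.1, 0.8]]
)
```

The number of networks grows while the size of each network stays fixed, which mirrors the asymptotic scheme mentioned above. In practice the labels are unobserved and only the adjacency arrays would be available to the researcher.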
In a second paper we have derived an identification result for general dyadic data under the assumption that they are jointly exchangeable and dissociated. In this case they admit a non-separable specification with two-way unobserved heterogeneity. We provide conditions under which both the distribution of the observed random variables conditional on the unit-specific heterogeneity and the distribution of the unit-specific heterogeneity itself are uniquely recoverable from the joint marginal distribution of the observable random variables alone, without imposing parametric restrictions.
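As a hedged illustration of what such a non-separable specification looks like, a standard representation for jointly exchangeable and dissociated arrays (in the spirit of the Aldous-Hoover theorem) can be written as follows. The symbols $Y_{ij}$, $A_i$, $U_{ij}$ and $h$ are notation assumed here for illustration and need not match the paper.

```latex
% Illustrative notation (assumed, not taken from the paper):
%   Y_{ij} : observed outcome for the pair (i, j)
%   A_i    : unobserved unit-specific heterogeneity, i.i.d. across units
%   U_{ij} : pair-specific shock, i.i.d. across pairs and independent of the A_i
\[
  Y_{ij} = h\left(A_i, A_j, U_{ij}\right), \qquad i \neq j .
\]
% The identification question is then whether the conditional law of
% Y_{ij} given (A_i, A_j), together with the law of A_i, can be recovered
% from the joint distribution of the observed array (Y_{ij}) alone.
```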
In a third paper we look at a general class of problems with unobserved heterogeneity. A popular approach to perform inference in the presence of nuisance parameters is to construct estimating equations that are orthogonal to the nuisance parameters, in the sense that their expected first derivative is zero. Such first-order orthogonalization may, however, not suffice when the nuisance parameters are very imprecisely estimated. Leading examples where this is the case are models for panel and network data that feature fixed effects. We show how, in the conditional-likelihood setting, estimating equations can be constructed that are orthogonal to any chosen order. Combining these equations with sample splitting yields higher-order bias-corrected estimators. In an empirical application we apply our method to a fixed-effect model of team production and obtain estimates of complementarity in production and impacts of counterfactual re-allocations.
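The notion of orthogonality used here can be stated compactly. The block below is a sketch for a scalar nuisance parameter under smoothness assumptions; the symbols $\psi$, $W$, $\theta_0$, $\eta_0$ and the order $q$ are notation assumed for illustration, and the higher-order condition is one natural reading of "orthogonal to any chosen order" rather than the paper's exact definition.

```latex
% Assumed notation: psi is the estimating function, W the data,
% theta_0 the target parameter, eta_0 the true nuisance parameter.
% First-order (Neyman) orthogonality:
\[
  \left. \frac{\partial}{\partial \eta}
  \mathbb{E}\left[ \psi(W; \theta_0, \eta) \right] \right|_{\eta = \eta_0} = 0 .
\]
% Orthogonality to order q (one possible formalization):
\[
  \left. \frac{\partial^{k}}{\partial \eta^{k}}
  \mathbb{E}\left[ \psi(W; \theta_0, \eta) \right] \right|_{\eta = \eta_0} = 0 ,
  \qquad k = 1, \dots, q .
\]
% A Taylor expansion around eta_0 then shows that plugging in an estimate
% \hat{\eta} perturbs the expected estimating equation only at order
% |\hat{\eta} - \eta_0|^{q+1}, which is why a larger q tolerates noisier
% nuisance estimates.
```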
In a fourth paper we look at the judge-leniency design, which is very popular in causal inference. Evaluating whether conventional inference procedures apply to it is not immediate. We frame such a design as an inference problem from grouped data in a setting with a growing number of groups and limited variation between groups. Such an approximation is well suited for the data sets encountered in practice. The two-stage least-squares estimator should never be used. The jackknife instrumental-variable estimator presents a reliable tool for inference, provided that a non-standard variance estimator is used. Conventional decision rules to gauge instrument strength should not be used. An alternative such decision rule is provided in this context and evaluated.
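The jackknife idea can be illustrated with a small sketch: each case is instrumented by the treatment rate of its judge computed from that judge's other cases, so that a case's own treatment status never enters its instrument. This is only a point-estimate illustration in the spirit of jackknife instrumental variables, without covariates. It reproduces neither the paper's non-standard variance estimator nor its decision rule for instrument strength, and the function names `leave_one_out_leniency` and `iv_slope` are hypothetical.

```python
import numpy as np

def leave_one_out_leniency(treatment, judge):
    """Mean treatment rate of each case's judge, excluding the case itself.

    Hypothetical helper; requires at least two cases per judge.
    """
    treatment = np.asarray(treatment, dtype=float)
    judge = np.asarray(judge)
    # Map judge identifiers to integer codes 0, 1, 2, ...
    _, judge_index = np.unique(judge, return_inverse=True)
    judge_sum = np.bincount(judge_index, weights=treatment)
    judge_count = np.bincount(judge_index)
    # Remove the case's own treatment from its judge's total before averaging.
    return (judge_sum[judge_index] - treatment) / (judge_count[judge_index] - 1)

def iv_slope(outcome, treatment, instrument):
    """Just-identified IV slope: cov(instrument, outcome) / cov(instrument, treatment)."""
    outcome = np.asarray(outcome, dtype=float)
    treatment = np.asarray(treatment, dtype=float)
    instrument = np.asarray(instrument, dtype=float)
    z = instrument - instrument.mean()  # demeaning absorbs the intercept
    return float(z @ outcome / (z @ treatment))

# Toy example with two judges and an outcome that is exactly linear in treatment.
treatment = np.array([1.0, 0.0, 1.0, 1.0])
judge = np.array(["a", "a", "b", "b"])
outcome = 2.0 * treatment + 1.0
leniency = leave_one_out_leniency(treatment, judge)
slope = iv_slope(outcome, treatment, leniency)
```

Excluding the own observation is what distinguishes this from the two-stage least-squares fitted value, which would use the full judge mean including the case itself.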
In a fifth paper we explore whether the bootstrap can provide correct inference in longitudinal data when dynamics and fixed effects are present. The paper shows that the bias arising from unrestricted feedback can be correctly replicated by a panel-data version of the moving block bootstrap.
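A minimal sketch of one way to carry the moving block bootstrap over to panel data is to resample blocks of consecutive time periods and apply the same resampled periods to every unit, which preserves both the within-block dynamics and the cross-sectional structure. Whether this matches the exact scheme of the paper is an assumption; the function name `panel_moving_block_bootstrap` and its arguments are hypothetical.

```python
import numpy as np

def panel_moving_block_bootstrap(panel, block_length, rng):
    """Resample blocks of consecutive time periods, keeping all units together.

    Hypothetical helper for illustration only.
    panel has shape (n_units, n_periods, ...); block_length <= n_periods.
    """
    panel = np.asarray(panel)
    n_periods = panel.shape[1]
    n_blocks = -(-n_periods // block_length)  # ceiling division
    # Random starting period for each block; blocks are allowed to overlap.
    starts = rng.integers(0, n_periods - block_length + 1, size=n_blocks)
    # Expand each start into a run of consecutive periods, then trim to length.
    time_index = (starts[:, None] + np.arange(block_length)).ravel()[:n_periods]
    return panel[:, time_index], time_index

# Example: 3 units observed over 10 periods, blocks of 4 periods.
rng = np.random.default_rng(0)
panel = np.arange(30).reshape(3, 10)
resampled, time_index = panel_moving_block_bootstrap(panel, 4, rng)
```

An estimator would then be recomputed on each resampled panel, and the spread and centring of the bootstrap estimates used for inference.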
In a sixth paper we consider the problem of nonparametrically identifying models for worker/firm data in the presence of both unobserved worker and firm heterogeneity. This is a first-order concern. Indeed, this type of model is a workhorse tool in the earnings literature but identification results in the presence of two-sided unobserved heterogeneity are non-existent. We provide such results for several model specifications.

The work performed to date has advanced the econometric literature on models for dyadic interaction by giving new identification results and new estimators for weighted versions of the classic stochastic block model. It has also produced conditions under which a continuous version of such a model is identifiable. Here, the setup corresponds essentially to a nonparametric two-way model with continuous latent heterogeneity. Extensions are possible to hypergraphs, where we look at interaction between multiple agents (i.e. groups of individuals), as in team production.
One interesting aspect of the derived results is that they are useful not only in settings where the data consist of a single large network, but also where one has access to data on many small networks. In such a case, alternative approaches based on, say, spectral clustering or brute-force fixed-effect estimation are not viable options. Given the prevalence of stochastic block models in applied work, these results should be of much use.
We have also derived the first known results on the nonparametric identifiability of models for matched data (usually worker/firm) with two-sided random effects.
NETWORK has also produced a novel way to deal with estimation noise caused by many imprecisely estimated nuisance parameters in inference on a target parameter of interest. In the context of the project the chief application lies in models for network data with node-specific heterogeneity, but the approach is general and contributes to the large literature on inference in the presence of nuisance parameters. Indeed, the project has generalized the classic notion of Neyman orthogonality and, importantly, provides a way to achieve it in general likelihood-based models. Such a result is new and is expected to generate much additional research in the future.
