Structural Models for Text and other Unstructured Data

Project information

UnStruct

Grant agreement ID: 864863

DOI

10.3030/864863

EC signature date: 14 May 2020

Start date: 1 September 2020

End date: 31 August 2026

Funded under

EXCELLENT SCIENCE - European Research Council (ERC)

Total cost

€ 1 648 551,00

EU contribution

€ 1 648 551,00


Coordinated by

UNIVERSITY COLLEGE LONDON
United Kingdom

Periodic Reporting for period 3 - UnStruct (Structural Models for Text and other Unstructured Data)

Reporting period: 2022-11-01 to 2024-04-30

Society is increasingly awash with naturally occurring, unstructured data such as text, images, and payments. These data are not traditionally used in economic measurement, which relies in large part on administrative records and surveys. However, the scale and richness of these new data sources suggest they can add enormous value to how we measure the economy, make forecasts, and evaluate policy interventions. For example, statistical agencies typically publish national accounts with a delay, and they do not produce these accounts for granular spatial or temporal units. Unstructured data, by contrast, are generated in real time and at a scale that yields large samples even in narrowly defined cells. What is currently lacking is a framework for converting unstructured data into economically interpretable measures. Building that framework is the main goal of the project, which harnesses tools from machine learning and computer science to incorporate unstructured data into empirical economics.

One important output of the project is a set of national accounts for consumption, built from the universe of payments made by the retail clients of a large private bank (BBVA) in Spain. We propose a method for filtering raw payments data to generate a consumption panel that 1) is available in real time at daily frequency; 2) covers millions of individuals; 3) is disaggregated across consumption categories; 4) tracks consumption over many years; and 5) aggregates to the level of consumption reported by the Spanish statistical agency. This resource can be used to document inequality in the level of consumption across individuals and to identify which individuals face the greatest risk of consumption fluctuations. It can also serve as an input into economic models that study the effects of policy. In follow-up work, my coauthors and I use aggregate daily consumption measures to study how monetary policy passes through into the economy. We find that consumption reacts within days to changes in interest rates, an effect that traditional data cannot uncover. In future work, we will also study how individual consumption reacts to shocks.
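The aggregation idea behind such a panel can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual filtering method: the `Payment` record, its field names, the category labels, and the rule that drops transfers and non-positive amounts are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Payment:
    """Hypothetical raw payment record; field names are assumptions."""
    client_id: str
    day: date
    category: str   # e.g. "food", "transport", "transfer"
    amount: float   # in euros


# Assumed set of categories that do not count as consumption.
NON_CONSUMPTION = {"transfer", "loan_repayment"}


def daily_consumption_panel(payments):
    """Sum consumption payments by (client, day, category)."""
    panel = defaultdict(float)
    for p in payments:
        if p.category in NON_CONSUMPTION or p.amount <= 0:
            continue  # skip non-consumption flows and non-positive amounts
        panel[(p.client_id, p.day, p.category)] += p.amount
    return dict(panel)


def aggregate_by_day(panel):
    """Collapse the individual-level panel to total consumption per day."""
    totals = defaultdict(float)
    for (_, day, _), amount in panel.items():
        totals[day] += amount
    return dict(totals)


payments = [
    Payment("a", date(2021, 3, 1), "food", 20.0),
    Payment("a", date(2021, 3, 1), "food", 5.0),
    Payment("a", date(2021, 3, 1), "transfer", 500.0),
    Payment("b", date(2021, 3, 1), "transport", 10.0),
    Payment("b", date(2021, 3, 2), "food", 7.5),
]
panel = daily_consumption_panel(payments)
daily = aggregate_by_day(panel)
```

Keeping the individual-level panel and the daily aggregate as separate steps mirrors the two uses described above: the panel supports analysis of inequality across individuals, while the aggregate can be compared against official consumption figures.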

Another important example is the measurement of remote work offers from an enormous corpus of job postings from Lightcast. The shift to remote work since the pandemic has been one of the largest disruptions to the labor market since WWII. Survey responses take us some way toward understanding the incidence of remote work, but limited samples make it hard to report, for example, remote work levels and growth rates for individual cities and firms. My coauthors and I use a large language model together with a set of human-labeled job postings to build a custom model that classifies the text of any job ad according to whether or not it offers remote work. We then apply the model to hundreds of millions of job ads and document extensive heterogeneity in adoption. The data are available for download at https://wfhmap.com/ and can be used to inform numerous crucial economic questions.

Other important contributions to date document how inflation uncertainty, as measured from text, leads monetary policymakers to adopt more aggressive policy stances, and how to measure whether social connections between individuals generate correlated outcomes. The project has also produced a summary of text-as-data methods for economists and example code for implementing a host of modern algorithms.

Economists typically use simple, transparent methods for organizing data. For example, counting keywords has traditionally been a favored approach to quantifying text. My project has shown that adopting more sophisticated approaches, especially when adapted to the relevant economic context, can greatly improve measurement. In the rest of the project, my team will continue to develop tools and work on problems such as how to identify markets from firm-to-firm transaction data and how to estimate the parameters of so-called "structural" macro models with non-traditional data.
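To make the keyword-counting baseline concrete, here is a minimal sketch applied to job ads. The keyword list, the decision rule, and the example ads are hypothetical and chosen only for illustration; they are not taken from the project.

```python
import re

# Hypothetical keyword list; a real dictionary would be built far more carefully.
REMOTE_KEYWORDS = ("remote", "work from home", "telework")


def keyword_count(text, keywords=REMOTE_KEYWORDS):
    """Count whole-phrase, case-insensitive keyword occurrences in text."""
    lowered = text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(k) + r"\b", lowered)) for k in keywords
    )


def offers_remote_work(text):
    """Naive rule: flag the ad if any keyword appears at least once."""
    return keyword_count(text) > 0


ads = [
    "Fully remote position; work from home anywhere in Spain.",
    "On-site role in Madrid. No remote work is possible.",
    "Warehouse operative, night shifts.",
]
flags = [offers_remote_work(ad) for ad in ads]
```

In this toy example the second ad is flagged even though it explicitly rules out remote work, because a keyword count ignores negation and context. This is the kind of measurement error that a model reading the ad in context is better placed to avoid.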
