Addressing productivity paradox with big data: implications to policy making

Big data approach could shed light on Europe’s productivity paradox

To improve Europe’s productivity, we need a richer understanding of what drives it. BIGPROD has ‘web-scraped’ company information to create unique data sets, to reveal much about the reality of company activities and outputs.

Digital Economy
Society

Despite steady technological advancement across several sectors, labour productivity across Europe has stagnated or even declined in recent years. This is known as the ‘productivity paradox’. The Crépon-Duguet-Mairesse (CDM) model has provided a seminal framework for investigating how productivity relates to innovation output and research investment. Making its own contribution to the CDM literature, the EU-supported BIGPROD (Addressing productivity paradox with big data: implications to policy making) project has used the model to analyse data on investment in intangibles, interfirm spillovers and individual innovations.

BIGPROD ‘web-scraped’ data directly from company websites to identify what the companies themselves considered their innovation outcomes, such as new products or services, alongside any collaborations they had undertaken. The project-designed platform, which hosts the web scraping programme, collected data from a sample of approximately 180 000 companies across Europe. The initial proof-of-concept phase tested the quality and quantity of the data retrievable from that sample. “While we are still performing analysis, we know that we retrieved meaningful data from roughly 60 % of our sample. While this might seem limited, compared to a typical survey response rate of around 20 % it’s actually good,” says project coordinator Arho Suominen from the VTT Technical Research Centre of Finland.
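To put the two percentages side by side, the short Python sketch below works through the arithmetic. It assumes, purely for illustration, that a survey would have been sent to the same sample of roughly 180 000 companies; the article does not state this, and the function and constant names are hypothetical.

```python
# Illustrative arithmetic only. The sample size and percentages are the
# approximate figures quoted in the article; applying the survey rate to
# the same sample is an assumption made here for comparison.
SAMPLE_SIZE = 180_000        # approximate number of companies sampled
SCRAPE_YIELD_PCT = 60        # share of sample with meaningful scraped data
SURVEY_RESPONSE_PCT = 20     # typical survey response rate cited


def usable_records(sample_size: int, pct: int) -> int:
    """Return the number of companies yielding data at a given percentage."""
    return sample_size * pct // 100


scraped = usable_records(SAMPLE_SIZE, SCRAPE_YIELD_PCT)
surveyed = usable_records(SAMPLE_SIZE, SURVEY_RESPONSE_PCT)

print(f"Web scraping: about {scraped} companies")
print(f"Survey at 20 %: about {surveyed} companies")
print(f"Ratio: {scraped // surveyed}x")
```

Under that assumption, scraping would yield data on about 108 000 companies against about 36 000 for a survey, roughly three times as many.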

Web scraping

The project targeted a sample of 160 000 to 200 000 company websites, a data set large enough to support a new way of looking at innovation and productivity. Following the cross-industry standard process for data mining (CRISP-DM) and guided by industry classifications, the team focused on three groups: high-tech, low-tech and services. They hypothesised that high-tech companies, which are likely to have a strong web presence, would yield good data and so secure a large sample, whereas data for low-tech companies would be scarcer. For services, the team wanted to develop novel ways of identifying service innovation, for example by looking at job advertisement data and the skills required in the labour market.

“I was surprised by the sheer volume and breadth of data that we were able to collate. As expected, we retrieved good data for the high-tech sample, but not for the low-tech. To compensate, we expanded our analysis to include medium-tech companies,” explains Suominen. The team also created instructive data points on innovation products, collaborations and company activities, opening up the possibility of mapping wider innovation networks. “We have already identified company linkages to research organisations in a totally new way,” adds Suominen.
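As a rough illustration of the kind of processing described above, the following Python sketch pulls the visible text out of a web page and flags phrases that might signal innovation outcomes or collaborations. It is not BIGPROD's code: the keyword lists, the `VisibleTextParser` class, the `find_signals` function and the sample page are all hypothetical, and simple substring matching stands in for whatever text analysis the project actually uses. Only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser

# Hypothetical indicator phrases; the article does not list the project's own.
INNOVATION_TERMS = ("new product", "new service", "launch")
COLLABORATION_TERMS = ("partnership", "in collaboration with", "joint project")


class VisibleTextParser(HTMLParser):
    """Collect text that sits outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    """Return the page's visible text as a single string."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def find_signals(html: str) -> dict:
    """Flag which indicator phrases appear in the page's visible text."""
    text = extract_text(html).lower()
    return {
        "innovation": [t for t in INNOVATION_TERMS if t in text],
        "collaboration": [t for t in COLLABORATION_TERMS if t in text],
    }


# Invented example page, standing in for a scraped company website.
SAMPLE_PAGE = """<html><head><style>p {color: red}</style></head>
<body><h1>Acme Sensors</h1>
<p>We launched a new product in partnership with a university lab.</p>
<script>var label = "joint project";</script></body></html>"""

signals = find_signals(SAMPLE_PAGE)
print(signals)
```

The phrase inside the `<script>` element is ignored because only visible text is scanned. A real pipeline would also need to fetch pages politely, handle many languages and site layouts, and use far more robust language processing than keyword matching.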

Towards policies for sustainable productivity

BIGPROD used the Sustainable Development Goals (SDGs) as a framework for better understanding socio-economic impact. The team is currently analysing what companies’ SDG-related mission and vision statements reveal about how the sustainability transition can become embedded in company targets, an aspect that is instructive for innovation policies. “For productivity investments to have a positive socio-economic impact, policymakers need to know which policy levers to pull. BIGPROD is working to determine how countries can create the right conditions for innovation,” concludes Suominen. After a series of stakeholder meetings involving policymakers, statisticians, economists and data analysts, the team will soon start the modelling phase to extract further productivity insights from its data. It plans to use open-source Python programmes so that others can benefit from its results and methodology.


Keywords: BIGPROD, productivity paradox, web scraping, company, Crépon-Duguet-Mairesse, innovation, investment, high-tech, low-tech, SDGs, economic
