Community Research and Development Information Service - CORDIS

H2020

ERC-EuropePMC-2-2014 Report Summary

Project reference: 637529
Funded under: H2020-EU.1.1.

Periodic Reporting for period 1 - ERC-EuropePMC-2-2014 (Extracting funding statements from full text research articles in the life sciences)

Summary of the context and overall objectives of the project

Funding agencies need to be able to identify the outcomes of research in order to assess the impact of funding across different themes and, in the case of the European Research Council (ERC), also among different disciplines and geographical areas. In the life sciences, research articles are the core currency of research assessment, so identifying articles that have been supported by a given funding agency, and through a particular grant or funding stream, is vitally important. Currently the only ways to identify papers in Europe PMC that have been funded by the ERC are:
(1) through metadata associated with the article, usually because the PI has used the grant-linking tool in Europe PMC or a similar tool after publication of the article;
(2) through metadata received from OpenAIRE, based on FP7 funding;
(3) through free-text search of the full text, which is error-prone because of noise from the full text content of the article and does not scale to searching for specific grant IDs.
Relying on these methods alone grossly underestimates the number of articles that can be attributed to the ERC.

The objectives of the project can be described as follows:

Strategic objectives of the project:

(1) To support the ERC’s visibility as a funder of excellent research by enabling the showcasing of ERC supported research results. Once funding statements are identified they will be displayed prominently as distinct metadata on Europe PubMed Central abstracts and incorporated into searchable fields and programmatic web services.
(2) To support the visibility of ERC grantees (the fact that they have obtained an ERC grant, which is a label of excellence).
(3) To facilitate the analysis of the impact of ERC funding. Unstructured and hard-to-find funding information in full text articles in Europe PubMed Central will be surfaced and made available for straightforward searching and filtering.

Operational objectives of the project (supporting the achievement of the strategic objectives):

(1) To accurately identify statements in full text articles that attribute ERC funding schemes and grants.
(2) To integrate ERC-based funding statements into the Europe PMC interfaces and search.

Work performed from the beginning of the project to the end of the period covered by the report and main results achieved so far

In order to improve the recall of articles attributed to the ERC within the Europe PMC database, we have developed a new text-mining module for the Europe PMC text-mining pipeline. This module accurately identifies statements in full text articles that attribute both ERC funding schemes and grants. To achieve this objective, we completed the following tasks:
(1) Development of the algorithm that identifies ERC funding statements including specific grant IDs. This algorithm, given sentences from Acknowledgements-type sections of articles, first identifies funding IDs using pattern matching, and then validates those IDs based on contextual information (such as the occurrence of the phrase “European Research Council”) within each sentence.
(2) Analysis of the scope and quality of outputs, iteration to improve the algorithm and allow the potential extension to other Europe PMC funders.
(3) Integration of the developed European Research Council funding statements extraction algorithm into the full-scale Europe PMC public services, so that the algorithm operates daily on all new full text content entering Europe PMC and the outcomes are available in the public interfaces via simple search terms.
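
The two-step approach described in task (1), pattern matching followed by contextual validation within the same sentence, can be illustrated with a minimal sketch. This is not the project's actual code (that is in the GitHub repository cited in this report). The six-digit ID pattern, the use of the abbreviation "ERC" as an additional context cue, the function name and the example sentences are all assumptions made for illustration only.

```python
import re

# Assumption: a candidate grant ID is a standalone six-digit number.
# The real algorithm's patterns may differ.
GRANT_ID = re.compile(r"(?<!\d)\d{6}(?!\d)")

# Context cues: the phrase named in the report, plus the abbreviation
# "ERC" (the abbreviation is an assumption of this sketch).
CONTEXT = re.compile(r"European Research Council|\bERC\b")


def extract_erc_grant_ids(sentences):
    """Return candidate IDs found in sentences that also mention the ERC."""
    results = []
    for sentence in sentences:
        # Step 1: find candidate funding IDs by pattern matching.
        candidates = GRANT_ID.findall(sentence)
        # Step 2: keep them only if the same sentence carries ERC context.
        if candidates and CONTEXT.search(sentence):
            results.extend(candidates)
    return results


# Hypothetical Acknowledgements-type sentences.
example_sentences = [
    "This work was supported by the European Research Council (grant 637529).",
    "We thank another funder for grant 098051.",
]
print(extract_erc_grant_ids(example_sentences))
```

In this sketch the second sentence contains a six-digit number but no ERC context, so its ID is rejected. Validating per sentence rather than per document is what keeps unrelated numbers elsewhere in the Acknowledgements from being attributed to the ERC.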

Progress beyond the state of the art and expected potential impact (including the socio-economic impact and the wider societal implications of the project so far)

(1) Prior to the integration of the developed algorithm, just over 2,000 full text articles in Europe PMC could be retrieved using the following query: (This can easily be constructed via the Advanced Search page.) Once the integration was complete, i.e. after the algorithm described in D1.1 had been applied to all the full text content in Europe PMC, this number had risen to around 4,724 full text articles (14th September 2015).
(2) We expect the number of articles attributed to the ERC to rise in the future, both through the collection of attributions by the more traditional methods and, now, through text mining of full text articles arriving on a daily basis. This dataset may provide insight into trends in how researchers attribute their funding sources.
(3) The source code for the algorithm developed to extract ERC funding statements is available on GitHub (https://github.com/jeekim/EuropePMC-Identifier-Extractor).

Related information

Record Number: 186297 / Last updated on: 2016-07-08