Periodic Reporting for period 1 - ERC-EuropePMC-2-2014 (Extracting funding statements from full text research articles in the life sciences)
Reporting period: 2014-09-01 to 2015-08-31
(1) through metadata associated with the article, usually because the PI has linked the grant to the article after publication using the grant-linking tool in Europe PMC or a similar tool;
(2) through metadata received from OpenAIRE for FP7 funding.
Relying on these two methods alone grossly underestimates the number of articles that can be attributed to the ERC.
(3) through free-text search of the full text, which is error-prone because of noise from the rest of the article's content; this approach also does not scale to searching for specific grant IDs.
The objectives of the project can be described as follows:
Strategic objectives of the project:
(1) To support the ERC’s visibility as a funder of excellent research by enabling the showcasing of ERC-supported research results. Once funding statements are identified, they will be displayed prominently as distinct metadata on Europe PubMed Central abstracts and incorporated into searchable fields and programmatic web services.
(2) To support the visibility of ERC grantees by highlighting that they have obtained an ERC grant, which is a label of excellence.
(3) To facilitate the analysis of the impact of ERC funding. Unstructured and hard-to-find funding information in full text articles in Europe PubMed Central will be surfaced and made available for straightforward searching and filtering.
Operational objectives of the project (supporting the achievement of the strategic objectives):
(1) To accurately identify statements in full-text articles that attribute funding to ERC schemes and grants.
(2) To integrate ERC-based funding statements into the Europe PMC interfaces and search.
(1) Development of the algorithm that identifies ERC funding statements, including specific grant IDs. Given sentences from Acknowledgements-type sections of articles, the algorithm first identifies candidate grant IDs using pattern matching and then validates them using contextual information within each sentence, such as the occurrence of the phrase “European Research Council”.
(2) Analysis of the scope and quality of the outputs, and iterative refinement of the algorithm to improve it and to allow its potential extension to other Europe PMC funders.
(3) Integration of the resulting European Research Council Funding Statements Extraction Algorithm into the full-scale Europe PMC public services, so that the algorithm runs daily on all new full-text content entering Europe PMC and its outputs are available in the public interfaces via simple search terms.
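The two-step approach described in (1) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's implementation (whose source is on GitHub). The six-digit grant ID pattern and the acceptance of the abbreviation "ERC" as a context cue are assumptions made for this example. The function name `extract_erc_grant_ids` is hypothetical.

```python
import re

# ASSUMPTION: ERC grant IDs are modelled here as standalone six-digit numbers.
# This is an illustrative pattern, not the project's actual regular expression.
GRANT_ID = re.compile(r"\b(\d{6})\b")

# Context cue named in the report ("European Research Council").
# ASSUMPTION: the abbreviation "ERC" is also accepted as valid context.
ERC_CONTEXT = re.compile(r"European Research Council|\bERC\b")


def extract_erc_grant_ids(sentences):
    """Return (grant_id, sentence) pairs whose IDs are validated by ERC context.

    `sentences` is assumed to come from Acknowledgements-type sections.
    """
    hits = []
    for sentence in sentences:
        # Step 1: find candidate IDs by pattern matching.
        candidates = GRANT_ID.findall(sentence)
        if not candidates:
            continue
        # Step 2: keep the candidates only if the same sentence mentions the ERC.
        if ERC_CONTEXT.search(sentence):
            hits.extend((grant_id, sentence) for grant_id in candidates)
    return hits


if __name__ == "__main__":
    acknowledgements = [
        "This work was supported by the European Research Council "
        "under grant agreement no. 294809.",
        "The NIH supported this work (grant 123456).",
        "We thank the European Research Council for travel support.",
    ]
    for grant_id, sentence in extract_erc_grant_ids(acknowledgements):
        print(grant_id, "<-", sentence)
```

Only the first sentence yields a result. The second contains a six-digit number but no ERC context, and the third mentions the ERC but contains no candidate ID. Validating at the sentence level is simple, but it can misattribute another funder's ID that happens to appear in the same sentence as an ERC mention. Checking cases like this is part of the output-quality analysis described in (2).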
(2) We expect the number of articles attributed to the ERC to rise in the future, both through the more traditional attribution methods and, now, through text mining of the full-text articles that arrive daily. This dataset may provide insight into trends in how researchers acknowledge their funding sources.
(3) The source code for the algorithm developed to extract ERC funding statements is available on GitHub (https://github.com/jeekim/EuropePMC-Identifier-Extractor).