Big DATA approaches FOR improved monitoring of research and innovation performance and assessment of the societal IMPACT in the Health, Demographic Change and Wellbeing Societal Challenge

Deliverables

New indicators for impact and policy making assessment interim report

We plan to develop a set of new indicators, and to gather evidence, for assessing research and societal impact. The indicators will be developed through a process involving cross analysis, parallel research activities and continuous feedback loops between WP2 (the methodological concept), WP3 (data collection activities) and WPs 4-5 (the analysis phase). The indicators developed in this task will be tested in two workshops described in WPs 4-5, following which they will be transferred to WP6 for community-led validation and development of new tools/front-end solutions.

Proposed Topic Modelling workflow report

We will prepare a report that reviews our proposed Topic Modelling workflow.

Final use case evaluation report

Specific peer review, dissemination, and impact requirements as well as settings by the selected research communities will be addressed and reflected in the evaluation report.

Final platform operation report

The deliverable reports on the data sources, mining algorithms and numbers relative to the operation of the platform.

Intermediate report on the conceptual framework

Based on the literature survey, a list of potential indicators will be developed. Alongside theoretical/conceptual arguments for the indicators, an assessment of the availability and international comparability of these indicators will be made under this task. In addition, benchmarks, indexes and relative measures will be developed.

Use Cases and Pilots: Definition of Methodology

The goal of this deliverable is to carry out a series of pilot studies. The Data4Impact team will lay out the specific plans, engage with the corresponding communities, identify the most fitting methodologies and settings, and will synchronise with the activities in WP3-WP5.

Final report on the conceptual framework & proposed indicators

The final report on the conceptual framework & proposed indicators will describe how and which objectives were achieved, which difficulties complicated the project and which further perspectives arise from its final state.

Final Dissemination, Communication and Exploitation plan

All partners have a vested interest in exploiting the results of Data4Impact, in particular with respect to the implementation of WP6. This deliverable will develop an exploitation plan to ensure the broad outreach of the project and promote its activities and findings among the identified target groups. To do so, it will first identify and categorise all stakeholder groups. The exploitation plan will specifically deal with the long-term sustainability and usability of the Data4Impact platform, developed indicators, tools and software.

Methodology related to the pilot analysis of company, EU Projects, policy documents, policy guideline and social media/media data report

This task will focus on two main activities: (a) spotting and investigating EU-funded companies/SMEs and key topics and themes related to the Health, Demographic Change and Wellbeing Societal Challenge, and (b) constructing a company-topic graph (multimodal networks) where large components, trends and strong community ties can be isolated in topic-specific health-related sectors and traced in time and space. A wealth of network measures will be estimated, facilitating the development of innovative impact indicators. The focus of this analysis will be on the companies/SMEs participating in EU research programmes (FP7, H2020), but we will also aim to expand the coverage to the national level. To this end, existing text analytics tools and workflows for processing the semi-structured and unstructured textual company data harvested and available in the Data4Impact Repository will be adapted, tuned, deployed and integrated in the Data4Impact platform. We plan to exploit and examine the effectiveness of recent advances in deep learning and distributional semantic representations concerning Entity (company) extraction and linking and Topic/Theme extraction, together with all necessary pre-processing tools like text normalisers, morphological analysers, term extractors and syntactic parsers.
In Task 5.2 our focus is on harvested data of finalised EU research projects and their stated expected impacts in the project final reports. The related data will be derived from the Cordis system. Similar to the aforementioned Task 5.1, we will deploy content analytics workflows (entity extractors and topic/theme analysers) to cater for the detection of entities (e.g. organizations: Universities, Companies, Research organizations, SMEs, Labs, Hospitals, Health organizations etc. and EU Projects) mentioned in specific topics and themes related to the Health, Demographic Change and Wellbeing Societal Challenge.
We will also build entity-topic networks examining links and correlations of entity mentions to significant topics and themes. Analysis will provide a set of measures contributing to the development of new impact indicators in Task 2.4.
In Task 5.3 our focus is on harvested data of policy documents & research activities/trends in selected countries (UK, Germany, Sweden). Similar to the aforementioned Tasks 5.1 & 5.2, we will deploy content analytics workflows (entity extractors and topic/theme analysers) to cater for the detection of entities (e.g. organizations: Universities, Companies, Research organizations, SMEs, Labs, Hospitals, Health organizations etc. and EU Projects) mentioned in specific topics and themes related to the Health, Demographic Change and Wellbeing Societal Challenge. We will also build entity-topic networks examining links and correlations of entity mentions to significant topics and themes. Analysis will provide a set of measures contributing to the development of new impact indicators in Task 2.4. Additionally, in Task 5.3, the bibliometric analysis of research impact in the professional sphere will be based on references in clinical guidelines indexed in Clinical Impact (CI:), a citation database that indexes clinical guideline references to the research literature. We will use this infrastructure to populate the database with material in the form of collected documents with reference lists from the respective use cases (UK, Germany, Sweden). The data will be semi-automatically translated into citations and validated against PubMed and WoS as well as the OpenAire system. Thereafter, biographical data on researchers' institutions, countries and research areas will be used to develop impact measures and map how EU-financed research has come into use in the actual professional health sector setting.
In Task 5.4, for social media data, we will follow the methodology developed in Nelhans and Gunnarsson Lorentzen (2016) to collect Twitter conversations about research. We will use the Twitter streaming API to
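The entity-topic networks described for Tasks 5.1 to 5.3 can be illustrated with a small sketch. It is not the project's implementation: it assumes that an upstream extraction step has already produced (entity, topic) pairs, and the function names and sample data below are hypothetical. It builds a bipartite graph in plain Python and derives two simple network measures, the number of topics per entity and the connected components ordered by size.

```python
from collections import defaultdict


def build_entity_topic_graph(mentions):
    """Build a bipartite graph from (entity, topic) pairs.

    Returns two adjacency maps: entity -> topics and topic -> entities.
    """
    entity_topics = defaultdict(set)
    topic_entities = defaultdict(set)
    for entity, topic in mentions:
        entity_topics[entity].add(topic)
        topic_entities[topic].add(entity)
    return entity_topics, topic_entities


def connected_components(entity_topics, topic_entities):
    """Return components as sets of (kind, name) nodes, largest first."""
    seen = set()
    components = []
    for entity in entity_topics:
        start = ("entity", entity)
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            kind, name = stack.pop()
            component.add((kind, name))
            if kind == "entity":
                neighbours = [("topic", t) for t in entity_topics[name]]
            else:
                neighbours = [("entity", e) for e in topic_entities[name]]
            for node in neighbours:
                if node not in seen:
                    seen.add(node)
                    stack.append(node)
        components.append(component)
    return sorted(components, key=len, reverse=True)


# Hypothetical sample data: entity mentions linked to topics.
mentions = [
    ("Company A", "diabetes"),
    ("Company B", "diabetes"),
    ("Company B", "telemedicine"),
    ("Hospital C", "oncology"),
]

entity_topics, topic_entities = build_entity_topic_graph(mentions)
components = connected_components(entity_topics, topic_entities)

# Number of distinct topics each entity is linked to.
topic_counts = {entity: len(topics) for entity, topics in entity_topics.items()}
print(topic_counts)
print([len(component) for component in components])
```

A graph library would offer richer measures (centrality, community detection); the plain-Python version is used here only to keep the sketch self-contained.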

Policy report on new indicators and approaches for assessing the societal impact of research and innovation activities

A policy report will be prepared. It will provide an overview of new indicators and approaches for assessing the societal impact of research and innovation activities.

Analysis of publication and patent data related to the Health, Demographic Change and Wellbeing Societal Challenge report

This Task will analyse patent data, meta-data and links related to the Health, Demographic Change and Wellbeing Societal Challenge based on the Topic Modelling related analysis workflow, similar to the above mentioned T4.2. We will also try to incorporate and compare findings from related multidimensional analysis on publication data as well as company, project report and social data that will be performed in WP5.

Interim use case evaluation report

Interim results of specific peer review, dissemination, and impact requirements as well as settings by the selected research communities will be addressed and reflected in the report.

Project handbook

The project handbook includes a quality assurance and risk management plan as well as a monitoring and evaluation plan. The latter will elaborate on the relevant indicators (including, but not limited to, progress and impact indicators, and qualitative and quantitative targets).

Interim platform operation report

The deliverable reports on the data sources, mining algorithms and numbers relative to the operation of the platform.

Analysis of company, EU Projects, policy documents, clinical guideline and social media/media data report

The deliverable will focus on two main activities: (a) spotting and investigating EU-funded companies/SMEs and key topics and themes related to the Health, Demographic Change and Wellbeing Societal Challenge, and (b) constructing a company-topic graph (multimodal networks) where large components, trends and strong community ties can be isolated in topic-specific health-related sectors and traced in time and space. A wealth of network measures will be estimated, facilitating the development of innovative impact indicators. The focus of this analysis will be on the companies/SMEs participating in EU research programmes (FP7, H2020), but we will also aim to expand the coverage to the national level. To this end, existing text analytics tools and workflows for processing the semi-structured and unstructured textual company data harvested and available in the Data4Impact Repository will be adapted, tuned, deployed and integrated in the Data4Impact platform. We plan to exploit and examine the effectiveness of recent advances in deep learning and distributional semantic representations concerning Entity (company) extraction and linking and Topic/Theme extraction, together with all necessary pre-processing tools like text normalisers, morphological analysers, term extractors and syntactic parsers. Our focus is on harvested data of finalised EU research projects and their stated expected impacts in the project final reports. The related data will be derived from the Cordis system. We will deploy content analytics workflows (entity extractors and topic/theme analysers) to cater for the detection of entities (e.g. organizations: Universities, Companies, Research organizations, SMEs, Labs, Hospitals, Health organizations etc. and EU Projects) mentioned in specific topics and themes related to the Health, Demographic Change and Wellbeing Societal Challenge. We will also build entity-topic networks examining links and correlations of entity mentions to significant topics and themes.
Analysis will provide a set of measures contributing to the development of new impact indicators. Our focus is on harvested data of policy documents & research activities/trends in selected countries (UK, Germany, Sweden). We will deploy content analytics workflows (entity extractors and topic/theme analysers) to cater for the detection of entities (e.g. organizations: Universities, Companies, Research organizations, SMEs, Labs, Hospitals, Health organizations etc. and EU Projects) mentioned in specific topics and themes related to the Health, Demographic Change and Wellbeing Societal Challenge. We will also build entity-topic networks examining links and correlations of entity mentions to significant topics and themes. Analysis will provide a set of measures contributing to the development of new impact indicators. Additionally, the bibliometric analysis of research impact in the professional sphere will be based on references in clinical guidelines indexed in Clinical Impact (CI:), a citation database that indexes clinical guideline references to the research literature. We will use this infrastructure to populate the database with material in the form of collected documents with reference lists from the respective use cases (UK, Germany, Sweden). The data will be semi-automatically translated into citations and validated against PubMed and WoS as well as the OpenAire system. Thereafter, biographical data on researchers' institutions, countries and research areas will be used to develop impact measures and map how EU-financed research has come into use in the actual professional health sector setting. For social media data, we will follow the methodology developed in Nelhans and Gunnarsson Lorentzen (2016) to collect Twitter conversations about research. We will use the Twitter streaming API to filter tweets containing the strings 'dx' and 'doi' or including an embedded dx.doi.org URL. In order to capture the follow-on tweets or follow-on commun
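The tweet filtering rule stated above (the strings 'dx' and 'doi' in the text, or an embedded dx.doi.org URL) can be sketched as a simple predicate. This is an illustration under assumptions, not the project's collection code: it makes no call to the Twitter API, it assumes each tweet is already available as its text plus a list of expanded URLs, and the function name `mentions_doi` is hypothetical.

```python
def mentions_doi(text, expanded_urls=()):
    """Return True if a tweet matches the DOI filter described in the text.

    A tweet matches when its text contains both 'dx' and 'doi'
    (case-insensitive), or when any of its expanded URLs points to
    dx.doi.org. Shortened links hide the target, so the expanded URLs
    are checked separately.
    """
    lowered = text.lower()
    if "dx" in lowered and "doi" in lowered:
        return True
    return any("dx.doi.org" in url.lower() for url in expanded_urls)


# Hypothetical sample tweets as (text, expanded URLs) pairs.
samples = [
    ("New paper out: http://dx.doi.org/10.1093/database/bay076", []),
    ("Worth a read https://t.co/abc", ["http://dx.doi.org/10.15388/polit.2019.94.2"]),
    ("Nothing about research here", []),
]

matches = [mentions_doi(text, urls) for text, urls in samples]
print(matches)
```

Plain substring matching is deliberately crude and will also accept unrelated words that happen to contain 'dx' and 'doi'; a production filter would likely need a stricter pattern.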

Online prototype report + feasibility study

The goal of this Task is to integrate and combine the models, tools and indicators proposed in WP4 and WP5, building an end-to-end online analysis prototype able to provide, analyse and visualize live monitoring data as well as related impact assessment findings. Such a tool will work on top of the Data4Impact Repository and will:
• automate (to the extent possible) the processes, techniques and methodologies that will be proposed in WP4 & WP5,
• handle online incremental model updating that works with new/live incoming data,
• integrate and combine models, techniques, indicators and findings from WP4 and WP5, providing a multi-dimensional, multi-perspective research impact assessment and analysis.
The prototype will be tested and evaluated in real-world use cases such as those proposed in WP6.1. Finally, we will perform a feasibility study/SWOT analysis on how to expand/scale up our activities to more areas/countries/funders.

Platform high-level architecture and data flows

The Data4Impact platform will be built using the D-NET Software Toolkit, a data infrastructure “enabling software” currently used to operate known production systems (e.g. the OpenAIRE infrastructure, the EFG infrastructure, and several national repository aggregator infrastructures). D-NET will be configured, extended, and deployed to deliver the Data4Impact platform back-end, with the aim of supporting tools for the aggregation of scholarly communication objects and the mining of publication, patent, company, social media/media, etc. data. The delivery of an operational platform, with the required computation and storage capacity, will be based on an initial set of non-functional requirements to be defined in synergy with WPs 2, 4, 5 and 6.

Draft Dissemination, Communication and Exploitation plan

Data4Impact will create a suite of communication products to ensure that the messaging and external communications are consistent. This communications toolkit will be made available to all partners, who will be trained in its use. Additionally, each phase of the project activities will be supported by ad hoc communication services, i.e. press releases will be drafted, translated into national languages (if necessary) and circulated to a pertinent audience to promote the Data4Impact activities at the early stage of the project.
• Publication of scientific papers: a major aim is to channel the project results into scientific papers, and thereby contribute to an evidence-based understanding of the areas covered by the project.
• Proactive participation in third-party events and exhibitions: liaise with ongoing or emerging initiatives for the co-organization of, or participation in, Data4Impact planned or related events.
• Liaison with related initiatives: partners have already identified additional initiatives for potential collaboration (e.g. using OpenAIRE’s broad outreach to 35+ national settings, and other projects). We will actively pursue other potential partnerships and collaborations during the course of the project.
Also, the dissemination plan will include the project's communication strategy, an extensive list of dissemination indicators and specific targets, and it foresees yearly policy briefs and policy round tables (which can become a part of the workshops foreseen in this project).

Publications

MESH classification of clinical guidelines using conceptual embeddings of references

Developing a rule-based method for identifying researchers on Twitter: The case of vaccine discussions

Author(s): Ekström, B.
Published in: 2019

Hierarchical bi-directional attention-based RNNs for supporting document classification on protein–protein interactions affected by genetic mutations

Author(s): Aris Fergadis, Christos Baziotis, Dimitris Pappas, Haris Papageorgiou, Alexandros Potamianos
Published in: Database, Issue 2018, 2018, ISSN 1758-0463
DOI: 10.1093/database/bay076

The Opportunities and Limitations of Using Artificial Neural Networks in Social Science Research

Author(s): Lukas Pukelis, Vilius Stančiauskas
Published in: Politologija, Issue 94/2, 2019, Page(s) 56-80, ISSN 1392-1681
DOI: 10.15388/polit.2019.94.2