OpenMinTeD Report Summary

Project ID: 654021
Funded under: H2020-EU.

Periodic Reporting for period 1 - OpenMinTeD (Open Mining INfrastructure for TExt and Data)

Reporting period: 2015-06-01 to 2016-11-30

Summary of the context and overall objectives of the project

Overall View
OpenMinTeD tackles interoperability and builds a platform that facilitates a seamless TDM environment for research. It addresses two types of users:
Type A: Consumers
Domain specific researchers & research communities
Infrastructure Operators

Type B: Producers
Content providers
Service providers

The platform facilitates interoperable text and data mining and, in turn, becomes an e-infrastructure component of the wider research ecosystem, in particular of the European Open Science Cloud (EOSC).

OpenMinTeD has set out to develop a text and data mining (TDM) e-infrastructure that provides concrete services for researchers to collaboratively create, discover, share and reuse knowledge from a wide range of text-based, science-related sources in a seamless way. It primarily works on the following facets:
• Provide a platform where text mining tools and services are registered, discovered, evaluated, combined in a pipeline, and used and shared by researchers or any interested party.
• Provide homogenized access to scientific content (focusing on publications, but also considering other related textual data) which lies in disparate sources behind paywalls or behind technological barriers.
• Trigger innovation by allowing the combination of research data to solve real problems, and by allowing SMEs to find potential business opportunities.

OpenMinTeD provides significant benefits to the economy and society in the form of increased researcher efficiency, by unlocking hidden knowledge, developing new knowledge, and improving the research process and its evidence base. These benefits are expected to result in significant cost savings and productivity gains, innovative new service development, new business models, new medical treatments, etc.

Work performed from the beginning of the project to the end of the period covered by the report and main results achieved so far

During the first eighteen months we focussed on presenting the concepts of TDM, the legal and technological barriers, and the idea of the OpenMinTeD infrastructure and platform.
One of the main goals is to deliver supporting services to various stakeholders, facilitating the adoption of the OMTD infrastructure, especially from the perspective of training and supporting the community of interest. To this end, the progress made in this period can be summarised in the following achievements:
• Establishment and deployment of the community training and support services Knowledge Base (KB).
• Development of 3 supportive taxonomies integrated with the KB, defining the scope of the community training and serving to classify training materials. These include the “Text and Data Mining”, “TDM Methods” and “Research Workflow” taxonomies.
• The identification and the upload of seed general TDM training material to the KB.
• The development of a plan/schedule for the delivery of project specific self-learning as well as moderated (e.g. webinar, f2f training) training activities utilising the KB.
• A survey of publishers, followed by an analysis of the information gathered on technical issues around machine access to research publications.
• Ongoing development of the first draft of the support KB, which collects information on machine access to research publications from different publishers.
• Ongoing preparation of an initial FAQ-style consulting service, to be served out of the KB, addressing common questions regarding legal limitations for TDM.

Community driven requirements and evaluation
• The methodology for collecting requirements relevant to TDM from research communities that have been identified as potential end users of the OpenMinTeD project services was defined.
• Analysis and harmonization of requirements collected from the different research communities by identifying commonalities and differences. Moreover, the most representative stakeholder personas were identified, namely the “Text-mining researcher”, the “Researcher”, the “Data Curator” and the “Technical Manager”. Finally, concrete requirements were collected for both the OpenMinTeD platform and the foreseen use case applications.
• The functional specifications of the OpenMinTeD platform were defined in order to satisfy the users’ needs, and therefore make the OpenMinTeD platform more appealing in various scenarios and applications.

• One of the first activities was the compilation of a report on the landscape of tools and standards in the domain of text and data mining and natural language processing.
• A methodology was defined for creating the interoperability specification, which consists of requirements and ways to address them. The first version of the OMTD-SHARE metadata schema was produced by the metadata working group in collaboration with all the other working groups.
• The work on the data interoperability toolkit has focused on defining a modular architecture of connectors to non-standard content provider systems, including ingestion components, harvesting components for each system, and converters to the OpenMinTeD schema.
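
The modular connector architecture in the last item can be pictured with a minimal sketch. It is purely illustrative and is not the project's code: the `Record`, `Connector`, `ExampleProviderConnector` and `ingest` names, and the provider field names `doc_id` and `heading`, are assumptions made for this example. The real OpenMinTeD schema is far richer than the three fields used here.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List


@dataclass
class Record:
    """Hypothetical stand-in for a document in the common target schema."""
    identifier: str
    title: str
    source: str


class Connector(ABC):
    """One connector per content provider: harvest raw items, then convert them."""

    @abstractmethod
    def harvest(self) -> Iterable[Dict[str, str]]:
        """Return raw items in the provider's own format."""

    @abstractmethod
    def convert(self, raw: Dict[str, str]) -> Record:
        """Map one raw item onto the common schema."""


class ExampleProviderConnector(Connector):
    """Illustrative connector for an imaginary provider with its own field names."""

    def __init__(self, raw_items: List[Dict[str, str]]) -> None:
        self._raw_items = raw_items

    def harvest(self) -> Iterable[Dict[str, str]]:
        # A real harvester would call the provider's API; here fixed data is replayed.
        return list(self._raw_items)

    def convert(self, raw: Dict[str, str]) -> Record:
        return Record(
            identifier=raw["doc_id"],
            title=raw["heading"].strip(),
            source="example-provider",
        )


def ingest(connectors: Iterable[Connector]) -> Iterator[Record]:
    """Ingestion step: run every connector and yield records in the common schema."""
    for connector in connectors:
        for raw in connector.harvest():
            yield connector.convert(raw)


records = list(
    ingest([ExampleProviderConnector([{"doc_id": "42", "heading": " A paper "}])])
)
```

In this sketch, supporting a new content provider means writing one more `Connector` subclass. The ingestion step itself does not change.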

Platform Design and Implementation
• The major layers and components of the platform were identified in the overall platform architecture.
• During the first period of the project the main functionalities of the registry were defined, and the service was designed and implemented.
• The user interface of the platform was designed by taking into account the functional specifications, as well as the results of the OMTD-SHARE internal data model.

Platform Integration
• The basic software engineering infrastructure required by the project and the platform testing methodology have been established.

Progress beyond the state of the art and expected potential impact (including the socio-economic impact and the wider societal implications of the project so far)

In terms of greater capacity for knowledge discovery, the project has developed the OpenMinTeD Knowledge Base, which contains training materials on various text and data mining topics, such as technical and legal issues. In addition, the three taxonomies relating to text and data mining were created in order to assist text and data mining stakeholders with content browsing and related content recommendations.
To reduce duplication of research, the project has surveyed publishers’ open access machine interfaces. The results are currently being used in the publishers’ feasibility report and will be presented as a technical guide/expertise directory in the OpenMinTeD Knowledge Base. The aim is to create a one-stop guide, useful to the variety of TDM research stakeholders who may be investigating the same issue and having difficulty resolving it.
IT services are operated with openness in mind. Although ~okeanos is the sole underlying cloud stack in the context of the project, the focus is on using open de facto standards such as OpenStack and Amazon EC2. The architecture defined by the project avoids vendor lock-in by placing integration layers between the cloud infrastructure and end user services. In principle, additional cloud providers can be adopted with minimal or no development effort on the integration layer (e.g. Galaxy support for different infrastructure backends).
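
As a rough illustration of how an integration layer can decouple end user services from a specific cloud provider, the sketch below hides each backend behind one small interface. All names (`CloudBackend`, `InMemoryBackend`, `IntegrationLayer`, `start_vm`, `run_service`) are hypothetical and chosen for this example only. It makes no real cloud calls and does not reflect the actual ~okeanos, OpenStack, Amazon EC2 or Galaxy APIs.

```python
from typing import Dict, Protocol


class CloudBackend(Protocol):
    """Minimal interface the integration layer expects from any cloud provider."""

    def start_vm(self, image: str) -> str:
        """Start a virtual machine from an image and return its identifier."""
        ...


class InMemoryBackend:
    """Hypothetical backend for illustration only; it never contacts a real cloud."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._count = 0

    def start_vm(self, image: str) -> str:
        self._count += 1
        return f"{self.name}:{image}:{self._count}"


class IntegrationLayer:
    """Sits between end user services and the registered cloud backends."""

    def __init__(self) -> None:
        self._backends: Dict[str, CloudBackend] = {}

    def register(self, name: str, backend: CloudBackend) -> None:
        self._backends[name] = backend

    def run_service(self, backend_name: str, image: str) -> str:
        # Services only name a backend; provider-specific logic stays behind the interface.
        return self._backends[backend_name].start_vm(image)


layer = IntegrationLayer()
layer.register("primary", InMemoryBackend("primary"))
layer.register("secondary", InMemoryBackend("secondary"))

vm_a = layer.run_service("primary", "tdm-worker")
vm_b = layer.run_service("secondary", "tdm-worker")
```

Under these assumptions, adopting an additional provider amounts to registering one more object that implements `start_vm`. The services that call `run_service` stay unchanged.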

