INODE - Intelligent Open Data Exploration

Periodic Reporting for period 1 - INODE (INODE - Intelligent Open Data Exploration)

Reporting period: 2019-11-01 to 2021-04-30

Open data repositories are public and can benefit many types of users across different application fields. As a result, the benefit of data exploration is becoming increasingly prominent. However, existing data exploration tools are cumbersome and non-intuitive, which makes it difficult for most users to access data easily. The core aim of INODE is to contribute to open data democratization by developing intuitive data exploration tools that enable meaningful exploration and linking of data sets. To this end, the INODE project has already enabled a set of services for: (a) linking and leveraging multiple datasets, (b) searching data using natural language, examples, and analytics, (c) getting guidance from the system in understanding the data and formulating the right queries, and (d) exploring data and discovering new insights through visualizations. These services are applied in three specific use cases: astrophysics, research & innovation policy making, and biology.
In the first period of the project, the main result is the first deployment of a functional platform and toolset that enable the full range of services and functionalities for open data exploration, namely data access and exploration, user assistance, visualization, and data linking. A user of the INODE platform can search using natural language, get recommendations from the system, use different operators to zoom in on the desired data, get explanations of system answers, and explore the data visually.

Linking new data and making them easily queryable is the first critical requirement in a data exploration scenario. INODE provides a knowledge graph that constitutes a high-level conceptual view of the data. By querying the graph, users can access the information stored in the data sources through a more convenient vocabulary, do not need to be aware of storage details, and can obtain richer answers thanks to the domain knowledge. This is especially critical for our biology use case, since biological data are very complex and distributed over different sources. INODE supports rich queries (e.g., with aggregate functions) in full compliance with the SPARQL 1.1 standard. Furthermore, to leverage unstructured data, INODE focused on extracting triples from unstructured text and on enriching databases by linking the entities in the extracted triples to ontology concepts, with the aim of enriching the content of the OncoMX (bio) database.
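The idea of querying through a conceptual vocabulary rather than through storage details can be illustrated with a toy sketch. It does not use an ontology, a knowledge graph, or SPARQL, and it is not INODE's implementation: a plain Python dictionary stands in for the mapping between concepts and storage, and an in-memory SQLite table stands in for a data source. The table, column, and concept names and all data values are invented for illustration.

```python
import sqlite3

# Hypothetical storage layer: table and column names are invented and
# deliberately cryptic, as storage schemas often are.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_expr (gene_sym TEXT, tissue_cd TEXT, lvl REAL)")
conn.executemany(
    "INSERT INTO tbl_expr VALUES (?, ?, ?)",
    [("TP53", "lung", 2.0), ("TP53", "liver", 4.0), ("BRCA1", "lung", 3.0)],
)

# Hypothetical mapping from a conceptual vocabulary to storage details.
MAPPING = {
    "GeneExpression": {
        "table": "tbl_expr",
        "properties": {"gene": "gene_sym", "tissue": "tissue_cd", "level": "lvl"},
    }
}


def average_by(concept, group_property, value_property):
    """Average one property of a concept, grouped by another property.

    The caller uses conceptual names only; table and column names are
    looked up in MAPPING. Identifiers come from that fixed mapping, never
    from free user input, so building the SQL string here is safe.
    """
    entry = MAPPING[concept]
    group_col = entry["properties"][group_property]
    value_col = entry["properties"][value_property]
    sql = (
        f"SELECT {group_col}, AVG({value_col}) FROM {entry['table']} "
        f"GROUP BY {group_col} ORDER BY {group_col}"
    )
    return dict(conn.execute(sql).fetchall())


# The caller asks for "average expression level per gene" without knowing
# that the data lives in tbl_expr.gene_sym and tbl_expr.lvl.
print(average_by("GeneExpression", "gene", "level"))
```

In this sketch the aggregate (an average per group) is expressed against the concept "GeneExpression" and its properties, and only the mapping knows where the values are stored.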

INODE enables data access through two powerful paradigms: search by natural language and exploration using operators. For the former, the INODE platform combines tools built on different and complementary technologies, both rule-based and deep learning-based. For the latter, powerful operators allow the user to manipulate the results. For example, a By-neighbors operator searches the neighborhood of a set of items and returns close sets, which is a powerful operation for tasks such as finding galaxies in our astrophysics use case. To help the user understand the results, INODE provides natural language explanations at each step. Furthermore, recommendations offer different options for continuing the exploration. Visual exploration improves data understanding by increasing information density and providing a better overview of multiple search results.
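The By-neighbors operator described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not INODE's operator: items are assumed to be points in a two-dimensional feature space, "neighborhood" is assumed to mean Euclidean distance within a given radius, and the catalogue, the function name `by_neighbors`, and all values are hypothetical.

```python
import math

# Hypothetical catalogue: item id -> (x, y) position in some feature space.
# The values are made up for illustration only.
CATALOGUE = {
    "g1": (0.0, 0.0),
    "g2": (1.0, 0.0),
    "g3": (0.0, 1.5),
    "g4": (5.0, 5.0),
    "g5": (5.5, 5.0),
}


def by_neighbors(seed_ids, catalogue, radius):
    """Return ids of items outside the seed set that lie within `radius`
    (Euclidean distance) of at least one seed item."""
    seeds = set(seed_ids)
    result = set()
    for item_id, position in catalogue.items():
        if item_id in seeds:
            continue  # the seeds themselves are not reported as neighbors
        if any(math.dist(position, catalogue[s]) <= radius for s in seeds):
            result.add(item_id)
    return result


# Starting from a single item, widen the radius to reach more neighbors.
print(by_neighbors({"g1"}, CATALOGUE, radius=1.2))
print(by_neighbors({"g1"}, CATALOGUE, radius=2.0))
```

With the smaller radius only "g2" qualifies; with the larger one "g3" is included as well. A real operator would likely use a domain-specific similarity measure and an index rather than a linear scan.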
The main novelty of INODE is that it is the first end-to-end data exploration system that brings together novel but so far mostly disjoint technologies and integrates them into a new ensemble, providing a rich toolset for users to leverage open data. Although some of these solutions and research challenges have been tackled previously, they have not been combined into such an end-to-end intelligent data exploration system, and this combination in turn opens up new research challenges.

By leveraging the full potential of these technologies in combination, INODE provides for the first time a complete and powerful end-to-end data exploration solution. This solution equips a diverse set of stakeholders (in astrophysics, biology, and policy making) with intuitive tools that serve their data access requirements, including ontology-based data linking and access as well as effective ways to access data, especially when non-technical users are involved. To do so, INODE has made several advances in these technologies, in particular at the points where the tools and their underlying technologies intersect.

It is important to mention that the concepts of INODE can be applied not only to empower EOSC-hub but also to any other data portal. Hence, there is a significant opportunity for wider exploration not only of science data but also of the vast amounts of open data currently provided by data portals such as the EU data portal.
INODE Architecture