CORDIS - EU research results

Knowledge Graphs at Scale

Periodic Reporting for period 1 - KnowGraphs (Knowledge Graphs at Scale)

Reporting period: 2019-10-01 to 2021-09-30

Knowledge graphs (KGs) are widely regarded as a key enabler of explainable machine learning, and through Google alone they reach over 4B distinct users. They are also used by a number of Fortune 500 companies to provide key user-facing and backend functionality (e.g. chatbots, product descriptions, recommendations). However, deploying and using KGs at the core of small and medium-sized businesses, or even for personal purposes, is still challenging for most organizations and individuals. The goal of KnowGraphs is to address some of the key challenges related to the representation, extraction, operation and exploitation of KGs. To this end, the ITN (Innovative Training Network) develops time-efficient and effective algorithms for the representation, extraction, storage, verification and exploitation of KGs that can easily be employed by large and small companies as well as by individuals. The legal implications of these developments, as well as practical ways to exploit the resulting solutions, are also considered. The societal ramifications of the ITN's results are directly linked to current developments at the interface between data, algorithms and humans, both in the EU and worldwide. By making KGs easier to use in practice, the project supports the democratization and broadening of their use. Furthermore, by studying the legal consequences of using KGs in real-life applications, KnowGraphs supports the EU's AI and data protection agendas, especially with respect to explainability, consent and explicit information about the use of AI.
ESR 1 explores the representation of syntactic formalisms in the multidisciplinary field spanning e-learning, career advancement, the labor market, user profiling and personalized recommendation systems. Her core result is the EduCOR ontology, an educational, career-oriented recommendation ontology that provides a foundation for representing online learning resources in personalized learning systems.
ESR 2 focuses on representation techniques that allow KGs to be queried efficiently. The technical challenge he faced is the design and use of a universal data model able to cater for the different instantiations of the KG paradigm. He showed that the 5D tensor commonly assumed for property graphs can be flattened into a 3D tensor, significantly improving the runtime of GraphQL queries.
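The report does not spell out the tensor layout. As a minimal sketch, assuming a property-graph edge carries an identifier, a source, a label, a target and key-value properties, one standard way to flatten it is reification: each edge becomes a node, and each of its coordinates becomes a (subject, predicate, object) triple, i.e. a non-zero entry of a sparse 3D boolean tensor. The predicate names `src`, `label` and `dst` and all function names are hypothetical, and reification is not necessarily the technique used by ESR 2.

```python
# Minimal sketch (assumption): reify property-graph edges into (s, p, o) triples and
# store them as the non-zero coordinates of a sparse 3-D boolean tensor.
# Edge layout and predicate names ("src", "label", "dst") are hypothetical.


def flatten_edges(edges):
    """edges: iterable of (edge_id, source, label, target, properties_dict)."""
    triples = set()
    for edge_id, source, label, target, props in edges:
        triples.add((edge_id, "src", source))
        triples.add((edge_id, "label", label))
        triples.add((edge_id, "dst", target))
        for key, value in props.items():
            triples.add((edge_id, key, value))  # properties become ordinary triples
    return triples


def encode(triples):
    """Assign each term an integer id so every triple is a 3-D tensor coordinate."""
    ids = {}
    coords = set()
    for s, p, o in sorted(triples):
        coords.add(tuple(ids.setdefault(term, len(ids)) for term in (s, p, o)))
    return coords, ids


def match(coords, ids, s=None, p=None, o=None):
    """Coordinates matching a triple pattern; None is a wildcard, unknown terms match nothing."""
    pattern = [None if term is None else ids.get(term, -1) for term in (s, p, o)]
    return {c for c in coords if all(want is None or want == got for want, got in zip(pattern, c))}


edges = [
    ("e1", "alice", "knows", "bob", {"since": "2020"}),
    ("e2", "bob", "knows", "carol", {"since": "2018"}),
]
coords, ids = encode(flatten_edges(edges))
term = {i: t for t, i in ids.items()}
# Which edges carry since=2020? One pattern over the 3-D tensor answers it.
print(sorted(term[s] for s, _, _ in match(coords, ids, p="since", o="2020")))  # ['e1']
```

A real store would index the coordinates (for example, sorted per dimension) instead of scanning them; the scan only keeps the sketch short.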
ESR 3 focuses on the formal representation of constraints for reasoning and querying over (distributed) KGs. Her work aims to develop formal semantics for policy profiles using the ODRL Regulatory Compliance Profile.
ESR 4 works on multilingual approaches to exploit and expand the information in KGs and to use it in downstream NLP tasks such as entity linking and relation extraction. He has designed and published an autoregressive approach to relation extraction, based on the BART model, which obtained remarkable results.
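An autoregressive (sequence-to-sequence) model such as BART emits relations as text, so triples have to be serialized into a target string for training and parsed back after generation. The sketch below shows only that (de)serialization step, not the model itself; the marker tokens `<triplet>`, `<subj>` and `<obj>` and the function names are illustrative assumptions, not taken from the report.

```python
# Sketch (assumption): serialize (head, relation, tail) triples into one target string
# for a seq2seq model, and parse generated text back into triples.
# The marker tokens below are illustrative, not taken from the report.


def linearize(triples):
    """[(head, relation, tail), ...] -> '<triplet> head <subj> tail <obj> relation ...'."""
    return " ".join(f"<triplet> {h} <subj> {t} <obj> {r}" for h, r, t in triples)


def parse(text):
    """Inverse of linearize(); malformed chunks in generated text are skipped."""
    triples = []
    for chunk in text.split("<triplet>"):
        head, found_subj, rest = chunk.partition("<subj>")
        tail, found_obj, relation = rest.partition("<obj>")
        if found_subj and found_obj:
            triples.append((head.strip(), relation.strip(), tail.strip()))
    return triples


gold = [("Leipzig", "country", "Germany"), ("Leipzig University", "located in", "Leipzig")]
target = linearize(gold)
print(target)
print(parse(target) == gold)  # True
```

Skipping malformed chunks rather than raising is a design choice for noisy generated output; a stricter parser could report them instead.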
ESR 5 developed a new method, based on factorial designs, for optimizing the hyperparameters of KG embeddings, as well as a semi-Riemannian graph neural network for graph representation learning. He also conducted research on box embeddings, focusing on ontology embeddings in the EL++ description logic.
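The report does not detail the factorial-design method. As a generic illustration of the underlying idea, a two-level full factorial design evaluates every combination of low and high settings and estimates each hyperparameter's main effect as the mean score at its high level minus the mean score at its low level. The factor names, levels and the `toy_score` function (a stand-in for actually training and validating an embedding model) are hypothetical.

```python
# Sketch (assumption): two-level full factorial design over KG-embedding hyperparameters,
# with main effects estimated from the observed scores. Factors and levels are hypothetical.
from itertools import product
from statistics import mean

FACTORS = {"dim": (64, 256), "lr": (0.001, 0.01), "negatives": (1, 10)}  # (low, high)


def full_factorial(factors):
    """Return all 2^k runs as (config, coded) pairs, coded levels being -1 (low) or +1 (high)."""
    names = list(factors)
    runs = []
    for coded in product((-1, 1), repeat=len(names)):
        config = {n: factors[n][0 if c < 0 else 1] for n, c in zip(names, coded)}
        runs.append((config, dict(zip(names, coded))))
    return runs


def main_effects(runs, scores):
    """Main effect of a factor = mean score at +1 minus mean score at -1."""
    effects = {}
    for name in runs[0][1]:
        high = [s for (_, coded), s in zip(runs, scores) if coded[name] > 0]
        low = [s for (_, coded), s in zip(runs, scores) if coded[name] < 0]
        effects[name] = mean(high) - mean(low)
    return effects


def toy_score(config):
    """Hypothetical stand-in for 'train the model, return validation MRR'."""
    return 0.30 + 0.05 * (config["dim"] == 256) + 0.02 * (config["negatives"] == 10)


runs = full_factorial(FACTORS)
scores = [toy_score(config) for config, _ in runs]
effects = main_effects(runs, scores)
print({name: round(effect, 3) for name, effect in effects.items()})
```

With this toy score, `dim` and `negatives` show positive main effects and `lr` shows none, which is the kind of signal the design is meant to reveal with only 2^k training runs.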
ESR 6’s research aims at the novel definition, parsing and exploitation of sentence-level KGs. He worked on defining an enhanced version of AMR (Abstract Meaning Representation) to obtain a truly semantic representation of sentences. The resulting formalism, BMR, was presented in a “Blue Sky” paper accepted at the AAAI 2022 conference.
ESR 7 targets the development of methods that exploit evidence within an input knowledge graph as well as evidence external to the KG (especially large text corpora) to compute the probability that a particular fact is true without relying on feature engineering. His preliminary implementation already achieves results close to the state of the art.
ESR 8 focuses on how KGs evolve over time, in the context of graph embedding methods and the FAIR principles. She is implementing techniques that automatically rewrite SPARQL queries to account for such changes, as well as entity alignment with graph embeddings to capture the changes between versions of an evolving KG.
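Entity alignment and query rewriting can be combined as in the following sketch, which assumes that the embeddings of the old and new KG versions already live in one shared vector space (for example because they were trained jointly). An entity that disappears between versions is mapped to its most similar new entity by cosine similarity, and full IRIs in angle brackets in a SPARQL query are rewritten accordingly. The threshold, the IRIs and the function names are hypothetical, and prefixed names such as `ex:Berlin` are not handled.

```python
# Sketch (assumption): align entities across two KG versions via embedding similarity,
# then rewrite renamed IRIs in a SPARQL query. Embeddings are assumed to share one space.
import math
import re

IRI = re.compile(r"<([^<>\s]+)>")  # full IRIs only; prefixed names are not handled


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def align(old_emb, new_emb, threshold=0.9):
    """Map each entity missing from the new version to its most similar new entity."""
    renames = {}
    for iri, vec in old_emb.items():
        if iri in new_emb:
            continue  # the entity still exists; nothing to rewrite
        best = max(new_emb, key=lambda cand: cosine(vec, new_emb[cand]))
        if cosine(vec, new_emb[best]) >= threshold:
            renames[iri] = best
    return renames


def rewrite(query, renames):
    """Replace every full IRI that was renamed between versions."""
    return IRI.sub(lambda m: "<" + renames.get(m.group(1), m.group(1)) + ">", query)


old = {"http://ex.org/Berlin_City": [1.0, 0.1], "http://ex.org/Paris": [0.0, 1.0]}
new = {"http://ex.org/Berlin": [0.98, 0.12], "http://ex.org/Paris": [0.0, 1.0]}
renames = align(old, new)
query = "SELECT ?p WHERE { ?p <http://ex.org/bornIn> <http://ex.org/Berlin_City> }"
print(rewrite(query, renames))
# SELECT ?p WHERE { ?p <http://ex.org/bornIn> <http://ex.org/Berlin> }
```

The threshold keeps low-confidence alignments out of the rewritten query; entities below it are left unchanged rather than mapped to a poor match.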
ESR 9 carries out research on data provenance models for relational and Linked Data. Her research goals include building a provenance-aware main-memory database for these provenance models and redesigning the query evaluation algorithm of an existing RDF database so that it takes provenance information into account.
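The report does not say which provenance models are used. One well-known family is semiring provenance: every base tuple carries an annotation, joins combine annotations with a product, and alternative derivations are combined with a sum. The sketch below uses the why-provenance instance of this idea (each answer is annotated with the sets of base triples that witness it) on a two-pattern query; the triples, identifiers and function names are hypothetical, and this is not a description of the ESR's database.

```python
# Sketch (assumption): why-provenance for a basic graph pattern over annotated triples.
# Annotation = set of witnesses; a witness = set of base-triple ids used in one derivation.
from collections import defaultdict

ZERO = frozenset()  # no derivation at all


def times(a, b):
    """Join: every witness of a combined with every witness of b."""
    return frozenset(x | y for x in a for y in b)


def plus(a, b):
    """Alternative derivations: keep the witnesses of both."""
    return a | b


def annotate(triples):
    """Each base triple is witnessed by itself alone: {{tid}}."""
    return {t: frozenset({frozenset({tid})}) for tid, t in triples.items()}


def knows_lives_in(db):
    """?x knows ?y . ?y livesIn ?c, projected to (?x, ?c), with why-provenance."""
    result = defaultdict(lambda: ZERO)
    for (x, p1, y), a1 in db.items():
        if p1 != "knows":
            continue
        for (y2, p2, c), a2 in db.items():
            if p2 == "livesIn" and y2 == y:
                result[(x, c)] = plus(result[(x, c)], times(a1, a2))
    return dict(result)


triples = {
    "t1": ("alice", "knows", "bob"),
    "t2": ("alice", "knows", "carol"),
    "t3": ("bob", "livesIn", "Leipzig"),
    "t4": ("carol", "livesIn", "Leipzig"),
}
answers = knows_lives_in(annotate(triples))
print(sorted(sorted(w) for w in answers[("alice", "Leipzig")]))  # [['t1', 't3'], ['t2', 't4']]
```

Swapping `times`, `plus` and the base annotation for another semiring (for example counting derivations) changes the provenance model without touching the query evaluation loop, which is what makes this framework attractive for a provenance-aware engine.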
ESR 10 works on discrete knowledge graph embedding methods and has carried out a survey of the state of the art in binary and non-binary discrete methods. He is investigating how graph neural networks can improve on the state of the art in managing and querying polymorphic knowledge graphs.
ESR 11 tackles the problem of ante-hoc explainable machine learning (ML) on large KGs. He developed a new family of explainable ML techniques for KGs based on concept synthesis, showing that simple implementations of this paradigm are more time-efficient than, and as effective as, the state of the art.
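The report does not describe the synthesis model itself. Whether a class expression is found by search or synthesized directly, it is judged by how well its instances separate positive from negative example individuals. The sketch below shows only that scoring step for a tiny fragment (atomic concepts, conjunction and the existential restriction ∃r.C) on a hypothetical toy KG, with closed-world retrieval as a simplifying assumption; description logic reasoners normally work under an open-world assumption.

```python
# Sketch (assumption): score candidate class expressions against positive/negative examples.
# Hypothetical toy KG; retrieval is closed-world for simplicity.
from dataclasses import dataclass

TYPES = {"ann": {"Person"}, "bob": {"Person"}, "cat": {"Person"}, "dan": {"Person"}}
ROLES = {("ann", "hasChild", "bob"), ("cat", "hasChild", "dan")}


@dataclass(frozen=True)
class Atomic:
    name: str


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Exists:
    role: str
    filler: object


def instances(expr):
    """Individuals that satisfy the expression in the toy KG."""
    if isinstance(expr, Atomic):
        return {i for i, types in TYPES.items() if expr.name in types}
    if isinstance(expr, And):
        return instances(expr.left) & instances(expr.right)
    if isinstance(expr, Exists):
        fillers = instances(expr.filler)
        return {s for s, r, o in ROLES if r == expr.role and o in fillers}
    raise TypeError(f"unsupported expression: {expr!r}")


def f1(expr, pos, neg):
    """F1 of the expression's instances with respect to the positive and negative examples."""
    found = instances(expr)
    tp, fp, fn = len(found & pos), len(found & neg), len(pos - found)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


pos, neg = {"ann", "cat"}, {"bob", "dan"}
parent = And(Atomic("Person"), Exists("hasChild", Atomic("Person")))  # Person ⊓ ∃hasChild.Person
print(f1(Atomic("Person"), pos, neg), f1(parent, pos, neg))
```

Here `Person` alone scores about 0.67, while `Person ⊓ ∃hasChild.Person` reaches 1.0 and doubles as a human-readable explanation of the positive examples.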
ESR 12 examines the features a dynamic consent model needs in order to reconcile the various rights and interests (individual and societal) related to the processing of personal health data for biomedical research using KGs. In this work, he also analyses the withdrawal of consent to the processing of personal data, a topic that is hardly ever discussed in the literature.
ESR 13 studies the exploitation of KGs in the economic domain. She is preparing a paper on constructing a knowledge graph from crowdfunding business proposals that can serve as a baseline for idea-generation tasks in new product and service development and help enhance human creativity. The core idea of her research was presented at the PhD Symposium of the 13th ACM Web Science Conference 2021.
ESR 14's work focuses on defining a methodology to generate metadata for KGs that enables their efficient discovery and reuse in use cases in medicine and finance. Her contributions include an in-depth analysis of methods for the (semi-)automated construction of metadata and an investigation into the tools and techniques used to analyse query logs.
ESR 15’s research investigates whether data protection principles can be embedded into software so that the regulatory aims of data protection are achieved. He has submitted two articles: 1) “The challenge of incorporating legal rules into digital applications” (accepted) and 2) “Data Protection by Design and by Default and the Certification Scheme of the GDPR”.
KnowGraphs' 15 ESRs have already extended our understanding of the extraction, operation and exploitation of KGs (see above). For example, flattening 5D tensors into 3D tensors leads to significantly better query runtimes on hyper-relational graphs. Semi-Riemannian spaces are well suited to embedding KGs. Universal models for semantics appear to exist and clearly outperform some of the tooling used in industry. Concept synthesis is a viable alternative to concept search. Bringing together correct extraction, efficient knowledge representation, fast querying and constraints with ante-hoc explainable ML based on concept synthesis lays the foundation for the scalable use of KGs in data-driven environments where explainability is key. Our results have broad implications for the application of KGs, covering all areas where explainability is required (e.g. Industry 4.0, medicine, finance).