Learning with Multiple Representations

Project Information

LEMUR

Grant agreement ID: 101073307

DOI

10.3030/101073307

EC signature date 6 July 2022

Start date 1 January 2023

End date 31 December 2026

Funded under

Marie Skłodowska-Curie Actions (MSCA)

Total cost

No data

EU contribution

€ 2 592 784,80

Coordinated by

UNIVERSITAET PADERBORN
Germany

Periodic Reporting for period 1 - LEMUR (Learning with Multiple Representations)

Reporting period: 2023-01-01 to 2024-12-31

Machine learning methods operate on formal representations of the data at hand and of the models or patterns induced from that data. They also assume a suitable formalization of the learning task itself (e.g. as a classification problem), including a specification of the objective in terms of a suitable performance metric, and sometimes other criteria the induced model is supposed to meet. Different representations or problem formalizations may be more or less appropriate for addressing a particular task and for dealing with the type of training information available. The goal of LEMUR is to develop the theoretical and algorithmic foundations for a new paradigm in machine learning, which we call learning with multiple representations (LMR). In addition, corresponding applications are meant to demonstrate the usefulness of the new family of approaches. Consequently, the doctoral network is structured along three facets: theory, algorithms, and applications. The objective of the work under the theory facet is to develop the first set of formal guarantees and limitations of LMR. Here, the project focuses in particular on questions pertaining to performance prediction, computational complexity, and uncertainty. The second facet of the project, algorithms, is addressed by developing unsupervised machine learning approaches for visualizing multi-modal data (e.g. knowledge graphs in graph and embedding representations), supervised explainable machine learning approaches for structured data, and neuro-symbolic machine learning approaches. The last facet, applications, consists of work in information retrieval, critical infrastructure management, and ethical machine learning.

We focused on the foundational work necessary to implement LMR. In the theory facet, the doctoral candidates took on problems such as jointly learning choice utilities and rationality, multiple criteria decision aiding, neuro-symbolic machine learning, and large language models. In the algorithms facet, the focus was on the implementation of concrete algorithms that rely on LMR, including unsupervised learning approaches for structured data, novel embedding approaches for knowledge graphs, and neuro-symbolic learning. A key study in this facet is the use of large language models for autoformalization, with the aim of creating bridges between representations automatically. The applications we study in the third facet of the project pertain to critical infrastructures, content retrieval, recommendations, and ethics.
After a 6-month warm-up phase, the project welcomed its doctoral candidates and was promptly under way. The timeliness of the topic targeted by the project has led to 7 accepted papers, some of them at major venues including ECAI, ECML, CIKM, and ESANN. The main results achieved include the first validity guarantees for bridging between natural language and structured queries. Algorithmically, we were able to outperform the state of the art in multiple criteria decision aiding on real and synthetic data. Moreover, our generalization of embedding algorithms to degenerate Clifford algebras provably shows that embedding algorithms thought to be different are just different sides of the same coin. New parameterized grounding approaches increase the flexibility of interfacing between neural and symbolic representations. Finally, our new fairness measure ensures that our approaches can exploit contextual norms to mitigate some of the current limitations of machine learning systems.

Like the project itself, the results beyond the state of the art can be subdivided into three groups: theory, algorithms, and applications. The bulk of our current innovations pertains to theory. We formulated the first inference guarantees for scenarios with uncertain data-generating models. We also devised the first strict generalization of multiplicative embeddings and showed how it can induce new algorithms that had not been explored before. Importantly, these algorithms were shown to outperform the state of the art on several datasets. Finally, we also provide the first neuro-symbolic approach that is guaranteed to produce syntactically valid SQL queries to bridge between text and databases.
In the area of algorithms, we were able to outperform the state of the art in multiple criteria decision aiding. Moreover, we provide the first means for the correct annotation and segmentation of meshes; such meshes are used in 3D settings for trustworthy downstream applications. We also introduced the first algorithm that can simultaneously learn choice utilities and individual rationalities. Ongoing work includes the extension of mixtures of experts and the use of transformers for several modalities.
The application domains have remained unchanged. Critical infrastructures are targeted using deep learning, search and recommendation are addressed by mixtures of experts, and algorithmic fairness underpins our ongoing work.
