CORDIS - EU research results

Interactive Machine Learning for Compositional Models of Natural Language

Periodic Reporting for period 3 - INTERACT (Interactive Machine Learning for Compositional Models of Natural Language)

Reporting period: 2023-03-01 to 2024-08-31

The goal of INTERACT is to develop interactive learning techniques so that AI systems can be trained with minimal human supervision. The main focus is on natural language processing applications such as automatic text classification. Currently, one of the main bottlenecks in training AI systems is the large amount of human-annotated data required. Progress towards the goals of INTERACT will have an impact on the reach of AI technology: by reducing human annotation costs, we aim to bring the power of customized AI systems to a larger public.
We have worked in three main directions:
1) We have developed active learning algorithms for training machine learning models under annotation budget constraints, and shown that they can significantly reduce the amount of annotated data needed to reach a given performance on several sequence classification benchmarks.
2) We have developed a novel semantic parsing model specifically designed to improve generalization to unseen patterns formed by combining elements observed during training. We are currently developing active sampling strategies for this model.
3) We have studied which properties of word embeddings best explain their performance in the few-shot learning scenario, showing that the structural alignment between the space induced by a representation and the target class labels is critical. We are currently exploiting these insights to design efficient active sampling strategies for text classification.
With respect to tools, we have developed a software library for training linear deep-learning models, and we have started developing a library for general low-rank matrix completion that will reduce several machine learning problems to matrix completion.
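The active learning setting in direction 1) can be illustrated with a minimal pool-based uncertainty-sampling sketch. The report does not describe the project's actual algorithms, so everything below (the toy 2-D data, the nearest-centroid stand-in classifier, the margin-based query rule) is an illustrative assumption, not the project's method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy unlabeled pool: two Gaussian blobs (hypothetical stand-in for a real
# NLP dataset such as a sentence classification corpus).
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)  # oracle labels, revealed only on query

def fit_centroids(X_lab, y_lab):
    """Nearest-centroid 'model' -- a deliberately simple stand-in classifier."""
    return np.stack([X_lab[y_lab == c].mean(axis=0) for c in (0, 1)])

def predict_proba(centroids, X):
    """Softmax over negative distances to each class centroid."""
    d = -np.stack([np.linalg.norm(X - c, axis=1) for c in centroids], axis=1)
    e = np.exp(d - d.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Start from one seed label per class, then spend the annotation budget
# on the points the current model is least sure about.
labeled = [0, 100]
budget = 20
for _ in range(budget):
    centroids = fit_centroids(X[labeled], y[labeled])
    proba = predict_proba(centroids, X)
    margin = np.abs(proba[:, 0] - proba[:, 1])  # small margin = uncertain
    margin[labeled] = np.inf                    # never re-query a labeled point
    labeled.append(int(np.argmin(margin)))      # ask the oracle for this label

centroids = fit_centroids(X[labeled], y[labeled])
acc = (predict_proba(centroids, X).argmax(axis=1) == y).mean()
print(f"labels used: {len(labeled)}, pool accuracy: {acc:.2f}")
```

The key design point is that the query rule, not the classifier, does the work: each round the annotation budget is spent where the model's two class probabilities are closest, which is where a new label changes the decision boundary most.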
Currently, most machine learning systems require large amounts of supervised training data to achieve reasonable performance. This is even more evident for complex natural language processing tasks such as relation extraction or semantic parsing, because language is rich and complex, so a good strategy for selecting the most informative pieces of data to annotate is needed. In INTERACT we expect to make significant advances in clever sampling strategies for learning complex functions in the context of natural language tasks; with these strategies we hope to achieve a drastic reduction in the amount of annotated data required to train a model.
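The low-rank matrix completion tool mentioned above can also be sketched in a few lines. The project's library is not described in detail here, so this is a generic iterative hard-impute scheme on synthetic data, with illustrative names throughout, rather than the project's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical ground truth: a rank-2 matrix with 50% of entries observed.
U_true = rng.normal(size=(30, 2))
V_true = rng.normal(size=(2, 30))
M = U_true @ V_true
mask = rng.random(M.shape) < 0.5  # True where the entry is observed

def hard_impute(M, mask, rank=2, iters=100):
    """Fill missing entries by iterating truncated-SVD projection:
    project onto rank-r matrices, then restore the observed entries."""
    X = np.where(mask, M, 0.0)
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # best rank-r fit
        X = np.where(mask, M, low_rank)  # keep observed, update missing
    return low_rank

M_hat = hard_impute(M, mask)
err = np.linalg.norm((M_hat - M)[~mask]) / np.linalg.norm(M[~mask])
print(f"relative error on missing entries: {err:.3f}")
```

Many supervised problems can be phrased this way: rows as examples, columns as labels or tasks, with the unannotated cells recovered from the low-rank structure of the annotated ones, which is one reason a general matrix completion library is a useful reduction target.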