CORDIS - EU research results

Modern Challenges in Learning Theory

Periodic Reporting for period 1 - GENERALIZATION (Modern Challenges in Learning Theory)

Reporting period: 2022-09-01 to 2025-02-28

In recent years, Machine Learning (ML) has made incredible strides. We now see learning algorithms applied across a wide range of fields, from engineering feats like self-driving cars to social applications that involve private data.

However, these advancements come with significant challenges:

1. Many recent breakthroughs reveal unexpected behaviors that are not well understood and sometimes contradict established knowledge. A major reason for this is that traditional ML theory takes a worst-case approach, which can be overly pessimistic. In reality, data is rarely the worst case, and experiments often show that much less data is needed than traditional theory predicts.

2. As ML applications increasingly handle private and sensitive data, it’s crucial to develop algorithms that protect this information responsibly. Although the field of Differential Privacy (DP) addresses this need, we still don’t fully understand the cost of privacy: How much more data is needed when privacy is a requirement compared to when it’s not?
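To make the "cost of privacy" concrete, here is a minimal sketch (illustrative only, not a method from this project) using the classical Laplace mechanism: noise is added whose scale grows as the privacy parameter epsilon shrinks, so the same absolute noise hurts small datasets far more than large ones.

```python
import numpy as np

def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count under epsilon-differential privacy via the
    classical Laplace mechanism: noise scale = sensitivity / epsilon,
    so stronger privacy (smaller epsilon) means noisier answers."""
    return true_count + rng.laplace(0.0, sensitivity / epsilon)

rng = np.random.default_rng(0)
# With epsilon = 1 the noise has standard deviation sqrt(2) ~ 1.41:
# negligible relative to a count over a large dataset, but severe for
# a small one -- one face of the "how much more data?" question.
noisy = [private_count(1000, 1.0, rng) for _ in range(10_000)]
```

The gap between what can be learned with and without such noise is precisely the "cost of privacy" the project investigates.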

Motivated by these challenges, our key question is:

**How much data is needed for learning?**

To answer this, we aim to develop a new theory of generalization that better reflects real-world learning tasks, complementing traditional approaches. We plan to build this theory around perspectives that depend on the data, the distribution, and the algorithm itself, rather than focusing solely on the worst-case scenarios of classical theory. This approach allows us to take advantage of the specific characteristics of each learning task.
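For context, the worst-case answer of classical PAC theory is a standard textbook bound (not a result of this project): learning a finite hypothesis class H to error epsilon with confidence 1 - delta, in the realizable setting, requires on the order of

```latex
m(\epsilon, \delta) \;=\; O\!\left(\frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)\right)
```

samples, uniformly over all data distributions. The data-dependent and distribution-dependent theory described above aims to replace such uniform worst-case guarantees with quantities tailored to the specific learning task.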

We will use this new framework to explore different learning scenarios, including supervised, semi-supervised, interactive, and private learning. We believe that this work will improve the efficiency, reliability, and real-world relevance of ML. Additionally, since our research draws on ideas from various areas within computer science and mathematics, we expect it to have broader impacts beyond our field.

At the halfway point of our project, we have made significant strides in advancing the understanding of learning algorithms, particularly in the context of generalization, privacy, and other aspects of responsible machine learning. Our work centers on bridging the gap between theoretical insights and practical challenges, and we’ve already seen impactful results.

One of our major achievements refines the classical theory of learning by developing a new framework that models and measures the performance of learning algorithms from a data-dependent and distribution-dependent perspective. Traditionally, learning performance has been evaluated through worst-case scenarios, which often don’t reflect practical applications. Our new theory, however, focuses on the learning curve—a graph that shows how an algorithm’s performance improves as it receives more data. In practice, the learning curve is a fundamental tool used to assess and optimize the effectiveness of learning algorithms. By integrating data and distribution characteristics into this framework, our theory offers a more realistic and accurate way to evaluate and predict the performance of algorithms in various contexts.
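To illustrate what a learning curve measures, here is a small self-contained sketch (illustrative only, not the project's framework): a consistent threshold learner on the uniform distribution over [0, 1], whose average error we track as the sample size grows.

```python
import numpy as np

def learn_threshold(xs, ys):
    # Consistent learner: place the threshold midway between the largest
    # negative example and the smallest positive example seen so far.
    neg, pos = xs[ys == 0], xs[ys == 1]
    lo = neg.max() if neg.size else 0.0
    hi = pos.min() if pos.size else 1.0
    return (lo + hi) / 2

def learning_curve(sample_sizes, trials=200, target=0.5, seed=0):
    """Average error of the learner as a function of sample size."""
    rng = np.random.default_rng(seed)
    curve = []
    for n in sample_sizes:
        errs = []
        for _ in range(trials):
            xs = rng.uniform(0.0, 1.0, n)
            ys = (xs >= target).astype(int)
            # Under the uniform distribution, the error of a threshold
            # hypothesis equals its distance from the target threshold.
            errs.append(abs(learn_threshold(xs, ys) - target))
        curve.append(float(np.mean(errs)))
    return curve
```

The curve shows average error shrinking as the sample size grows; its exact shape depends on both the data distribution and the algorithm, which is what a distribution-dependent theory of learning curves aims to characterize.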

Another major achievement is a comprehensive exploration of the connection between privacy and learning. We demonstrated that learning while protecting sensitive information according to the formal requirements of differential privacy is tightly linked to online learning—a classical and well-studied setting in the field.

Furthermore, we explored the stability and replicability of learning algorithms. By leveraging concepts from topology, a branch of mathematics focused on shapes and spaces, we created new ways to ensure that learning algorithms are not only effective but also stable and replicable. This work contributes to the ongoing efforts to develop responsible and dependable AI systems.

Our project has achieved several results that go beyond the current state of the art, resolving well-studied open problems and opening new research directions. These breakthroughs provide a deeper understanding of fundamental aspects of learning theory, including privacy-aware learning, multiclass classification, and the relationship between data compression and generalization. By addressing these long-standing challenges, we have not only advanced theoretical knowledge but also laid the groundwork for responsible and reliable AI.

To ensure further uptake and success of these results, the key need is to continue supporting and expanding my team. The incredible progress we have made is a testament to the hard work and dedication of my team members, whose expertise and collaboration have been essential. I could not possibly have achieved these outcomes without their contributions, and their continued involvement is vital for building on these achievements and driving future innovations.