Omni-Supervised Learning for Dynamic Scene Understanding

Project description

Changing how we think about data and algorithms in machine learning for computer vision

Self-driving cars appear to be within reach, thanks in part to the success of the computer vision algorithms developed to serve as the "eyes" of these vehicles. To navigate, such vehicles must understand the dynamic objects in the scene around them, that is, detect, segment and track multiple moving elements. Computer vision can now tackle this problem, mainly thanks to advances in deep learning. Most methods rely on convolutional neural networks trained in a supervised manner on large-scale datasets, but is this paradigm sufficient to represent the complexity of our streets? The ERC-funded DynAI project will go beyond supervised learning. The project's researchers will design innovative machine learning models that directly exploit unlabeled video streams.

Objective

Computer vision has become a powerful technology, able to bring applications such as autonomous vehicles and social robots closer to reality. For autonomous vehicles to navigate a scene safely, they need to understand the dynamic objects around them. In other words, we need computer vision algorithms to perform dynamic scene understanding (DSU), i.e. detection, segmentation, and tracking of multiple moving objects in a scene. This is an essential capability for higher-level tasks such as action recognition or decision making for autonomous vehicles. Much of the success of computer vision models for DSU has been driven by the rise of deep learning, in particular convolutional neural networks trained on large-scale datasets in a supervised way. But the closed world created by our datasets is not an accurate representation of the real world. If our methods only work on annotated object classes, what happens if a new object appears in front of an autonomous vehicle? We propose to rethink the deep learning models we use, the way we obtain data annotations, and the generalization of our models to previously unseen object classes. To bring the full power of computer vision algorithms for DSU to the open world, we will focus on three lines of research:

1. Models. We will design novel machine learning models to address the shortcomings of convolutional neural networks. A hierarchical (from pixels to objects) image-dependent representation will allow us to capture spatio-temporal dependencies at all levels of the hierarchy.
2. Data. To train our models, we will create a new large-scale synthetic DSU dataset and propose novel methods to mitigate the annotation costs for video data.
3. Open world. To bring DSU to the open world, we will design methods that learn directly from unlabeled video streams. Our models will be able to detect, segment, retrieve, and track dynamic objects from classes never observed during training.

Fields of science

Host institution

NVIDIA ITALY S.R.L.

Net EU contribution

€ 1 500 000,00

Address

VIA GIOIA MELCHIORRE 8
20124 Milano
Italy

Region

Nord-Ovest Lombardia Milano

Activity type

Private for-profit entities (excluding Higher or Secondary Education Establishments)

Total cost

€ 1 500 000,00

Beneficiaries (1)

NVIDIA ITALY S.R.L.

Italy

Net EU contribution

€ 1 500 000,00
