CORDIS - EU research results

Retrieval-Augmented VIsion-Language Models for Open-vocabulary LocalizatIon

Objective

The proposed research project, RAVIOLI (Retrieval-Augmented VIsion-Language Models for Open-vocabulary LocalizatIon), aims to significantly advance the field of segmentation by integrating retrieval-based predictions from a memory with the original predictions of a vision-language model (VLM) through a learnable fusion model. Existing methods often struggle to adapt to new or complex classes and domains; RAVIOLI addresses this gap by seeking to enhance the accuracy, adaptability, and granularity of segmentation across various applications, from autonomous vehicles to medical imaging. Importantly, there has been no similar attempt to learn a fusion model with these properties in any open-vocabulary dense task, such as segmentation, making our approach truly pioneering. The project's ambition is to create a tailored, flexible, robust, and scalable solution that will redefine the capabilities of vision-language models, setting a new standard in the field of open-vocabulary segmentation. The project will be hosted by the Visual Recognition Group (VRG) at the Czech Technical University in Prague (CTU) under the supervision of Prof. Giorgos Tolias. The fellow, Bill Psomas, has a strong background in computer vision (CV) and deep learning (DL) and is well equipped to lead this research, which will be further supported by a secondment at AImageLab, University of Modena and Reggio Emilia (UNIMORE), working with Prof. Rita Cucchiara.

Coordinator

CESKE VYSOKE UCENI TECHNICKE V PRAZE
Net EU contribution
€ 191 918,16
Address
JUGOSLAVSKYCH PARTYZANU 1580/3
160 00 Praha
Czechia

Region
Česko Praha Hlavní město Praha
Activity type
Higher or Secondary Education Establishments
Total cost
No data

Partners (1)
