CORDIS - EU research results
Content archived on 2024-05-07

Language and image data fusion using stochastic models and spatial context modelling

Objective

The project is aimed at developing an innovative processing system, involving stochastic models, spatial context modelling, and linguistics modelling, in order to achieve deep co-operation between gestural and verbal modalities in automatic interpretation tasks. The system should allow both streams to be efficiently merged with semantic and structural knowledge.

Input to the system will typically consist of two visual scenes and a related verbal input. The first visual scene will be a view of the user, allowing us to track the user's gestures. The second will be an image of the scene to be described. The verbal input will consist of sentences that designate or describe objects in the current scene. The system will have to interpret the user's gestures and spoken messages, which convey information about the scene to be described. It must be able to update its internal representation of the scene by determining which part of the scene is being described.

To evaluate the system, we will define a multimodal corpus based on specified scenarios. The evaluation will measure the system's capability to interpret the multimodal input. In the first phase, we will use a pointing device to mark the objects pointed at in each utterance of the corpus. This will allow us to compare the marked objects with the objects found by the system. The measure will be the percentage of objects correctly identified.
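
The evaluation measure described above can be illustrated with a minimal sketch. It assumes, for simplicity, that each utterance in the corpus designates exactly one object; the function name, utterance identifiers, and object labels are hypothetical and are not taken from the project.

```python
def identification_accuracy(marked, found):
    """Return the percentage of utterances whose marked object was found.

    marked: dict mapping utterance id -> object marked with the pointing device
    found:  dict mapping utterance id -> object identified by the system
    (Both structures are illustrative assumptions, not the project's format.)
    """
    if not marked:
        raise ValueError("the corpus contains no marked utterances")
    # An utterance counts as correct only if the system returned the same object.
    correct = sum(1 for utt_id, obj in marked.items() if found.get(utt_id) == obj)
    return 100.0 * correct / len(marked)


# Hypothetical corpus of four utterances; the system misses one and mislabels one.
marked = {"u1": "red_cube", "u2": "lamp", "u3": "door", "u4": "chair"}
found = {"u1": "red_cube", "u2": "lamp", "u3": "window"}

score = identification_accuracy(marked, found)
print(f"{score:.1f}% of objects correctly identified")
```

An utterance for which the system returns no object is counted as an error here; a real evaluation might treat missing answers separately.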

One of the main challenges of the Chameleon project is the integration of several domains, ranging from Signal Processing to Artificial Intelligence through Speech Recognition, Natural Language Processing, Image Processing, and Gesture Recognition. It requires the study and realisation of a set of techniques and tools for handling multimodal inputs and knowledge representation. It addresses vision, linguistics, heterogeneous data fusion, and the combination of different modalities. In particular, it aims at developing:

- New vision-based gesture recognition systems using stochastic models;
- A novel computational model, based on modelling the linguistic and spatial context, that provides a unified representation for the different types of information;
- A multimodal corpus to evaluate the system as a whole.

These developments can also benefit human-computer communication, pattern recognition methods, and knowledge modelling approaches, and open the way to promising solutions for industrial applications of natural human-machine interfaces.

Programme(s)

Multi-annual funding programmes that define the EU’s priorities for research and innovation.

Topic(s)

Calls for proposals are divided into topics. A topic defines a specific subject or area for which applicants can submit proposals. The description of a topic comprises its specific scope and the expected impact of the funded project.

Call for proposal

Procedure for inviting applicants to submit project proposals, with the aim of receiving EU funding.

Data not available

Funding Scheme

Funding scheme (or “Type of Action”) inside a programme with common features. It specifies: the scope of what is funded; the reimbursement rate; specific evaluation criteria to qualify for funding; and the use of simplified forms of costs like lump sums.

ACM - Preparatory, accompanying and support measures

Coordinator

Bertin & Cie Sa
EU contribution
No data
Address
Rue Pierre Curie 59
78370 Plaisir
France


Total cost

The total costs incurred by this organisation to participate in the project, including direct and indirect costs. This amount is a subset of the overall project budget.

No data