Objective
Audio-visual speech recognition is the problem of recognizing speech using both audio and video information. Speech perception is not purely auditory: listeners also rely on the visual patterns associated with mouth movement. This correlation between audio and visual information has occasionally been explored in the literature to develop more robust automatic speech recognition systems for noisy auditory environments (e.g. background noise, multiple speakers). However, audio-visual speech recognition has mainly been studied in controlled laboratory conditions. TalkingHeads proposes, for the first time, to address audio-visual speech recognition in unconstrained (in-the-wild) videos collected from real-world multimedia databases, together with a set of methodologies designed to work well in this in-the-wild setting.
TalkingHeads brings together a talented and experienced researcher (ER) with expertise in speech analysis (diarization and recognition) and the Supervisor, who has extensive research experience in Computer Vision for face analysis in-the-wild (recognition, detection, alignment and tracking, and facial expression analysis). TalkingHeads will establish the ER as an independent and internationally recognized researcher in the area of audio-visual fusion and speech recognition. Through TalkingHeads' achievable work plan, the ER will attain a high level of research maturity by (a) complementing his expertise in speech analysis through extensive training in Computer Vision, (b) conducting research on a challenging problem (audio-visual speech recognition in-the-wild) with significant career opportunities in both academia and industry, (c) publishing in high-impact conferences and journals, (d) establishing a network of research collaborators, and (e) enhancing personal skills (e.g. supervisory experience, leadership and management skills).
Fields of science (EuroSciVoc)
CORDIS classifies projects with EuroSciVoc, a multilingual taxonomy of fields of science, through a semi-automatic process based on NLP techniques.
- natural sciences > computer and information sciences > databases
- social sciences > law > law enforcement
- natural sciences > computer and information sciences > artificial intelligence > computer vision
- engineering and technology > electrical engineering, electronic engineering, information engineering > information engineering > telecommunications > mobile phones
- natural sciences > computer and information sciences > artificial intelligence > machine learning > deep learning
Programme(s)
Funding Scheme
MSCA-IF-EF-ST - Standard EF
Coordinator
NG7 2RD Nottingham
United Kingdom