CORDIS - EU research results

Investigating Speech Processing in Realistic Environments

Final Report Summary - INSPIRE (Investigating Speech Processing in Realistic Environments)

INvestigating Speech Processing In Realistic Environments

In everyday life, people listen to speech under a wide range of conditions that are “sub-optimal” relative to the controlled conditions of laboratory experiments. To understand how people deal with such sub-optimal circumstances, classical research methods have typically considered these “adverse” conditions in isolation. In addition, listener behaviour was typically characterized by the average behaviour of subject groups as a whole. While each type of adverse condition can have important consequences on its own, it is often the combination of conditions that creates serious communication problems, especially for elderly and hearing-impaired individuals. Moreover, individuals appear to vary greatly in how they cope with everyday challenges. It is therefore highly desirable to design future communication devices, hearing aids and cochlear implants, but also radios and TVs, so that they can adapt to the individual needs of listeners in different acoustic circumstances. Doing so, however, requires better insight into the many factors that determine speech intelligibility. This in turn requires a joint effort from the many sub-disciplines that have hitherto worked on speech communication research in relative isolation.
Investigating Speech Processing in Realistic Environments (INSPIRE), a Marie Curie Initial Training Network that was active from 1 January 2012 to 31 December 2015, aimed to create a community of researchers who can take advantage of synergies between the sub-disciplines that investigate individual aspects of speech communication. A team of PhD students and postdoctoral researchers (INSPIRE fellows) was brought together with leading academic scientists from core disciplines in speech communication, R&D researchers from leading businesses in acoustics and hearing instruments, and hospital-based ENT specialists to carry out scientific research on four themes. Each theme was covered by a separate work package (WP):
WP1: Accommodation aimed to specify key mechanisms by which listeners' and speakers' perceptual and interpretive processes change on a short-term basis while processing speech in realistic listening conditions. For instance, fellows collected data on how background noise may interfere with the way speech is stored in lexical memory, and on how the amount of effort (measured both behaviourally and with EEG) appears to depend jointly on familiarity with a language or accent and on the noise condition. In addition, an ultrasound-based detector was developed as a first step towards creating a modality by means of which technological applications can adapt to degraded speech in noise.
WP2: Individual Differences aimed to characterize how listeners vary in terms of their hearing ability, native language, working memory capacity, etc., and how they may differ in their responses to real-world speech (as opposed to characterizing the average responses of larger populations of listeners). Topics addressed included the development of laboratory speech tests that can simulate relevant aspects of real-world communication scenarios and reliably reproduce inter-individual differences in the performance of the hearing-impaired population, and the development of psychophysical measurement methods that can characterize and identify the processing deficits that account for the observed inter-individual differences. It was also investigated how age and/or hearing loss influence listeners’ ability to decode affective information in conversational speech, and to what extent hearing aids restore the perception of affective information in the speech signal. Individual differences in sensitivity to masking were investigated by studying how the perception of a target utterance in the presence of a competing talker is affected by various properties of the target and competing utterances.
WP3: Microscopic Intelligibility Prediction aimed to develop intelligibility models that predict misperceptions in specific sub-optimal circumstances. Until recently, the goal of intelligibility models has been to provide average (i.e. macroscopic) numerical estimates of overall intelligibility in the face of slowly varying additive noise or reverberation. Using a recently collected corpus of words in noise that were consistently misperceived by many listeners, the misperceptions were analyzed to find the underlying signal properties that most likely induced these confusions. Based on the results, existing macroscopic models were evaluated and refined into more detailed, microscopic models. Along similar lines, microscopic models were also developed that explain misperceptions from a binaural point of view. Other topics included a microscopic model for better prediction of consonant perception, and the gathering of more detailed information on the extent to which cochlear implant users can benefit from periodicity in the target speech or the background noise.
WP4: Engineering Robustness aimed to reduce the fragility of current automatic speech recognition (ASR) systems and make them less vulnerable to noise and reverberation. By relying more explicitly on knowledge about human speech processing, it is expected that the behaviour of ASR in adverse conditions can eventually be made more human-like, both in terms of performance levels and in terms of the type of recognition errors. Using features from an auditory model, a superior speech-noise separation algorithm was developed. When coupled to an existing ASR system (using conventional acoustic features), the algorithm improved that system's noise robustness. In addition, it was investigated how implementation details of an auditory model affect recognition performance when auditory features are used directly for recognition purposes.
Finally, to facilitate collaboration between the above-mentioned, highly multi-disciplinary research fields, a fifth work package focused on a sustainable research infrastructure.
WP5: Network-wide Support Activities was organized as three short-term (12-month) projects carried out by experienced researchers (ERs). These ER projects aimed to provide links and synergy between the early stage researcher (ESR) projects of WP1-WP4 by developing common data sets and by sharing tools. Moreover, to stimulate future collaboration, INSPIRE was used as a development and test platform to initiate a permanent collection of measurement data and tools that is accessible to the scientific community.
Twice a year, workshops were organized for all network participants. In conjunction with some of these workshops, three annual winter schools were organized that were also open to external PhD students. During all events, the INSPIRE fellows were offered a training programme comprising a mix of theme training, keystone courses, and complementary skills training. Each event closed with an additional fellows’ day, organized by and for the fellows, which helped to strengthen the social cohesion of the group. All these activities have stimulated active and exciting collaboration between parties that, without INSPIRE, probably would not have occurred at all, and that is expected to persist for many years to come.
Scientific results of INSPIRE come in different forms. First, the project resulted in over 50 scientific papers, including over 15 articles in renowned journals. Further, we mention the initiatives that were aimed at a sustainable research infrastructure: (1) the BigListen initiative, (2) the speech stimuli synthesis software, and (3) the INSPIRE challenge, which provides a testing platform to facilitate and advance auditory modelling efforts that focus more on individual differences. All initiatives have been actively advertised at Interspeech conferences. The first tangible results have already become visible in the form of a special session scheduled for the Interspeech 2016 conference, for which the first data for English and Spanish, collected with the BigListen software, have been made available to the scientific community.
With the help of the INSPIRE fellows, an international SPIRE workshop was organized in Groningen at the end of the INSPIRE project. In conjunction with the Speech in Noise 2016 workshop, a substantial number of researchers with different scientific backgrounds convened to exchange the latest results and insights in the area covered by INSPIRE.
To raise awareness amongst the general public of the importance of the research conducted by the INSPIRE network, a highly successful public event was organized in collaboration with The Royal Institution of Great Britain with the title “Good listeners and smooth talkers: Spoken communication in a challenging world”.
The BigListen initiative, a crowdsourcing web application that collects multilingual information on how similar individual subjects are in their mis-hearings when they listen to words in noise, is also instrumental in involving the general public. Not only does it yield valuable scientific data, it also increases the visibility of the research field covered by INSPIRE.
Finally, we mention the benefit of collaboration with industry. Besides the broadened view and personal growth that INSPIRE fellows gained from it, this collaboration also resulted in two patent applications, illustrating the successful cross-fertilisation between engineering and auditory modelling.
More detailed information on the obtained results can be found in the publication list on the project website or obtained from the coordinator.