CORDIS - EU research results

Training Network on Automatic Processing of PAthological Speech

Periodic Reporting for period 2 - TAPAS (Training Network on Automatic Processing of PAthological Speech)

Reporting period: 2019-11-01 to 2022-09-30

The TAPAS project is developing speech processing technologies to transform diagnosis, treatment and well-being for groups of people for whom spoken communication is extremely challenging. These include people with impairments in speech production (resulting in slurred and unintelligible speech, e.g. dysarthric speech), people with impairments in speech perception (e.g. due to hearing loss), and people with impaired language processing (e.g. due to reduced cognitive abilities). The field of speech processing has largely evolved from the perspective of typical speech (speech from healthy individuals with no impairment). As a consequence, existing methods and technologies are poorly equipped to handle, or are not robust to, the atypical speech that results from these impairments. As illustrated in the figure, through multi- and inter-disciplinary R&D at the intersection of speech processing, linguistics and clinical science, the project plans to:
a) develop inexpensive and non-invasive speech-processing-based tools for early diagnosis of conditions such as dementia, depression and Parkinson's disease;
b) develop automated, personalized tools for rehabilitative therapy to recover speech function in people with speech disorders, for instance after head and neck cancer surgery or with dysarthria; and
c) re-design current speech technology to work with people with speech impairments for assisted living and also to help in making informed clinical choices.
In that context, the TAPAS network is training 15 early-stage researchers, equipping them with multi- and inter-disciplinary skills at the intersection of speech processing, clinical practice and industry.
The research and development in the TAPAS project are organized into three parts:
I. Pathological speech detection
The activities under this research direction have focused on: (a) development of novel neural architectures to capture the word-level and sentence-level information embedded in manual and automatically generated transcripts for dementia detection, (b) development of a multi-instance learning framework for speech-based assessment, (c) integration of prior knowledge about speech production into neural networks that model the raw waveform, in order to assess pathological speech, and (d) development of deep learning-based approaches for sensing the breathing signal and breathing parameters from the speech signal.
II. Pathological speech assessment and therapy
The activities under this research direction have focused on: (a) investigating different neural network architectures and training methods, together with transfer learning, to robustly estimate phonological features for pathological speech intelligibility assessment, (b) analyzing the internal representations learned by deep neural networks trained for a speech recognition task, in order to objectively assess the voice intelligibility of head and neck cancer patients, (c) development of automatic methods for speech-based evaluation of Parkinson's disease, and integration of those evaluations with movement information captured through inertial sensors for a holistic assessment of the patients' neurological state following the Unified Parkinson’s Disease Rating Scale, (d) determining which deviations in pathological speech have the most impact on speech intelligibility, and finding good procedures for measuring intelligibility with reduced workload, (e) conducting a literature survey, an online survey and expert interviews for the development of a clinician-friendly speech assessment tool based on low-level segmental acoustic measures, and (f) development of speech data, exercises, linguistic targets and methods for a virtual articulation therapist that guides patients through an intensive treatment program for improving articulation and, consequently, speech intelligibility.
III. Communication technologies for assisted living and rehabilitation
The activities in this research direction have focused on: (a) development of an articulatory-to-acoustic inversion system for demonstrating phenomena of pathological speech, and collection of publicly available oral cancer speech data for explaining the difference between oral cancer speech and healthy control speech, (b) investigating the influence of language models trained with in-domain and out-of-domain data on dysarthric speech recognition, (c) application of state-of-the-art sequence-discriminative training methods for acoustic modeling, and analysis of the errors made by the automatic speech recognition system when recognizing control speech and dysarthric speech, (d) improving recognition of children's speech through transfer learning and data augmentation, for assessing children's pathological speech in the context of gamified speech therapy sessions, and (e) development of tools to automatically assess speech production deficits of cochlear implant users, for example by evaluating articulation deficits at consonant-vowel and vowel-consonant transitions, developing models to distinguish between the speech of healthy speakers and that of cochlear implant users, and developing methods to distinguish between "pre-lingual" and "post-lingual" speech.
Besides peer-reviewed conference and journal publications, these activities have also resulted in open source software and tools such as Phonet and Apkinson. The TAPAS project consortium has organized three training events: (i) Speech pathologies and therapies, (ii) Speech processing and machine learning and (iii) Data collection, management and ethical practices.
With a focus on pathological speech processing, the TAPAS project is advancing the development of:
a) speech processing approaches that seamlessly interface speech production phenomena (movement of articulators) with speech perception phenomena (perceived linguistic and paralinguistic information), such that the changes in one phenomenon can be transparently related to changes in another.
b) clinically valid methods for speech assessment and therapy, for instance phonological feedback for articulation therapy, and automatic speech assessment methods that allow multidimensional and reliable documentation of the speech of cochlear implant users for future scientific and rehabilitation purposes.
c) assistive technologies for people with speech impairments, for instance speech therapy for children in a gaming environment.
d) methods and tools for progressive and longitudinal speech-based assessment, for example speech-based motor-level assessment of the progression and treatment of Parkinson's disease.
In addition, TAPAS has initiated a new direction of research on understanding the relationship between respiration and speech production by estimating breathing patterns from the speech signal. This research direction has attracted further interest in the context of the COVID-19 pandemic.
The project deals with speech pathologies resulting from a range of conditions, most prominently dementia, Parkinson’s disease, hearing loss, dysarthria, oral cancer surgery and treatment, depression, bipolar disorder, and cleft lip and palate. If successful, TAPAS will provide robust, practitioner-friendly tools for the detection, assessment and treatment of pathological speech, and will contribute to the development of speech and language technologies for assisted living, care and rehabilitation of individuals with chronic speech pathologies.
tapas-multi-disciplinary.png