Speech Processing and Recognition Using Integrated Neurocomputing Techniques

Objective

The aim of the SPRINT Action was to examine whether connectionist techniques can be used to improve the current performance of automatic speech recognition systems, with particular respect to speaker independence and noise insensitivity.
Answers were sought to the following questions:
-How can the recogniser be provided with robust features adapted to new speakers or environments?
-How can acoustic parameters be mapped onto phonetic symbols using different neural network paradigms?
-How can the competitive learning approach be applied to high-level speech processing in order to uncover the structure of the lexicon?
-How can isolated words be recognised, considering the problems related to time-varying word patterns and noise immunity?
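The second question, mapping acoustic parameters onto phonetic symbols, can be made concrete with learning vector quantisation (LVQ), one of the paradigms named in this summary. The sketch below is a minimal LVQ1 classifier in plain Python. It is not the project's implementation. The function names, the linearly decaying learning rate and the 2-D toy "acoustic" vectors with labels "a" and "i" are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple

# A prototype is a (feature vector, phonetic label) pair.
Prototype = Tuple[List[float], str]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook: List[Prototype], x: Sequence[float]) -> int:
    """Index of the prototype closest to x."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i][0], x))


def train_lvq1(codebook: List[Prototype],
               data: Sequence[Tuple[Sequence[float], str]],
               epochs: int = 20, alpha0: float = 0.1) -> List[Prototype]:
    """LVQ1: move the winning prototype towards x if its label is correct,
    away from x otherwise. The codebook is updated in place and returned."""
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x, label in data:
            alpha = alpha0 * (1.0 - step / total)  # linearly decaying learning rate
            step += 1
            i = nearest(codebook, x)
            w, w_label = codebook[i]
            sign = 1.0 if w_label == label else -1.0
            codebook[i] = ([wj + sign * alpha * (xj - wj) for wj, xj in zip(w, x)], w_label)
    return codebook


def classify(codebook: List[Prototype], x: Sequence[float]) -> str:
    """Phonetic label of the nearest prototype."""
    return codebook[nearest(codebook, x)][1]


# Hypothetical toy data: 2-D "acoustic" vectors for two phonetic classes.
data = [([0.9, 0.1], "a"), ([1.1, -0.1], "a"), ([1.0, 0.2], "a"),
        ([0.1, 0.9], "i"), ([-0.1, 1.1], "i"), ([0.2, 1.0], "i")]
codebook = [([0.5, 0.4], "a"), ([0.4, 0.5], "i")]
trained = train_lvq1(codebook, data)
print(classify(trained, [1.0, 0.0]), classify(trained, [0.0, 1.0]))
# prints: a i
```

In a real recogniser the vectors would be spectral or cepstral frames and there would be several prototypes per phonetic class.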
Various unsolved problems in speech recognition were tackled by exploiting the distinctive features of neural networks (e.g. non-linearity, self-organisation, parallelism) to improve the performance of automatic speech recognition systems. The connectionist paradigms were used to investigate several problems related to speech variability: adaptation to new speakers and/or new environments, noise immunity, classification of speech parameters into a set of phonetic symbols, and recognition of isolated words (lexical access).
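Speaker adaptation by learning a spectral transformation with a multilayer perceptron can be pictured as a regression problem: the network learns to map a frame from the new speaker onto the corresponding frame from a reference speaker. The plain-Python sketch below assumes that time-aligned frame pairs are already available (for example from dynamic time warping of the same words). The class name `SpectralMapper`, the network size, the learning rate and the synthetic "speaker difference" (energy moved up one band and rescaled) are hypothetical choices for illustration, not details from the project.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


class SpectralMapper:
    """One-hidden-layer perceptron mapping a new speaker's spectral frame onto
    the reference speaker's frame (tanh hidden layer, linear outputs, MSE loss)."""

    def __init__(self, dim: int, hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        s = 1.0 / math.sqrt(dim)
        self.w1 = [[rng.uniform(-s, s) for _ in range(dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.uniform(-s, s) for _ in range(hidden)] for _ in range(dim)]
        self.b2 = [0.0] * dim

    def forward(self, x: Sequence[float]) -> Tuple[Vector, Vector]:
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.w1, self.b1)]
        y = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(self.w2, self.b2)]
        return h, y

    def mse(self, pairs: Sequence[Tuple[Vector, Vector]]) -> float:
        total = 0.0
        for x, t in pairs:
            _, y = self.forward(x)
            total += sum((yi - ti) ** 2 for yi, ti in zip(y, t)) / len(t)
        return total / len(pairs)

    def sgd_step(self, x: Sequence[float], t: Sequence[float], lr: float) -> None:
        h, y = self.forward(x)
        d_out = [yi - ti for yi, ti in zip(y, t)]  # gradient of 0.5 * squared error
        # backpropagate through tanh (derivative 1 - h^2) before touching w2
        d_hid = [(1.0 - h[j] ** 2) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                 for j in range(len(h))]
        for k, dk in enumerate(d_out):
            for j, hj in enumerate(h):
                self.w2[k][j] -= lr * dk * hj
            self.b2[k] -= lr * dk
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj


def make_pairs(n: int, dim: int, seed: int = 1) -> List[Tuple[Vector, Vector]]:
    """Synthetic aligned frame pairs with a hypothetical speaker difference."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = [rng.random() for _ in range(dim)]
        t = [0.8 * x[(k - 1) % dim] + 0.1 for k in range(dim)]
        pairs.append((x, t))
    return pairs


pairs = make_pairs(50, 4)
mapper = SpectralMapper(dim=4, hidden=8)
loss_before = mapper.mse(pairs)
for _ in range(100):
    for x, t in pairs:
        mapper.sgd_step(x, t, lr=0.05)
loss_after = mapper.mse(pairs)
```

After training, the new speaker's frames would be passed through `mapper.forward` before reaching a recogniser trained on the reference speaker.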
APPROACH AND METHODS
The speech representation levels considered are signal, parameter, phonetic and lexical. The main areas of investigation were:
-The transition from the signal to the parameter level and transitions within the parameter level: the aim of this research is to provide the recognition system with the set of parameters leading to the best performance. Transformations of classical speech representations are investigated, based on multi-layer perceptrons, topological maps and the learning vector quantisation (LVQ) method.
-The transition from the parameter to the phonetic level: various feed-forward neural network topologies have been assessed, some of which integrate prior knowledge.
-The transition from the phonetic to the (sub)lexical level: the competitive learning approach is used to determine the structure of the lexicon and the relationship between morpheme units and phonemes.
-The transition from the parameter to the lexical level: the ability of various network paradigms to generalise has been examined in order to deal with intra- and inter-speaker variability and background noise. The problem of recognising time-varying speech patterns has been approached by transforming the speech signal to fit the fixed-size input layer of the network. Architectures and hybrid systems that integrate neural networks with well-established approaches are used.
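One simple way to fit a variable-length word to a fixed-size input layer is linear time normalisation: the sequence of feature frames is stretched or compressed to a fixed number of frames by interpolating between neighbouring frames, and the result is concatenated into one input vector. The summary does not say which transformation the project used. This plain-Python sketch shows only the linear variant, and the function names and toy data are hypothetical.

```python
from typing import List, Sequence

Frame = List[float]


def normalise_length(frames: Sequence[Sequence[float]], n_out: int) -> List[Frame]:
    """Linearly resample a variable-length sequence of feature frames to n_out
    frames, interpolating between neighbouring input frames."""
    if not frames:
        raise ValueError("need at least one frame")
    if n_out < 1:
        raise ValueError("n_out must be at least 1")
    n_in = len(frames)
    out: List[Frame] = []
    for i in range(n_out):
        # map output index i onto a (fractional) position in the input sequence
        pos = i * (n_in - 1) / (n_out - 1) if n_out > 1 else 0.0
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append([a + frac * (b - a) for a, b in zip(frames[lo], frames[hi])])
    return out


def to_input_vector(frames: Sequence[Sequence[float]], n_out: int) -> List[float]:
    """Concatenate the resampled frames into one fixed-size network input vector."""
    return [v for frame in normalise_length(frames, n_out) for v in frame]


# Hypothetical 3-frame "word" with 2 coefficients per frame, stretched to 5 frames.
word = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
print(normalise_length(word, 5))
# prints: [[0.0, 0.0], [0.5, 1.0], [1.0, 2.0], [1.5, 3.0], [2.0, 4.0]]
```

Linear stretching treats all parts of a word alike. Time-delay (TDNN-style) local connections, also mentioned in this summary, are an alternative that tolerates shifts in time without an explicit resampling step.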
PROGRESS AND RESULTS - STATUS OF OCTOBER 1991
The available deliverables report on the following research activities:
-Theoretical studies have been conducted to establish the capabilities of various neural networks to generate any spectral transformation. For each network architecture, the ability to discriminate between several classes during classification was also evaluated.
-The evaluation of speaker adaptation procedures based on learning spectral transformation with multi-layer perceptrons. Well-established methods were compared.
-The use of neural networks to carry out the transformations of speech parameters, necessary for recognition that is robust with respect to speech signals contaminated by background noise. Preliminary experiments have been carried out.
-Evaluation of very simple perceptron structures (sparsely connected neural nets and nets with other topologies: local connections (TDNN), scaly or fully connected) for spectrum and isolated word classification. These experiments show the need for specifically designed networks. Experiments with LVQ alone, a TDNN-derived network alone and combined TDNN-LVQ architectures showed the combined architecture to be the most efficient.
-An examination of the scaly network topology for noise robustness and for use with various approaches to time alignment.
-The use of added noise in training to improve generalisation.
-The development of an information-theoretic distance metric, together with a multilayer perceptron whose outputs have a probabilistic interpretation. This unifies the probabilistically formal hidden Markov modelling techniques with multilayer perceptron approaches, and led to the development of an HMM-MLP hybrid.
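A common way to give MLP outputs a probabilistic interpretation is a softmax output layer trained with a cross-entropy criterion. Given enough data and network capacity, the outputs then approximate the class posteriors P(q | x). In an HMM-MLP hybrid, Bayes' rule p(x | q) / p(x) = P(q | x) / P(q) turns these posteriors into scaled likelihoods that can replace the HMM emission probabilities. The sketch below assumes cross-entropy as the information-theoretic criterion. The distance metric actually developed in the project may differ, and the activations and priors are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    """Turn output-layer activations into a probability distribution over classes."""
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def cross_entropy(target: Sequence[float], posterior: Sequence[float], eps: float = 1e-12) -> float:
    """Cross-entropy H(target, posterior) between the target and output distributions."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, posterior))


def scaled_log_likelihoods(posterior: Sequence[float], priors: Sequence[float],
                           eps: float = 1e-12) -> List[float]:
    """log p(x | q) - log p(x) = log P(q | x) - log P(q), usable as HMM emission scores."""
    return [math.log(max(p, eps)) - math.log(q) for p, q in zip(posterior, priors)]


# Hypothetical frame: MLP activations for three phone classes, with class priors
# estimated as relative frequencies of the labels in the training data.
posterior = softmax([2.0, 0.5, -1.0])
priors = [0.5, 0.3, 0.2]
emission_scores = scaled_log_likelihoods(posterior, priors)
```

Dividing by the priors matters because the HMM decoder combines emission scores with its own transition probabilities, and raw posteriors would count the class priors twice.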
POTENTIAL
The basic know-how acquired and the tools developed will be used in the next step to meet the challenge of integrating these techniques within an automatic speech recogniser. Furthermore, this work will be exploited in other areas such as pattern recognition (image, fonts, characters).

Field of science (EuroSciVoc)

CORDIS classifies projects with EuroSciVoc, a multilingual taxonomy of fields of science, through a semi-automatic process based on NLP techniques. See: The European Science Vocabulary.

Programme(s)

Multi-annual funding programmes that define the EU's priorities for research and innovation.

FP2-ESPRIT 2 - European strategic programme (EEC) for research and development in information technologies (ESPRIT), 1987-1992

Topic(s)

Calls for proposals are divided into topics. A topic defines a specific subject or area for which applicants can submit proposals. The description of a topic includes its specific scope and the expected impact of the funded project.

Data not available

Call for proposals

Procedure by which applicants are invited to submit project proposals with a view to receiving EU funding.

Data not available

Funding scheme

Funding scheme (or "type of action") within a programme, with common features. The funding scheme specifies the scope of what is funded, the reimbursement rate, the specific evaluation criteria for qualifying for funding, and simplified forms of cost coverage such as lump sums.

Data not available

Coordinator

CAP Gemini Innovation

EU contribution

No data

Address

7 chemin du Vieux Chêne
38240 Meylan
France

Total cost

No data

Participants (5)

Alcatel SEL AG

Germany

EU contribution

No data

Address

Lorenzstraße 10
70435 Stuttgart

Total cost

No data

Defence Research Agency (DRA)

United Kingdom

EU contribution

No data

Address

St Andrews Road
WR14 3PS Malvern

Total cost

No data

IRIAC

France

EU contribution

No data

Address

10 RUE ANDRE VANDREZANNE
75013 PARIS

Total cost

No data

TELECOM PARIS

France

EU contribution

No data

Address

46 RUE BARRAULT
75634 PARIS

Total cost

No data

UNIVERSITAT POLITECNICA DE MADRID

Spain

EU contribution

No data

Address

CAMPUS DE MONTEGANCEDO
28660 MADRID

Total cost

No data
