Machine Learning of Speech Recognition Models for Controller Assistance

Project information

MALORCA

Grant agreement ID: 698824

Project website

DOI

10.3030/698824

Project closed

EC signature date 30 March 2016

Start date 1 April 2016

End date 31 March 2018

Funded under

SOCIETAL CHALLENGES - Smart, Green And Integrated Transport

Total cost

€ 805 587,50

EU contribution

€ 538 103,75


Coordinated by

DEUTSCHES ZENTRUM FUR LUFT - UND RAUMFAHRT EV
Germany

Periodic Reporting for period 4 - MALORCA (Machine Learning of Speech Recognition Models for Controller Assistance)

Reporting period: 2017-10-01 to 2018-03-31

Artificial intelligence (AI), and machine learning in particular, has made significant progress in the last few years, enabling computers to achieve a series of major breakthroughs that were previously impossible. One of the most successful fields is automatic speech recognition (ASR), which has shown remarkable improvements in understanding human conversational speech. Applying ASR to the communication between air traffic controllers and pilots is a natural next step. MALORCA built on the expertise gained in the following projects:
• AcListant®: DLR and Saarland University showed that command recognition rates of 95% are possible with Assistant Based Speech Recognition (ABSR), which combines an ASR system with a controller assistant system.
• AcListant®-Strips (follow-up project): DLR, DFS and Austro Control validated the AcListant® ABSR system for the Düsseldorf approach area. Employing ABSR significantly reduced the controllers’ workload, which led to the conclusion that the arrival throughput at this airport could be increased by two landings per hour.

Although ABSR has clearly been shown to be advantageous in ATM, its deployment usually requires a large amount of manually transcribed data and significant expert effort to adapt the basic ABSR system to the target domain so that it reaches task-sufficient recognition accuracy. The MALORCA project proposed to overcome this need for data and expert knowledge by applying novel machine learning algorithms to bi-modal data, so that the initial basic ABSR system can be adapted semi-automatically to the target domain (i.e. Prague and Vienna approach). The algorithms can rely on two independent information sources:
1. acoustic information, which is combined with
2. scores extracted from radar data.
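
The report does not say how the two information sources are combined. One common option, shown here purely as an illustration, is a log-linear interpolation of the recognizer's acoustic score with a plausibility score derived from radar data. The class `Hypothesis`, the weight `context_weight` and all numeric values are hypothetical and do not come from the project.

```python
import math
from dataclasses import dataclass


@dataclass
class Hypothesis:
    command: str              # e.g. "DLH123 DESCEND 8000"
    acoustic_logprob: float   # log-score from the speech recognizer (made-up scale)
    context_prob: float       # plausibility given the radar situation, in (0, 1]


def combined_score(h: Hypothesis, context_weight: float = 0.5) -> float:
    """Log-linear interpolation of the acoustic and the radar-based score."""
    # Floor the context probability so that log() is always defined.
    context_logprob = math.log(max(h.context_prob, 1e-6))
    return (1.0 - context_weight) * h.acoustic_logprob + context_weight * context_logprob


def rerank(hypotheses, context_weight: float = 0.5):
    """Sort hypotheses from best to worst combined score."""
    return sorted(hypotheses, key=lambda h: combined_score(h, context_weight), reverse=True)


hypotheses = [
    Hypothesis("DLH123 DESCEND 8000", acoustic_logprob=-4.2, context_prob=0.30),
    Hypothesis("DLH123 DESCEND 3000", acoustic_logprob=-4.0, context_prob=0.01),
]
best = rerank(hypotheses)[0]
acoustic_only_best = rerank(hypotheses, context_weight=0.0)[0]
```

In this toy example the acoustically slightly better hypothesis loses once the radar-based plausibility is taken into account. With `context_weight=0.0` the ranking falls back to the acoustic score alone.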

MALORCA’s objectives are to
• provide speech recognition tools for different deployment areas
• reduce the command recognition error rate by machine learning
• develop a multi-modal, state-of-the-art, automatic learning system
• bring together experts from multiple disciplines.

More than 100 hours of radar data and utterances from controller-pilot communication were recorded for the Vienna and Prague approach areas (WP2). 20% of these speech recordings were manually transcribed, i.e. an ATC expert listens to the controller-pilot communication and writes down, word by word, what was said and which elements are relevant for ATC; “good morning”, for example, is not important. What matters is the callsign, the command type (e.g. DESCEND or REDUCE) and the command value (e.g. 8000 feet or 220 knots). All in all, we transcribed four hours without silence for both Vienna and Prague.
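
To make the annotation step concrete, the following minimal sketch extracts the ATC-relevant elements (callsign, command type, value) from a word-by-word transcript and ignores greetings. The tiny vocabularies, the airline designator table and the `Command` structure are assumptions made for this example; they do not reflect the project's actual annotation format or grammar.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mini-vocabularies, just large enough for the examples below.
DIGITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
          "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}
NUMBER_WORDS = set(DIGITS) | {"hundred", "thousand"}
AIRLINES = {"lufthansa": "DLH", "austrian": "AUA"}
COMMANDS = {"descend": "DESCEND", "reduce": "REDUCE"}
UNITS = {"feet", "knots"}


@dataclass(frozen=True)
class Command:
    callsign: str
    kind: str
    value: int
    unit: Optional[str]


def spoken_number(words) -> int:
    """Convert digit-by-digit speech: 'two two zero' -> 220, 'eight thousand' -> 8000."""
    total, current = 0, 0
    for word in words:
        if word in DIGITS:
            current = current * 10 + DIGITS[word]
        elif word == "thousand":
            total, current = total + current * 1000, 0
        elif word == "hundred":
            total, current = total + current * 100, 0
    return total + current


def take_while(tokens, start, vocabulary) -> int:
    """Return the index of the first token at or after `start` not in `vocabulary`."""
    end = start
    while end < len(tokens) and tokens[end] in vocabulary:
        end += 1
    return end


def extract_command(transcript: str) -> Optional[Command]:
    """Return the first command found in the transcript, or None."""
    tokens = transcript.lower().split()
    airline_pos = next((i for i, t in enumerate(tokens) if t in AIRLINES), None)
    if airline_pos is None:
        return None  # no callsign, e.g. a pure greeting
    digits_end = take_while(tokens, airline_pos + 1, DIGITS)
    flight_number = "".join(str(DIGITS[w]) for w in tokens[airline_pos + 1:digits_end])
    callsign = AIRLINES[tokens[airline_pos]] + flight_number
    for k in range(digits_end, len(tokens)):
        if tokens[k] in COMMANDS:
            value_end = take_while(tokens, k + 1, NUMBER_WORDS)
            if value_end == k + 1:
                return None  # command keyword without a value
            has_unit = value_end < len(tokens) and tokens[value_end] in UNITS
            unit = tokens[value_end] if has_unit else None
            value = spoken_number(tokens[k + 1:value_end])
            return Command(callsign, COMMANDS[tokens[k]], value, unit)
    return None


descend = extract_command("good morning lufthansa one two three descend eight thousand feet")
reduce_speed = extract_command("austrian four five reduce two two zero knots")
greeting_only = extract_command("good morning")
```

The sketch handles only one command per utterance and no filler words between the command keyword and its value; real controller speech deviates from such a rigid pattern far more often.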
An initial basic ABSR system was set up for both Prague and Vienna (WP3). It achieved a command recognition rate of approximately 80% for Prague and 60% for Vienna. We then added 25% of the untranscribed data to improve the models through the machine learning framework, which significantly increased the system performance. Further sets of untranscribed data were then added (reaching 50%, 75% and 100% of the total set) in order to emulate the learning effect on a monthly basis. Command recognition rates eventually increased to 92% (Prague) and 83% (Vienna).
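
The report does not define how the command recognition rate is computed. One plausible reading, sketched here as an assumption, is that every command falls into one of three outcomes: recognized correctly, recognized wrongly, or rejected (no output). The recognition rate and the error rate are then the shares of the first two outcomes. The function name and the one-command-per-utterance simplification are hypothetical.

```python
from collections import Counter


def recognition_metrics(reference, recognized):
    """Compare gold commands with recognizer output.

    `recognized` holds one entry per reference command; None means the
    recognizer rejected the utterance and produced no command.
    """
    if len(reference) != len(recognized) or not reference:
        raise ValueError("need two non-empty lists of equal length")
    outcomes = Counter()
    for gold, hypothesis in zip(reference, recognized):
        if hypothesis is None:
            outcomes["rejected"] += 1
        elif hypothesis == gold:
            outcomes["correct"] += 1
        else:
            outcomes["wrong"] += 1
    total = len(reference)
    return {
        "recognition_rate": outcomes["correct"] / total,
        "error_rate": outcomes["wrong"] / total,
        "rejection_rate": outcomes["rejected"] / total,
    }


gold = ["DLH123 DESCEND 8000", "AUA45 REDUCE 220", "DLH123 REDUCE 180", "AUA45 DESCEND 6000"]
output = ["DLH123 DESCEND 8000", "AUA45 REDUCE 220", "DLH123 REDUCE 160", None]
metrics = recognition_metrics(gold, output)
```

Under this reading the recognition rate and the error rate do not have to sum to 100%, because rejected commands count towards neither.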
The performance of the trained ABSR system was evaluated in proof-of-concept trials with nine controllers in Vienna and Prague at the end of January 2018 (WP5). These trials went beyond the initial objective and allowed the end users, the controllers, to get their hands on the live-mode platform with a basic HMI. The work performed therefore not only covers the MALORCA objectives of developing a basic adaptable ABSR system and improving it by unsupervised learning, but goes further and hands a clear legacy over to the SESAR2020 project PJ.16-04. The feedback received from end users, together with the Operational Concept Document and the System Requirement Specification from WP1, clearly specifies the controllers’ preferences in the domain of speech recognition.
Several new challenges were tackled in MALORCA:
- a sampling rate of 8 kHz instead of 16 kHz
- a very noisy speech environment (i.e. a low signal-to-noise ratio)
- large deviations of air traffic controllers from standard phraseology
- a relatively small amount of in-domain data (i.e. recordings of controller-pilot communication): currently 45 hours of speech recordings are available, whereas, for comparison, Google’s speech recognizer is based on 300,000 hours of speech samples
- some data elements that deviate from the expectations on which the grant proposal was based
- experts from ATM industry and research, and from ATM and speech recognition, coming together while speaking different domain languages.

More information: http://www.malorca-project.de and http://www.aclistant.de

Reaching high command recognition rates and low command recognition error rates requires full-fledged Assistant Based Speech Recognition. A normal speech recognizer transforms the speech signal recorded by a microphone into a sequence of words. Assistant Based Speech Recognition, developed by DLR, Idiap and Saarland University, uses the output of an Arrival Manager to predict the set of controller commands that are possible in the current situation (the so-called situational context), with radar data serving as a second sensor. This approach can significantly reduce the search space of the speech recognizer, correct the ASR hypotheses, and check the plausibility of the speech recognizer’s output. For the Prague approach, the developed ABSR system yields command recognition error rates below 0.6%, and for Vienna below 3.8%. The Prague results are generally better than the Vienna results, mainly due to better audio quality (an SNR difference of 5 dB).
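
The plausibility check described above can be illustrated with a small sketch: among the recognizer's ranked hypotheses, keep the best one that belongs to the set of commands predicted from the situational context, and reject the utterance if none does. The function `plausibility_filter`, the tuple representation of a command and all values are assumptions for illustration, not the project's implementation.

```python
def plausibility_filter(nbest, predicted_commands):
    """Return the best-scoring hypothesis that the context allows, else None.

    `nbest` is a list of (command, score) pairs, higher score meaning better;
    `predicted_commands` is the set of commands considered possible in the
    current traffic situation.
    """
    for command, _score in sorted(nbest, key=lambda pair: pair[1], reverse=True):
        if command in predicted_commands:
            return command
    return None  # nothing plausible: reject instead of showing a wrong command


# Commands a (hypothetical) arrival manager considers possible right now.
predicted = {
    ("DLH123", "DESCEND", 8000),
    ("DLH123", "REDUCE", 220),
    ("AUA45", "DESCEND", 6000),
}

# Recognizer output: the top hypothesis is acoustically best but implausible.
nbest = [
    (("DLH123", "DESCEND", 3000), -3.9),
    (("DLH123", "DESCEND", 8000), -4.2),
]
chosen = plausibility_filter(nbest, predicted)
rejected = plausibility_filter([(("DLH999", "REDUCE", 250), -2.0)], predicted)
```

Rejecting an implausible hypothesis rather than outputting it is one way a system could keep the error rate low at the price of a somewhat lower recognition rate.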
MALORCA developed a novel approach that had not previously been applied in the automatic speech recognition domain (WP4): its learning algorithms can rely on two independent information sources, namely (1) acoustic scores, which are combined with (2) scores extracted from the situational context provided by the radar data available for each automatically transcribed utterance.
MALORCA proved for the Prague and Vienna approach areas that unsupervised learning can notably improve the command recognition rate, and that automatic learning from radar data and voice recordings can reduce data costs, speed up development and reduce manual adaptation effort.
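
A typical ingredient of learning from untranscribed recordings is to keep only those automatic transcripts that the system is confident about and to feed them back as additional training data. The sketch below shows such a selection over growing portions of the data pool (25%, 50%, 75%, 100%). The confidence values, the threshold of 0.8 and all names are hypothetical; the report does not describe the project's actual selection criteria.

```python
from dataclasses import dataclass


@dataclass
class AutoTranscript:
    utterance_id: str
    text: str
    confidence: float  # combined acoustic and context confidence in [0, 1] (assumed)


def select_for_training(transcripts, threshold: float = 0.8):
    """Keep automatic transcripts that are confident enough to train on."""
    return [t for t in transcripts if t.confidence >= threshold]


def incremental_batches(transcripts, fractions=(0.25, 0.5, 0.75, 1.0)):
    """Yield growing prefixes of the untranscribed pool, one per learning step."""
    for fraction in fractions:
        yield fraction, transcripts[: int(len(transcripts) * fraction)]


confidences = [0.95, 0.40, 0.85, 0.90, 0.30, 0.82, 0.99, 0.10]
pool = [AutoTranscript(f"utt{i}", "placeholder text", c) for i, c in enumerate(confidences)]

selected_sizes = []
for fraction, batch in incremental_batches(pool):
    training_additions = select_for_training(batch)
    selected_sizes.append(len(training_additions))
    # A real system would retrain or adapt its models here using
    # the manually transcribed data plus `training_additions`.
```

In a real setup the threshold trades off the amount of added data against the risk of training on wrong transcripts.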

Today Austro Control operates fully paperless, which requires manual inputs via mouse

MALORCA adapts its recognition models by employing novel machine learning algorithms

Components of ABSR to transform acoustic speech signals into radar screen inputs
