Periodic Reporting for period 4 - MALORCA (Machine Learning of Speech Recognition Models for Controller Assistance)
Reporting period: 2017-10-01 to 2018-03-31
• AcListant®: DLR and Saarland University have shown that command recognition rates of 95% are possible with Assistant Based Speech Recognition (ABSR), which combines an automatic speech recognition (ASR) system with a controller assistant system.
• AcListant®-Strips: In this follow-up project, DLR, DFS and Austro Control validated the AcListant® ABSR system for the Düsseldorf approach area. Controllers' workload was observed to be significantly reduced when they used ABSR in their work. This led to the conclusion that arrival throughput at this airport could be increased by two landings per hour.
Although ABSR has clearly been shown to be advantageous in ATM, its deployment usually requires large amounts of manually transcribed data and significant expert effort to adapt the basic ABSR system to the target domain so that it reaches task-sufficient recognition accuracy. The MALORCA project proposed to overcome this need for data and expert knowledge by applying novel machine learning algorithms to bi-modal data, adapting the initial basic ABSR system to the target domain (i.e. Prague and Vienna approach) in a semi-automatic way. The algorithms can rely on two independent information sources:
1. acoustic information, which is combined with
2. scores extracted from radar data.
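As a minimal sketch of how two such sources might be combined, the following Python snippet picks the best recognition hypothesis by a weighted sum of an acoustic log-score and a radar-derived context log-score. The names `Hypothesis`, `combine_scores` and `context_weight`, the callsigns and all score values are illustrative assumptions, not MALORCA's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    command: str          # hypothetical command format, for illustration only
    acoustic_logp: float  # assumed log-score from the speech recognizer
    context_logp: float   # assumed log-score derived from radar data


def combine_scores(hyp: Hypothesis, context_weight: float = 0.5) -> float:
    """Weighted sum of the two log-scores (log-linear interpolation)."""
    return ((1.0 - context_weight) * hyp.acoustic_logp
            + context_weight * hyp.context_logp)


def best_hypothesis(hyps, context_weight: float = 0.5) -> Hypothesis:
    """Return the hypothesis with the highest combined score."""
    return max(hyps, key=lambda h: combine_scores(h, context_weight))


# Made-up example: the second hypothesis sounds slightly more likely,
# but its callsign is implausible given the (assumed) radar context.
hyps = [
    Hypothesis("DLH123 DESCEND FL80", acoustic_logp=-2.0, context_logp=-0.5),
    Hypothesis("DLH128 DESCEND FL80", acoustic_logp=-1.8, context_logp=-6.0),
]
best = best_hypothesis(hyps)
print(best.command)
```

With `context_weight=0.0` the choice falls back to the acoustic score alone, which makes the contribution of the context score easy to inspect.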
MALORCA’s objectives are to
• Provide speech recognition tools for different deployment areas
• Improve the command recognition error rate through machine learning
• Develop a multi-modal, state-of-the-art, automatic learning system
• Bring together experts from multiple disciplines.
An initial basic ABSR system was set up for both Prague and Vienna (WP3). It achieved a command recognition rate of approximately 80% for Prague and 60% for Vienna. We then added 25% of the untranscribed data to improve the models through the machine learning framework, and system performance increased significantly. We subsequently added further sets of untranscribed data (reaching 50%, 75% and 100% of the total set) to emulate the learning effect on a monthly basis. Command recognition rates eventually increased to 92% (Prague) and 83% (Vienna).
The performance of the trained ABSR system was evaluated in proof-of-concept trials with nine controllers in Vienna and Prague at the end of January 2018 (WP5). These trials went beyond the initial objective and allowed the end users, the controllers, to try out the live-mode platform with a basic HMI. The work performed not only covers the objectives of the MALORCA project, namely to develop a basic adaptable ABSR system and to improve it by unsupervised learning, but also goes further and hands over a clear legacy from MALORCA to the SESAR2020 project PJ.16-04. The feedback received from end users, together with the Operational Concept Document and the System Requirement Specification from WP1, clearly specifies controllers' preferences in the domain of speech recognition.
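To put these rates in perspective, the short Python sketch below computes how much the share of commands that were not correctly recognized shrank between the initial and the final system. It assumes that this share is simply 1 minus the recognition rate (so it may include rejected as well as wrongly recognized commands) and uses the approximate rates quoted above; the function name is illustrative.

```python
def relative_reduction_of_unrecognized(rate_before: float, rate_after: float) -> float:
    """Relative reduction of the share of commands not correctly recognized.

    Assumption: that share is taken as 1 - recognition rate.
    """
    missed_before = 1.0 - rate_before
    missed_after = 1.0 - rate_after
    return (missed_before - missed_after) / missed_before


# Approximate recognition rates reported for the initial and final systems.
prague = relative_reduction_of_unrecognized(0.80, 0.92)
vienna = relative_reduction_of_unrecognized(0.60, 0.83)

print(f"Prague: {prague:.1%} fewer unrecognized commands")
print(f"Vienna: {vienna:.1%} fewer unrecognized commands")
```

Under this assumption, the share of unrecognized commands drops by roughly 60% for Prague (from about 20% to 8%) and by roughly 57% for Vienna (from about 40% to 17%).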
Several new challenges were tackled in MALORCA:
- 8 kHz sampling rate instead of 16 kHz
- very noisy speech environment (i.e. low speech-to-noise ratio)
- large deviations of air traffic controllers from standard phraseology
- relatively small amount of in-domain data (i.e. recordings of controller-pilot communication). Currently 45 hours of speech recordings are available. For comparison, Google’s speech recognizer is based on 300,000 hours of speech samples.
- some data elements that deviate from the expectations on which the grant proposal was based
- experts from ATM industry and research, and from the ATM and speech recognition communities, coming together while speaking different domain languages.
More information: http://www.malorca-project.de and http://www.aclistant.de
MALORCA developed a novel approach that had not previously been applied in the Automatic Speech Recognition domain (WP4): its learning algorithms can rely on two independent information sources, namely (1) acoustic scores, which are combined with (2) scores extracted from the situational context provided by the radar data available for each automatically transcribed utterance.
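One plausible use of such per-utterance information in unsupervised learning is to decide which automatic transcripts are trustworthy enough to be added to the training data. The sketch below is only an illustration under stated assumptions: it keeps an utterance when a hypothetical acoustic confidence reaches a threshold and the recognized callsign appears among the callsigns present in the radar data at that time. The field names, the threshold and the sample values are invented for the example and do not describe the project's actual selection procedure.

```python
def select_for_training(utterances, min_acoustic_conf: float = 0.8):
    """Return ids of utterances that pass both (assumed) plausibility checks."""
    selected = []
    for utt in utterances:
        confident = utt["acoustic_conf"] >= min_acoustic_conf
        plausible = utt["callsign"] in utt["radar_callsigns"]
        if confident and plausible:
            selected.append(utt["id"])
    return selected


# Made-up automatically transcribed utterances with radar context.
utterances = [
    {"id": "utt-001", "callsign": "AUA81", "acoustic_conf": 0.93,
     "radar_callsigns": {"AUA81", "CSA4YB"}},
    {"id": "utt-002", "callsign": "CSA4YB", "acoustic_conf": 0.55,
     "radar_callsigns": {"AUA81", "CSA4YB"}},   # rejected: low confidence
    {"id": "utt-003", "callsign": "DLH9X", "acoustic_conf": 0.91,
     "radar_callsigns": {"AUA81", "CSA4YB"}},   # rejected: not in radar data
]

selected_ids = select_for_training(utterances)
print(selected_ids)
```

Requiring both checks is a conservative design choice: it trades a smaller pool of training data for fewer wrongly labelled utterances.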
MALORCA proved for the Prague and Vienna approach areas that unsupervised learning can notably improve the command recognition rate, and that automatic learning from radar data and voice recordings can reduce data costs, speed up development and reduce manual adaptation effort.