Community Research and Development Information Service - CORDIS

H2020

MALORCA Report Summary

Project ID: 698824
Funded under: H2020-EU.3.4.7.1

Periodic Reporting for period 2 - MALORCA (Machine Learning of Speech Recognition Models for Controller Assistance)

Reporting period: 2016-10-01 to 2017-03-31

Summary of the context and overall objectives of the project

In Air Traffic Control, instructions are usually still given to pilots via voice communication. Modern computer systems in Air Traffic Control, however, need up-to-date data to be safe and efficient. Keeping the system data correct therefore requires many inputs from the air traffic controllers (ATCOs), which are made today via mouse. Modern technologies such as Air-Ground data link, which can in some cases replace voice communication, will require even more inputs from the ATCOs.
This generates workload for the ATCOs, which Speech Recognition Technology will be able to reduce significantly. Simulations have shown that the use of modern speech recognition results in increased sector and landing capacity. Furthermore, this leads to reduced flight time, which lowers airlines' costs and has a positive environmental impact, because it saves 50 to 65 litres of fuel per flight. For a medium airport with 500 landings per day, this can result in more than 23 million kilograms of CO2 savings per year. Speech Recognition Technology has today reached a level of reliability that is sufficient for implementation into an ATM system. This became obvious from the perspective of Air Navigation Service Providers during the trials supported in the course of the AcListant® project. ANS CR, Austro Control, Croatia Control, DFS, Irish Aviation Authority, Naviair, and LFV have each already participated with at least one controller in experiments with the AcListant® Assistant Based Speech Recognizer in DLR's labs.
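The 23-million-kilogram figure can be reproduced with a back-of-the-envelope calculation. The fuel-to-CO2 conversion factors below are our own assumptions (roughly 0.8 kg of jet fuel per litre and about 3.15 kg of CO2 per kg of fuel burned), not values taken from the report:

```python
# Back-of-the-envelope check of the CO2 savings quoted above.
# Conversion factors are assumptions, not taken from the report.
FUEL_SAVED_LITRES_PER_FLIGHT = 50   # lower bound quoted in the report
CO2_KG_PER_LITRE = 0.8 * 3.15       # assumed: ~0.8 kg/L fuel, ~3.15 kg CO2/kg fuel
LANDINGS_PER_DAY = 500              # medium airport, as in the report

co2_per_flight = FUEL_SAVED_LITRES_PER_FLIGHT * CO2_KG_PER_LITRE
co2_per_year = co2_per_flight * LANDINGS_PER_DAY * 365

print(f"{co2_per_year / 1e6:.1f} million kg CO2 per year")  # → 23.0 million kg CO2 per year
```

This shows that the quoted 23 million kilograms is best read as an annual figure for such an airport.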

One main obstacle to transferring speech recognition from the laboratory to operational systems is the cost of deployment. Currently, modern speech recognition models require manual adaptation to the local environment. The MALORCA project proposes a general, cheap and effective solution to automate this re-learning, adaptation and customisation process. MALORCA thus gives industry a practical way of developing and deploying this state-of-the-art speech recognition system and integrating it into today's voice communication systems of air navigation service providers.

Work performed from the beginning of the project to the end of the period covered by the report and main results achieved so far

MALORCA has collected speech and radar data from Vienna and Prague approach (more than 100 hours including silence from each site, resulting in 22 hours without silence from each). A basic Arrival Manager was developed for Vienna and Prague, which makes it possible to predict command hypotheses for each controller command spoken to the pilot. The speech data were transcribed (speech-to-text) and annotated (text mapped to relevant concepts, e.g. call sign, command type, command value); greetings and other information elements that are not relevant as input for radar labels (e.g. weather information) are not considered.

An Operational Concept Document was created which clearly specifies controllers' preferences for benefiting from speech recognition in air traffic management. The Operational Concept Document, together with the annotated speech data, provided the input for creating the System Requirement Specification.

A basic recognition system has been implemented, to be used in the following reporting periods for developing and testing the automatic learning algorithms. Only 30% of the recorded speech data are manually transcribed, i.e. usable for supervised learning. The other 70%, however, are automatically transcribed using the basic recognition system. These 70% correspond to approximately 30 hours of speech data, which, compared to Google's databases of more than 200,000 hours of speech, is only a drop in the bucket. However, for each automatically transcribed utterance the output of the Arrival Manager is available, resulting in a limited set of possible commands in each situation. This will help to classify the automatically transcribed utterances into good and bad transcriptions.
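A minimal sketch of how such a classification could work, assuming the Arrival Manager (AMAN) supplies a set of plausible commands per radar situation. This is our own illustration, not the project code; the command strings and the extraction step are simplified placeholders:

```python
# Sketch (not project code): accept an automatic transcription only if
# every command recognized in it is among the commands the Arrival
# Manager considers possible in the current situation.

def classify_transcription(recognized_commands: list[str],
                           aman_hypotheses: set[str]) -> str:
    """Label an auto-transcribed utterance as a 'good' or 'bad' transcription."""
    if recognized_commands and all(cmd in aman_hypotheses
                                   for cmd in recognized_commands):
        return "good"
    return "bad"

# Hypothetical example: plausible commands for flight DLH2AB
# as predicted by the AMAN from the radar situation.
hypotheses = {"DLH2AB DESCEND FL80", "DLH2AB REDUCE 220", "DLH2AB HEADING 250"}

print(classify_transcription(["DLH2AB DESCEND FL80"], hypotheses))  # good
print(classify_transcription(["DLH2AB CLIMB FL120"], hypotheses))   # bad
```

In this sketch the AMAN hypotheses act as a situation-dependent filter, so implausible automatic transcriptions can be discarded before being used for semi-supervised learning.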

Progress beyond the state of the art and expected potential impact (including the socio-economic impact and the wider societal implications of the project so far)

AcListant® (www.aclistant.de), which developed, validated and quantified the benefits of Assistant Based Speech Recognition, always exploited speech and radar data from the simulator. The MALORCA project is built around radar and speech data obtained directly from recordings in the ops rooms in Prague and Vienna. AcListant® also had direct access to the speech signal during the simulations and was therefore able to capture speech data recorded directly from the microphone. In MALORCA, however, there was no possibility to record speech data in a similar fashion, given the strict safety policies of the ops room. Instead, we agreed to use radio transmission speech data, which is recorded for archiving and incident feedback (it is not generally intended to be used by another system). Hence, the quality of speech has significantly decreased (i.e. a significant drop in SNR), which has a direct impact on the resulting performance.

Therefore, several new challenges are tackled which are mainly related to:
- an 8 kHz sampling rate instead of 16 kHz
- a very noisy speech environment (i.e. a low signal-to-noise ratio, SNR)
- large deviations of controllers from standard phraseology
- a relatively small amount of in-domain data available (i.e. recordings of controller-pilot communication); currently 40 hours of speech recordings are available, whereas, for comparison, Google's speech recognizer is based on 200,000 hours of speech samples
- dealing with some data elements which deviate from the expectations on which the grant proposal was based
- experts from industry and research, as well as from ATM and speech recognition, coming together speaking different domain languages
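The SNR mentioned in the list can be illustrated with a short sketch: compare the power of a speech segment against the power of a pause (noise-only) segment and express the ratio in decibels. This is our own illustration under synthetic 8 kHz data, not a method taken from the project:

```python
import numpy as np

def snr_db(speech: np.ndarray, noise: np.ndarray) -> float:
    """Estimate SNR in decibels from a speech segment and a noise-only segment."""
    p_speech = np.mean(speech.astype(float) ** 2)
    p_noise = np.mean(noise.astype(float) ** 2)
    return 10.0 * np.log10(p_speech / p_noise)

# Synthetic one-second example at 8 kHz: a tone plus noise vs. noise alone.
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
noise = 0.05 * rng.standard_normal(8000)
speech = np.sin(2 * np.pi * 440 * t) + 0.05 * rng.standard_normal(8000)

print(f"estimated SNR: {snr_db(speech, noise):.1f} dB")
```

In practice the noise-only segments would come from pauses in the radio transmission recordings; the lower the resulting SNR, the harder the recognition task becomes.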
