Community Research and Development Information Service - CORDIS

H2020

MALORCA Report Summary

Project ID: 698824
Funded under: H2020-EU.3.4.7.1

Periodic Reporting for period 1 - MALORCA (Machine Learning of Speech Recognition Models for Controller Assistance)

Reporting period: 2016-04-01 to 2016-09-30

Summary of the context and overall objectives of the project

In air traffic control, instructions are still usually given to pilots via voice communication. But modern computer systems in air traffic control need up-to-date data to be safe and efficient, so air traffic controllers (ATCOs) must make many manual inputs, today via mouse, to keep the system data correct. Modern technologies such as air-ground data link, which can in some cases replace voice communication, will require even more inputs from the ATCOs.
This generates workload for the ATCO, which speech recognition technology can reduce significantly. Simulations have shown that the use of modern speech recognition increases sector and landing capacity. It also leads to reduced flight time, which lowers airline costs and has a positive environmental impact, saving 50 to 65 litres of fuel per flight. For a medium-sized airport with 500 landings per day, this amounts to more than 23 million kilograms of CO2 savings. Speech recognition technology has now reached a level of reliability sufficient for implementation in an ATM system. This became obvious from the perspective of an air navigation service provider when supporting trials in the course of the AcListant® project.
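The CO2 figure can be checked with a back-of-the-envelope calculation. Two assumptions are needed that the report does not state explicitly: the savings are per year, and burning one litre of jet fuel emits roughly 2.5 kg of CO2 (a commonly used conversion factor):

```python
# Back-of-the-envelope check of the reported CO2 savings.
# Assumptions (not stated in the report): savings are per year, and
# one litre of jet fuel corresponds to about 2.5 kg of CO2.
FUEL_SAVED_PER_FLIGHT_L = 50   # lower bound of the quoted 50-65 litres
LANDINGS_PER_DAY = 500
DAYS_PER_YEAR = 365
CO2_KG_PER_LITRE = 2.5         # approximate conversion factor

flights_per_year = LANDINGS_PER_DAY * DAYS_PER_YEAR
co2_saved_kg = flights_per_year * FUEL_SAVED_PER_FLIGHT_L * CO2_KG_PER_LITRE
print(f"{co2_saved_kg / 1e6:.1f} million kg CO2 per year")  # → 22.8
```

With the lower bound of 50 litres per flight this gives about 22.8 million kg, consistent with the report's "more than 23 million" once the upper end of the 50-65 litre range is taken into account.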
One main obstacle to transferring speech recognition from the laboratory to operational systems is the cost of deployment. Currently, modern speech recognition models require manual adaptation to each local environment. The MALORCA project proposes a general, cheap and effective solution that automates this re-learning, adaptation and customisation process. MALORCA thus gives industry a practical way to develop and deploy this state-of-the-art speech recognition system and to integrate it into the voice communication systems of today's air navigation service providers.

Work performed from the beginning of the project to the end of the period covered by the report and main results achieved so far

MALORCA has collected speech and radar data from Vienna and Prague approach (approx. 100 hours each). A basic Arrival Manager was developed for Vienna and Prague which enables the prediction of command hypotheses for each controller command spoken to the pilot. The speech data were transcribed (speech-to-text) and annotated (text mapped to relevant concepts, e.g. call sign, command type, command value; greetings and other information elements that are not relevant for input into radar labels, such as weather information, are not considered). An Operational Concept Document was created which clearly specifies the controllers' preferences for benefiting from speech recognition in air traffic management. The Operational Concept Document, together with the annotated speech data, provided the input for the System Requirement Specification. A basic recognition system has been implemented, to be used in the following reporting periods for developing and testing the automatic learning algorithms.
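The annotation step described above (mapping a transcript to call sign, command type and command value, while ignoring greetings and other irrelevant elements) can be illustrated with a minimal sketch. The pattern and concept names below are illustrative assumptions, not MALORCA's actual annotation scheme:

```python
import re

# Hypothetical annotation of a transcribed controller utterance into the
# concepts named in the report: call sign, command type, command value.
# The pattern is an illustrative simplification, not the project's scheme.
PATTERN = re.compile(
    r"(?P<callsign>[a-z]+ \d+)\s+"
    r"(?P<command>descend|climb|reduce)\s+"
    r"(?:flight level |speed )?(?P<value>\d+)"
)

def annotate(transcript: str):
    """Map a speech-to-text transcript to relevant concepts, or None if
    the utterance carries no command relevant to the radar labels."""
    m = PATTERN.search(transcript.lower())
    return m.groupdict() if m else None

print(annotate("lufthansa 123 descend flight level 80"))
# → {'callsign': 'lufthansa 123', 'command': 'descend', 'value': '80'}
print(annotate("good morning vienna radar"))
# → None: a greeting carries no radar-relevant command, as in the report
```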

Progress beyond the state of the art and expected potential impact (including the socio-economic impact and the wider societal implications of the project so far)

AcListant® (www.aclistant.de), which developed, validated and quantified the benefits of Assistant Based Speech Recognition, always exploited speech and radar data from the simulator. The MALORCA project is built around radar and speech data recorded directly in the ops rooms in Prague and Vienna. AcListant® also had direct access to the speech signal during the simulations and could therefore provide speech data recorded directly from the microphone. In MALORCA, however, there was no possibility to record speech data in a similar fashion, given the strict safety policies of the ops room. Instead, we agreed to use radio transmission speech data, which is recorded for archiving and incident feedback (and not generally intended for use by another system). The audio quality is therefore poor.
Therefore, several new challenges need to be tackled, mainly related to:
- an 8 kHz sampling rate instead of 16 kHz
- a very noisy speech environment (i.e. a low speech-to-noise ratio)
- frequent deviations by air traffic controllers from standard phraseology
- a relatively small amount of available in-domain data (i.e. recordings of controller-pilot communication): currently 40 hours of speech recordings are available, whereas Google's speech recognizer is based on 200,000 hours of speech samples
- data elements that deviate from the expectations on which the grant proposal was based
- experts from industry and research, and from the ATM and speech recognition domains, coming together speaking different domain languages
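The first challenge, the sampling-rate mismatch, means that acoustic models built on 16 kHz audio cannot be applied directly to the 8 kHz radio recordings; one common remedy is to downsample the higher-rate audio. The sketch below is purely illustrative and not project code; a real system would apply a proper anti-aliasing filter before decimating:

```python
def downsample_16k_to_8k(samples):
    """Naively halve the sampling rate by averaging sample pairs.
    Averaging acts only as a crude low-pass filter; production code
    would use a proper anti-aliasing filter before decimation."""
    return [(samples[i] + samples[i + 1]) / 2.0
            for i in range(0, len(samples) - 1, 2)]

one_second_16k = [0.0] * 16000            # dummy 16 kHz audio frame
one_second_8k = downsample_16k_to_8k(one_second_16k)
print(len(one_second_8k))                 # → 8000 samples, i.e. 8 kHz
```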
Record Number: 196504 / Last updated on: 2017-03-29