How machines interpret human language
Have you ever tried to become fluent in a foreign language by studying lists of words and grammatical rules? If so, you probably didn’t get very far. Gaining experience from listening, reading and speaking plays a key role in the human learning process. This is no different when it comes to developing human language technologies (HLT), such as speech recognition, machine translation and text image recognition. Progress in these fields has been driven by huge advances in AI, as neural networks are trained with real-world data to recognise and translate language more accurately. To help them reach their full potential, the SEQCLAS project looked at what these three technologies have in common. Funded by the European Research Council (ERC), the project considered them from the perspective of a holistic framework based on statistical decision theory. Its contribution could enable teams working on HLT to critically assess and improve algorithms.
An ‘I’ for an ‘eye’
“In all three application areas, it is the context that enables the system to achieve better interpretations of the input and produce a more accurate output,” explains Hermann Ney, head of the Human Language Technology and Pattern Recognition Group at RWTH Aachen University in Germany and SEQCLAS principal investigator. When it comes to language, however, this context is complex and multilayered. How can the machine learn to tell ‘holy’ from ‘wholly’ to transcribe speech correctly, or to distinguish the verb ‘duck’ from the noun in order to pick the right translation? “If we take the example of speech recognition, each sound needs to be considered as part of a sequence – a word, a sentence, even a dialogue – to correctly interpret its meaning,” Ney says. Processing and classifying such sequences means enabling neural networks to recognise the patterns structuring them. The SEQCLAS team looked at this challenge from the point of view of decision theory. This framework emphasises the importance of the performance criterion (e.g. the number of errors) for these sequence-to-sequence processing tasks. “As a consequence, the performance criterion can be used to improve the structure and the training of the neural network-based systems,” Ney notes. When it comes to machine translation, this performance can be harder to quantify, he concedes, since a sentence may have several equally valid interpretations and translations.
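For speech recognition, the performance criterion Ney refers to is typically the word error rate: the number of substitutions, insertions and deletions needed to turn the system's output into the reference transcript, divided by the reference length. As a minimal sketch (not the project's code), it can be computed with a standard edit-distance dynamic programme; the example sentences are invented to echo the ‘holy’/‘wholly’ confusion above:

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: edit distance between word sequences,
    normalised by the length of the reference."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance via dynamic programming
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One word misrecognised in a five-word reference: WER = 1/5 = 0.2
print(word_error_rate("she made it holy again", "she made it wholly again"))
```

Because this count of errors is a single, well-defined number, it can serve directly as the criterion against which a recognition system's training is tuned, which is exactly the role the decision-theoretic framework assigns to it.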
Insight for progress
To complement their conceptual work, the researchers developed a number of models and tests that translated the theory into practical improvements. They used unsupervised and semi-supervised learning techniques to enable machine translation using monolingual data in the source and target languages. This work could, for instance, contribute to making better machine translations available for less common language pairs. Ney and his colleagues also delivered several prototype systems which will serve as a basis for further research in this promising field. He believes the project’s holistic approach also offers a unique historical perspective on the concepts underpinning HLT. “We tend to forget that neural networks have been used for speech recognition for over 30 years. Their development has long been held back by a lack of computing power,” Ney adds. “Re-evaluating and updating existing research in the light of today’s capabilities and insights could help us achieve further advances in this field.”
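A widely used way to exploit monolingual data in translation, in the spirit described above, is back-translation: a target-to-source model turns monolingual target-language text into synthetic source sentences, and the resulting synthetic pairs augment the training data of the source-to-target model. The toy sketch below (an illustration, not the SEQCLAS implementation; `ToyTranslator` is a hypothetical word-by-word stand-in for a neural model) shows the data flow:

```python
class ToyTranslator:
    """Hypothetical stand-in for a neural translation model:
    word-by-word dictionary lookup, 'trained' by aligning words
    of equal-length sentence pairs."""
    def __init__(self, lexicon=None):
        self.lexicon = dict(lexicon or {})

    def translate(self, sentence):
        return " ".join(self.lexicon.get(w, w) for w in sentence.split())

    def train(self, pairs):
        for src, tgt in pairs:
            for s, t in zip(src.split(), tgt.split()):
                self.lexicon[s] = t

# Target→source model (English→German) assumed to exist already
model_ts = ToyTranslator({"hello": "hallo", "world": "welt"})
# Source→target model (German→English) to be trained
model_st = ToyTranslator()

# Monolingual target-language (English) data, back-translated into
# synthetic German to form (synthetic source, real target) pairs
monolingual_target = ["hello world"]
synthetic_pairs = [(model_ts.translate(t), t) for t in monolingual_target]

model_st.train(synthetic_pairs)
print(model_st.translate("hallo welt"))  # hello world
```

The key design point is that no parallel corpus is needed for the augmentation step: only monolingual target-language text and an existing reverse model, which is why such techniques matter most for language pairs with little parallel data.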
Keywords
SEQCLAS, human language technology, speech recognition, text image recognition, machine translation, neural network, algorithm, unsupervised learning, semi-supervised learning