Final Report Summary - DISCOTEX (Distributional Compositional Semantics for Text Processing)
DisCoTex is focused on a sub-branch of AI called Natural Language Processing (NLP, also known as Computational Linguistics), which is concerned with the automatic processing, analysis, and generation of natural language. The need to develop effective NLP technology is becoming more pressing as humans produce more information in the form of electronic text. Automatic tools are required for all of these applications to process and extract information from the textual data. In order to develop intelligent tools, the computer needs to acquire knowledge of the meanings of words and, crucially, requires a mechanism for combining the word meanings into meanings of whole phrases and sentences.
The achievements of the project are as follows:
o we have developed methods for inducing distributed word meanings from text, using techniques from neural-based machine learning (sometimes known as "Deep Learning");
o we have developed a whole theory of how to combine such distributed word meanings to obtain meanings of phrases and sentences;
o we have applied the theory to the construction of a natural language parser, which is a tool for automatically inferring the grammatical relations between words in a sentence.
We expect that the combination of machine learning and linguistics, which is a hallmark of the DisCoTex project, will be used in the future to develop the next generation of intelligent text processing software.