Final Report Summary - GRAMPLUS (Grammar-Based Robust Natural Language Processing)
Combinatory Categorial Grammar (CCG) links syntax and semantics very closely at every stage of derivation. It has been widely adopted for computational applications, including robust wide-coverage parsing with statistical models derived by machine learning from datasets such as the Penn Treebank, and particularly for tasks that require semantic interpretation, such as question answering or semantics-based parser induction.
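To make the link between syntax and semantics concrete, the following is a minimal illustrative sketch of the two basic CCG application rules, in which every syntactic combination step also performs a semantic function application. It is not the project's code: the names Atom, Functor, Sign, forward_apply and backward_apply, the toy lexicon, and the string-based meaning representation are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Any, Optional, Union


@dataclass(frozen=True)
class Atom:
    """An atomic category such as S or NP."""
    name: str


@dataclass(frozen=True)
class Functor:
    """A function category: result/arg looks right, result\\arg looks left."""
    result: "Category"
    slash: str  # "/" or "\\"
    arg: "Category"


Category = Union[Atom, Functor]


@dataclass(frozen=True)
class Sign:
    """A word or phrase paired with its category and its meaning."""
    words: str
    cat: Category
    sem: Any


def forward_apply(left: Sign, right: Sign) -> Optional[Sign]:
    """Forward application: X/Y  Y  =>  X, with semantics f(a)."""
    c = left.cat
    if isinstance(c, Functor) and c.slash == "/" and c.arg == right.cat:
        return Sign(f"{left.words} {right.words}", c.result, left.sem(right.sem))
    return None


def backward_apply(left: Sign, right: Sign) -> Optional[Sign]:
    """Backward application: Y  X\\Y  =>  X, with semantics f(a)."""
    c = right.cat
    if isinstance(c, Functor) and c.slash == "\\" and c.arg == left.cat:
        return Sign(f"{left.words} {right.words}", c.result, right.sem(left.sem))
    return None


# Hypothetical toy lexicon (illustrative only).
S, NP = Atom("S"), Atom("NP")
john = Sign("John", NP, "john")
mary = Sign("Mary", NP, "mary")
# Transitive verb category (S\NP)/NP: takes its object first, then its subject.
likes = Sign(
    "likes",
    Functor(Functor(S, "\\", NP), "/", NP),
    lambda obj: lambda subj: f"likes({subj}, {obj})",
)

vp = forward_apply(likes, mary)      # category S\NP
sentence = backward_apply(john, vp)  # category S
print(sentence.words, "=>", sentence.sem)
```

In this sketch the meaning of the sentence is assembled in lockstep with the derivation, so a complete syntactic analysis always comes with a complete interpretation. A full CCG also includes further combinators such as composition and type-raising, which are omitted here.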
Like all contemporary parsers, those based on CCG are limited by the “labeled data bottleneck”: current resources such as the Penn Treebank are too small to yield really reliable parsers. The GramPlus project proposes a number of extensions to CCG itself and to the related computational applications. Among others, these include an extended robust semantics covering both logical operators such as negation and distributional relations of paraphrase and entailment between content words and expressions; semi-supervised methods that generalize Treebank-trained parsers by using large amounts of unlabeled text to augment supervised machine learning; and methods for inducing grammars and parsers for many languages from sentences paired with meaning representations.
The results of the project include: successful parser generalization using a number of semi-supervised methods trained on unlabeled text; new parsing techniques, including semi-supervised supertaggers and incremental algorithms with state-of-the-art speed and accuracy; improved parsers for under-resourced languages, including Hindi; a combined logical and distributional semantics with state-of-the-art performance when applied to question answering; new techniques for automatic semantic parser induction from sentences paired with database queries, which have been successfully applied in a psychologically and linguistically plausible model of child language learning from exposure to meaning-revealing context; a semantics for the discourse information implicit in English intonation; and a demonstration that musical harmony can be analysed using the same kind of CCG, the same parsing algorithm, and the same statistical model.
These methods and results are for the most part independent of the specific grammatical approach used in the project, and are of general interest to a range of linguists, computational linguists, psychologists, and other cognitive scientists, as well as to those interested in robust practical applications of Natural Language Processing.