
Tree rewriting grammars and the syntax-semantics interface: From grammar development to semantic parsing

Periodic Reporting for period 2 - TreeGraSP (Tree rewriting grammars and the syntax-semantics interface: From grammar development to semantic parsing)

Reporting period: 2019-01-01 to 2020-06-30

The increasing amount of data available in our digital society is both an opportunity and a challenge for natural language processing. On the one hand, we have better possibilities than ever to extract and process meaning from language data, and recent techniques, in particular deep learning methods, have achieved impressive results. On the other hand, linguistic research now has a much broader empirical basis and can aim at rich quantitative models of language. Unfortunately, theory and application interact too little in these areas of meaning extraction and grammar theory. Current semantic processing techniques do not sufficiently capture the complex structure of language, while grammatical theory does not sufficiently incorporate data-driven insights about language. TreeGraSP bridges this gap by combining rich linguistic theory with data-driven approaches to large-scale statistical grammar induction and to semantic parsing. The novelty of its approach consists in putting semantics at the center of grammar theory, emphasizing multilinguality and typological diversity, and adopting a constructional approach to grammar. TreeGraSP is interdisciplinary and innovative in several respects. It contributes to linguistics by a) making theories of grammar explicit, b) providing a grammar implementation tool for linguists working on typology, and c) developing means to obtain a quantitative grammar theory. It contributes to computational semantics by providing a probabilistic theory of meaning construal that can be used for textual entailment and reasoning applications. The challenge lies in the intended transfer between theoretical linguistics and statistical natural language processing.
TreeGraSP has clarified the principles and structures of the linguistic theory of Role and Reference Grammar (RRG), a theory that underlies a large amount of typological, cross-linguistic research. The result is a formalization, together with tools that allow for precise grammar implementation and testing and that can be used for data-driven syntactic and semantic parsing. Probabilistic syntactic parsing techniques for related frameworks were developed in the first half of the project, with a view to using them for RRG. Furthermore, new strategies have been developed for data-driven parsing of text into explicit, transparent semantic representations that can be linked to frame semantics and to RRG's logical representations.
TreeGraSP has made considerable progress in formalizing approaches to grammar theory that originated from a typologically broad selection of languages, thereby contributing to a better understanding of these theories and enabling their computational processing. The project has put effort into bridging the gap between computational linguistics and typological research, which has resulted in fruitful collaborations. The goal of providing resources and tools for automatic grammar development and language processing to linguists working with RRG has been partly reached and is expected to be fulfilled in the course of the second half of the project.

Concerning semantic parsing, one of the aims of the project was to link grammar formalisms that assume an extended domain of locality to semantic parsing that targets explicit semantic representations capturing event types, semantic roles, and logical relations between meaning components. On the syntactic side, TreeGraSP has provided a constituency parser that a) identifies elementary building blocks that in some sense represent constructions and b) yields results competitive with other standard parsers. We will extend this to semantic parsing in the second half of the project, and we expect that the additional knowledge about predicate-argument dependencies obtained from syntactic parsing will be beneficial for semantic parsing, in particular for event type identification and semantic role labeling.