CORDIS - EU research results

Tree rewriting grammars and the syntax-semantics interface: From grammar development to semantic parsing

Periodic Reporting for period 4 - TreeGraSP (Tree rewriting grammars and the syntax-semantics interface:From grammar development to semantic parsing)

Reporting period: 2022-01-01 to 2023-06-30

The increasing amount of data available in our digital society is both an opportunity and a challenge for natural language processing. On the one hand, we have better possibilities than ever to extract and process meaning from language data, and recent techniques, in particular deep learning methods, have achieved impressive results. On the other hand, linguistic research has a much broader empirical basis and can aim at rich quantitative models of language. Unfortunately, theory and application interact too little in these areas of meaning extraction and grammar theory. Current semantic processing techniques do not sufficiently capture the complex structure of language, while grammatical theory does not sufficiently incorporate data-driven insights about language. TreeGraSP bridges this gap by combining rich linguistic theory with data-driven approaches to large-scale statistical grammar induction and to semantic parsing. The novelty of its approach consists in putting semantics at the center of grammar theory, emphasizing multilinguality and typological diversity, and adopting a constructional approach to grammar. TreeGraSP is interdisciplinary and innovative in several respects. It contributes to the field of linguistics by a) making theories of grammar explicit, b) providing a grammar implementation tool for linguists doing typological work and c) developing means to obtain a quantitative grammar theory. It contributes to the field of computational semantics by providing a probabilistic theory of meaning construal that can be used for textual entailment and reasoning applications. The challenge lies in the intended transfer between theoretical linguistics and statistical natural language processing.
TreeGraSP has clarified the principles and structures of the linguistic theory of Role and Reference Grammar (RRG), a theory that underlies a large body of typological, cross-linguistic research. The result is a formalization, together with tools that allow for precise grammar implementation and testing and that can be used for data-driven syntactic and semantic parsing. Based on this formalization, TreeGraSP has modeled a range of typologically challenging and interesting phenomena and has implemented parts of this modeling as a proof of concept. As a basis for RRG-based corpus linguistic research and for parsing, TreeGraSP has created RRG treebanks for several languages. Probabilistic syntactic parsing techniques for related frameworks were developed in the first half of the project with a view to using them for RRG. These parsing techniques have recently been applied to the treebanks created in the project, which has led to the development of a multilingual RRG parser. Furthermore, parsing has been extended to cross-lingual approaches that are specifically tailored to low-resource scenarios. Moreover, new strategies have been developed for data-driven parsing of text into explicit, transparent semantic representations that can be linked to frame semantics and to RRG's logical representations.
TreeGraSP has made considerable progress in formalizing approaches to grammar theory that originated from a typologically broad selection of languages, thereby contributing to a better understanding of these theories and enabling their computational processing. The project has put effort into bridging the gap between computational linguistics and typological research, which has resulted in fruitful collaborations. TreeGraSP has made available a multilingual RRG parser, and the project has developed strategies for transfer from Universal Dependencies (UD) and for cross-lingual parsing that can be employed with new languages, in particular with the kind of data one typically finds in typological fieldwork contexts. Recently, the project has particularly addressed issues of low-resource treebanking, in collaboration with typological linguists. The goal of providing resources and tools for automatic grammar development and language processing to linguists working with Role and Reference Grammar (RRG) has been reached, and the resulting approaches and tools will continue to be used in the future.

Concerning semantic parsing, one of the aims of the project was to link grammar formalisms that assume an extended domain of locality to semantic parsing towards explicit semantic representations that capture event types, semantic roles, and also logical relations between meaning components. On the syntactic side, TreeGraSP has provided a constituency parser that a) identifies elementary building blocks that, in some sense, represent constructions and b) yields competitive results compared to other standard parsers. The above-mentioned treebanks were then extended with semantic annotation, specifically semantic roles and event types. This resource was then used to develop frame-based semantic parsers. Specifically, the approaches to syntactic parsing have been extended to semantics, yielding a system that shows state-of-the-art performance while providing transparent, interpretable representations of syntax and semantics.
The syntactic and semantic annotation interface provided by RRGparbank