Commonsense and Anticipation enriched Learning of Continuous representations sUpporting Language UnderStanding

Periodic Reporting for period 4 - CALCULUS (Commonsense and Anticipation enriched Learning of Continuous representations sUpporting Language UnderStanding)

Reporting period: 2023-03-01 to 2024-09-30

Natural language understanding (NLU) by machines is of great scientific, economic and social value. It has applications in human-machine interaction (e.g. voice assistants), in human-human communication (e.g. language translation), and in machine reading and mining of repositories of textual information. Although the field has advanced substantially thanks to progress in deep neural learning, NLU systems still struggle to attain human performance. The primary goal of CALCULUS is to advance the state of the art of NLU.
CALCULUS focuses on learning effective representations of events and their narrative structures, trained on language and visual data. In this process the grammatical structure of language is grounded in the geometric structure of visual data while embodying aspects of commonsense world knowledge. Continuous representations (e.g. in the form of vectors with continuous values) have proven successful at jointly capturing visual and verbal knowledge. An important goal of CALCULUS is to add structure to these representations, allowing for compositionality, controllability and explainability.
CALCULUS also focuses on continual learning and on the storage, retrieval and reuse of effective representations. Humans learn from limited data and learn continually, almost without forgetting what they have learned before. CALCULUS aims to develop novel algorithms for representing, retrieving and making inferences with prior knowledge, commonsense knowledge or content found earlier in a discourse. The language understanding of CALCULUS connects to the physical and social world. Spatial language is translated to 2D or 3D coordinates in the real world, while temporal expressions are translated to timelines. This process integrates commonsense knowledge learned from perceptual input such as visual data. Language is also grounded in the social world, thereby revealing hidden messages and leading to improved content moderation. The models for language understanding are integrated in a demonstrator that translates language into events happening in a 3D virtual world.
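As a purely illustrative sketch of what mapping a temporal expression onto a timeline can look like, the Python snippet below resolves a relative expression against an anchor date. The rule table and function name are hypothetical simplifications introduced here for illustration; the project learns such groundings from data rather than relying on hand-written rules.

```python
from datetime import date, timedelta

# Hypothetical hand-written rules, purely for illustration; CALCULUS models
# learn such groundings from language and perceptual data instead.
RELATIVE_OFFSETS = {
    "yesterday": timedelta(days=-1),
    "tomorrow": timedelta(days=1),
    "next week": timedelta(weeks=1),
}

def ground_temporal_expression(expression: str, anchor: date) -> date:
    """Map a relative temporal expression to a point on a timeline."""
    offset = RELATIVE_OFFSETS.get(expression.lower().strip())
    if offset is None:
        raise ValueError(f"No rule for expression: {expression!r}")
    return anchor + offset

# Example: resolving "tomorrow" relative to the end of the reporting period.
print(ground_temporal_expression("tomorrow", date(2024, 9, 30)))  # 2024-10-01
```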
Achievements include contributions to predictive coding theory in language modeling, the exploration of visual-linguistic structures, and advances in neural decoding that link human brain activity with machine-learned representations [e.g. Araujo et al., EMNLP 2021; Sun et al., IJCAI 2023; Sun et al., NeurIPS 2023; Sun et al., AAAI 2025; Cartuyvels et al., AI Open 2020; Allein et al., AI 2024]. Moreover, we have improved language modeling by integrating an unsupervised planning module [Cornille et al., COLM 2024]. Key innovations also include novel memory network designs for efficiently handling long textual contexts [Araujo et al., Findings EMNLP 2023] and improved event recognition through temporal and spatial reasoning [Cartuyvels et al., COLING 2020; Spinks et al., Computers 2020]. The CALCULUS project has proposed new algorithms for continual learning, including a class-balancing reservoir sampling technique [Chrysakis & Moens, ICML 2020]. Another major breakthrough involves integrating visual common sense into neural models for language-guided robotics [Li et al., AAAI 2023]. Spatial representation learning research has introduced novel methods for language-guided visual tasks [Nuyts et al., TACL 2024]. Research on implicit language meaning has led to advances in social grounding and content moderation [Allein et al., AI 2024], with the latter presented as an oral talk at AAAI 2025.
CALCULUS created a demonstrator that converts text into coherent video stories, advancing multimodal representation learning and language-based spatial-temporal reasoning, with applications in interactive media and game design. It has driven partnerships in neuroscience and collaborations with industry through an international symposium and tutorials.
The CALCULUS project has significantly advanced AI by integrating temporal, spatial, and causal reasoning in neural networks and enhancing continual learning. It has also pioneered decoding human brain signals using foundation models, promising breakthroughs in brain-computer interfaces. Beyond language understanding and novel machine learning paradigms, CALCULUS has influenced robotics, text-guided video generation, content moderation, and brain-computer interaction.
Three fundamental research questions have been targeted:
RQ1: What kind of continuous, non-symbolic representations of language, and which corresponding training data and learning models, are needed to perform the task of “anticipatory” language understanding? How can these anticipatory models effectively learn the structure of language, weakly supervised by the structural and relational characteristics of perceptual data, and in doing so also embody aspects of common sense and world knowledge?
The project has achieved significant advances in multimodal representation learning, as demonstrated by publications at AACL-IJCNLP 2020 [Milewski et al., AACL-IJCNLP 2020] and ECIR 2021 [Collell et al., ECIR 2021]. Contributions to the interpretability and explainability of representations are highlighted in a publication in the Journal of Biomedical Informatics [Spinks et al., 2020]. Progress includes integrating visual-linguistic structures for language-guided spatial reasoning [Nuyts et al., TACL 2024] and achieving breakthroughs in neural decoding, with models translating fMRI brain signals into high-resolution images [Sun et al., NeurIPS 2023] and enabling video decoding [Sun et al., AAAI 2025]. Anticipatory representation learning has redefined predictive text modeling by incorporating predictive coding principles [Araujo et al., EMNLP 2021]. Additionally, novel methods for social grounding in natural language have enhanced content moderation [Allein et al., AI 2024].
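To make the idea of anticipatory, predictive-coding-style language modeling concrete, the following is a minimal PyTorch sketch: on top of a standard next-token objective, the hidden state is also asked to predict an embedding that summarises upcoming text. The module layout, names and loss weighting are assumptions made here for illustration, not the architecture of Araujo et al. (EMNLP 2021).

```python
import torch
import torch.nn as nn

class AnticipatoryLM(nn.Module):
    """Toy language model with an auxiliary 'anticipation' objective.

    Hypothetical sketch: a GRU predicts the next token (standard LM loss)
    and, from the same hidden state, also predicts an embedding expected to
    summarise the *next* sentence (predictive-coding flavour).
    """

    def __init__(self, vocab_size: int, dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.token_head = nn.Linear(dim, vocab_size)   # next-token logits
        self.anticipation_head = nn.Linear(dim, dim)   # future-sentence embedding

    def forward(self, tokens, next_tokens, future_sentence_emb, alpha: float = 0.5):
        hidden, _ = self.rnn(self.embed(tokens))        # (B, T, dim)
        lm_loss = nn.functional.cross_entropy(
            self.token_head(hidden).transpose(1, 2), next_tokens
        )
        # Use the last hidden state to anticipate a summary of what comes next.
        anticipation = self.anticipation_head(hidden[:, -1])
        anticipation_loss = nn.functional.mse_loss(anticipation, future_sentence_emb)
        return lm_loss + alpha * anticipation_loss
```

The auxiliary loss pushes the hidden states to encode not only what has just been read but also what is likely to come next, which is the intuition behind anticipation-enriched representations.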
RQ2: What algorithms and data structures are needed to efficiently and rapidly retrieve the anticipatory representations from memory and combine them, so as to parse language at the sentence and discourse level and reduce the number of annotated resources needed in NLU tasks?
We have designed and evaluated efficient storage and retrieval of enriched representations and their fusion [Cartuyvels et al., COLING 2020]. Inspired by human memory mechanisms such as rehearsal and anticipation, we have improved the selection and storage of information in a memory network [e.g. Araujo et al., Findings EMNLP 2023]. In [Spinks et al., Computers 2020], matrix representations of objects are learned in a novel way by utilizing distances to contextual reference frames, again inspired by human memory.
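As an illustration of selective storage and retrieval in a fixed-size memory, the sketch below keeps a small set of slots, writes a new segment representation only when it is sufficiently novel with respect to what is already stored, and reads with attention over the slots. This is a hedged toy example; the slot logic, thresholds and names are assumptions, not the memory design of Araujo et al. (Findings EMNLP 2023).

```python
import torch

class SlotMemory:
    """Hypothetical fixed-size memory for long textual contexts.

    Writes keep only representations judged novel w.r.t. existing slots;
    reads return an attention-weighted mix of the stored slots.
    """

    def __init__(self, num_slots: int, dim: int, novelty_threshold: float = 0.9):
        self.slots = torch.zeros(num_slots, dim)
        self.filled = 0
        self.novelty_threshold = novelty_threshold

    def write(self, segment: torch.Tensor) -> None:
        if self.filled < self.slots.shape[0]:
            self.slots[self.filled] = segment
            self.filled += 1
            return
        # Cosine similarity of the new segment to each stored slot.
        sims = torch.cosine_similarity(self.slots, segment.unsqueeze(0), dim=-1)
        if sims.max() < self.novelty_threshold:   # novel enough to keep
            self.slots[sims.argmax()] = segment   # overwrite its nearest slot

    def read(self, query: torch.Tensor) -> torch.Tensor:
        if self.filled == 0:
            return torch.zeros_like(query)
        weights = torch.softmax(self.slots[: self.filled] @ query, dim=0)
        return weights @ self.slots[: self.filled]
```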
RQ3: What kind of continuous representations are needed to effectively infer novel content not made explicit in the discourse and to continually learn? Can we make inferences with these representations in an accurate, fast and scalable way in tasks such as spatial and temporal reasoning?
In continual learning, class-balancing reservoir sampling has set a new benchmark for mitigating data imbalance in streaming environments [Chrysakis & Moens, ICML 2020]. Advances include pioneering approaches for parsing temporal event structures through reasoning networks, improving both accuracy and efficiency [Leeuwenberg & Moens, JAIR 2019; Leeuwenberg & Moens, IJCAI 2020; Leeuwenberg & Moens, TASLP 2020]. Finally, CALCULUS proposed novel frameworks for integrating common sense into robotics, enabling precise task execution guided by natural language commands [Li et al., AAAI 2023].
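The class-balancing reservoir idea can be pictured as follows: while the buffer has free space every incoming item is stored, and once it is full only over-represented classes compete for eviction, which protects minority classes in an imbalanced stream. The Python sketch below paraphrases that behaviour under our own simplifications; the variable names and the within-class replacement probability are assumptions for illustration, not the published pseudocode of Chrysakis & Moens (ICML 2020).

```python
import random
from collections import defaultdict

class ClassBalancingReservoir:
    """Hedged sketch of class-balancing reservoir sampling on a data stream."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.buffer = []                       # list of (item, label)
        self.stream_counts = defaultdict(int)  # labels seen in the stream
        self.buffer_counts = defaultdict(int)  # labels currently stored

    def _largest_classes(self):
        top = max(self.buffer_counts.values())
        return {c for c, n in self.buffer_counts.items() if n == top}

    def update(self, item, label):
        self.stream_counts[label] += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append((item, label))
            self.buffer_counts[label] += 1
            return
        largest = self._largest_classes()
        if label not in largest:
            # Evict a random stored item of an over-represented class.
            candidates = [i for i, (_, l) in enumerate(self.buffer) if l in largest]
            idx = random.choice(candidates)
            self.buffer_counts[self.buffer[idx][1]] -= 1
            self.buffer[idx] = (item, label)
            self.buffer_counts[label] += 1
        elif random.random() < self.buffer_counts[label] / self.stream_counts[label]:
            # Class is already well represented: replace within the same class
            # with shrinking probability, keeping the per-class sample fresh.
            candidates = [i for i, (_, l) in enumerate(self.buffer) if l == label]
            self.buffer[random.choice(candidates)] = (item, label)
```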
Extension of a language model with an anticipatory planner [Cornille et al., COLM 2024].
The CALCULUS demonstrator translates textual stories into video.
The deep learning model that decodes brain fMRI signals into video [Sun et al., AAAI 2025].