Periodic Reporting for period 4 - CALCULUS (Commonsense and Anticipation enriched Learning of Continuous representations sUpporting Language UnderStanding)
Reporting period: 2023-03-01 to 2024-09-30
CALCULUS focuses on learning effective representations of events and their narrative structures, trained on language and visual data. In this process, the grammatical structure of language is grounded in the geometric structure of visual data while embodying aspects of commonsense world knowledge. Continuous representations (e.g. vectors with continuous values) have proven successful in jointly capturing visual and verbal knowledge. An important goal of CALCULUS is to add structure to these representations, allowing for compositionality, controllability and explainability.
CALCULUS also focuses on continual learning and on storing, retrieving and reusing effective representations. Humans learn from limited data and learn continually, almost without forgetting what they have learned before. CALCULUS aims at developing novel algorithms for representing, retrieving and making inferences with prior knowledge, commonsense knowledge or content found earlier in a discourse. The language understanding of CALCULUS connects to the physical and social world. Spatial language is translated to 2D or 3D coordinates in the real world, while temporal expressions are translated to timelines. This process integrates commonsense knowledge learned from perceptual input such as visual data. Language is also grounded in the social world, thereby revealing hidden messages and leading to improved content moderation. The models for language understanding are integrated in a demonstrator that translates language to events happening in a 3D virtual world.
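To make the notion of temporal grounding concrete, the toy sketch below maps a relative temporal expression to an absolute date on a timeline given an anchor time. It is an illustration under assumed rules, not CALCULUS project code; the expressions, anchor date and rule table are hypothetical.

```python
# Toy illustration (not CALCULUS code) of grounding a relative temporal
# expression onto a timeline by anchoring it to an assumed document
# creation time. The rule table is a hypothetical example.
from datetime import date, timedelta

ANCHOR = date(2024, 9, 30)  # assumed document creation time
RULES = {"yesterday": -1, "today": 0, "tomorrow": 1, "next week": 7}

def ground(expression, anchor=ANCHOR):
    """Return the absolute date the expression refers to, or None if unknown."""
    offset = RULES.get(expression.lower())
    return anchor + timedelta(days=offset) if offset is not None else None

print(ground("tomorrow"))  # 2024-10-01
```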
CALCULUS created a demonstrator that converts text into coherent video stories, advancing multimodal representation learning and language-based spatio-temporal reasoning, with applications in interactive media and game design. It has driven partnerships in neuroscience and collaborations with industry through an international symposium and tutorials.
The CALCULUS project has significantly advanced AI by integrating temporal, spatial, and causal reasoning in neural networks and enhancing continual learning. It has also pioneered decoding human brain signals using foundation models, promising breakthroughs in brain-computer interfaces. Beyond language understanding and novel machine learning paradigms, CALCULUS has influenced robotics, text-guided video generation, content moderation, and brain-computer interaction.
RQ1: What kind of continuous, non-symbolic representations of language and their corresponding training data and learning models are needed to perform the task of “anticipatory” language understanding? How can these anticipatory models effectively learn the structure of language weakly supervised by structural and relational characteristics of perceptual data and as such also embody aspects of common sense and world knowledge?
The project has achieved significant advances in multimodal representation learning, as demonstrated by publications at AACL-IJCNLP 2020 [Milewski et al., AACL-IJCNLP 2020] and ECIR 2021 [Collell et al., ECIR 2021]. Contributions to the interpretability and explainability of representations are highlighted in a publication in the Journal of Biomedical Informatics [Spinks et al., 2020]. Progress includes integrating visual-linguistic structures for language-guided spatial reasoning [Nuyts et al., TACL 2024] and achieving breakthroughs in neural decoding, with models translating fMRI brain signals into high-resolution images [Sun et al., NeurIPS 2023] and enabling video decoding [Sun et al., AAAI 2025]. Anticipatory representation learning has advanced predictive text modeling by incorporating predictive coding principles [Araujo et al., EMNLP 2021]. Additionally, novel methods for social grounding of natural language have enhanced content moderation [Allein et al., AI 2024].
RQ2: What algorithms and data structures are needed to efficiently and quickly retrieve anticipatory representations from memory and combine them to parse language at the sentence and discourse level, and to reduce the amount of annotated resources needed in NLU tasks?
We have designed and evaluated efficient storage and retrieval of enriched representations and their fusion [Cartuyvels et al., COLING 2020]. Inspired by the human memory mechanisms of rehearsal and anticipation, we have improved the selection and storage of information in a memory network [e.g. Araujo et al., Findings EMNLP 2023]. In [Spinks et al., Computers 2020], matrix representations of objects are learned in a novel way that utilizes distances to contextual reference frames, inspired by human memory.
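The content-based read operation that underlies such memory networks can be sketched as follows. This is a minimal illustrative example under our own assumptions, not the implementation of the cited work; the function name, variable names and dimensions are hypothetical.

```python
# Illustrative sketch (not CALCULUS project code): content-based retrieval
# from a fixed set of memory slots, the basic read operation of a memory
# network. Names and dimensions are hypothetical.
import numpy as np

def read_memory(query, memory_keys, memory_values):
    """Attention-weighted read: score each slot, softmax, blend the values.

    query:         (d,)   vector encoding the current input
    memory_keys:   (n, d) one key per stored slot
    memory_values: (n, d) content returned for each slot
    """
    scores = memory_keys @ query              # similarity per slot
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ memory_values            # (d,) blended read-out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    keys, values = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
    print(read_memory(rng.normal(size=16), keys, values).shape)  # (16,)
```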
RQ3: What kind of continuous representations are needed to effectively infer novel content not made explicit in the discourse and to continually learn? Can we make inferences with these representations in an accurate, fast and scalable way in tasks such as spatial and temporal reasoning?
In continual learning, class-balancing reservoir sampling has set a new benchmark for mitigating data imbalance in streaming environments [Chrysakis & Moens, ICML 2020]. Advances include pioneering approaches for parsing temporal event structures with reasoning networks, improving both accuracy and efficiency [Leeuwenberg & Moens, JAIR 2019; Leeuwenberg & Moens, IJCAI 2020; Leeuwenberg & Moens, TASLP 2020]. Finally, CALCULUS proposed novel frameworks for commonsense integration in robotics, enabling precise task execution guided by natural language commands [Li et al., AAAI 2023].
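The sketch below illustrates the idea behind class-balancing reservoir sampling: when the memory is full, an incoming sample from a minority class replaces a sample of the currently largest class, while a sample from the largest class is admitted only with a reservoir-style probability. It is a simplified reconstruction in the spirit of [Chrysakis & Moens, ICML 2020]; tie-breaking and other implementation details are our own assumptions.

```python
# Simplified sketch of class-balancing reservoir sampling for online
# continual learning (in the spirit of Chrysakis & Moens, ICML 2020);
# tie-breaking and details follow our own assumptions, not the paper.
import random
from collections import Counter

class ClassBalancingReservoir:
    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []        # stored (x, y) pairs
        self.seen = Counter()   # number of stream instances seen per class

    def add(self, x, y):
        self.seen[y] += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append((x, y))
            return
        counts = Counter(label for _, label in self.buffer)
        largest = max(counts, key=counts.get)
        if y != largest:
            # Minority-class sample: evict a random item of the largest class.
            idx = random.choice(
                [i for i, (_, l) in enumerate(self.buffer) if l == largest])
            self.buffer[idx] = (x, y)
        elif random.random() < counts[y] / self.seen[y]:
            # Largest-class sample: keep it with reservoir probability,
            # replacing a random stored item of the same class.
            idx = random.choice(
                [i for i, (_, l) in enumerate(self.buffer) if l == y])
            self.buffer[idx] = (x, y)
```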