Periodic Reporting for period 4 - WIDE (Wide Incremental learning with Discrimination nEtworks)
Reporting period: 2022-03-01 to 2022-08-31
One important challenge for the WIDE project has been to clarify whether the simple and mathematically transparent linear networks of the DLM are up to the task of predicting meanings from forms, and forms from meanings. Application of the DLM to a series of typologically unrelated languages, including Estonian, Finnish, English, German, Russian, Navajo, Pame, Korean, Kinyarwanda, and Mandarin Chinese, revealed that the model performs very well on training data, and generalizes well to unseen data provided that (1) the morphology of the language is relatively regular and (2) sufficient training data are available. As expected, generalization degrades for irregular morphology.
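The core idea of such a linear network can be sketched in a few lines. The following is a toy illustration only, with random stand-in matrices rather than the project's actual form and meaning representations; it shows a comprehension mapping from form vectors to semantic vectors estimated in closed form with regularized least squares, which is one standard way such linear mappings are computed.

```python
import numpy as np

# Toy sketch: a linear mapping from form vectors to meaning vectors.
# C and S are random stand-ins for a form (cue) matrix and a semantic
# matrix, one row per word; they are NOT empirical data.
rng = np.random.default_rng(0)
n_words, n_form, n_sem = 6, 10, 8

C = rng.normal(size=(n_words, n_form))   # form representations
S = rng.normal(size=(n_words, n_sem))    # meaning representations

lam = 0.1                                # small ridge penalty for stability
# Closed-form regularized least squares: F maps form -> meaning.
F = np.linalg.solve(C.T @ C + lam * np.eye(n_form), C.T @ S)

S_hat = C @ F                            # predicted meanings for training words
```

A production network runs the same estimation in the opposite direction, mapping the semantic matrix onto the form matrix. The transparency of the approach lies in F being a single linear transformation whose weights can be inspected directly.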
These results lead to the novel insight that the high-dimensional spaces of form and meaning are aligned to a much greater extent than previously thought. Thanks to the isomorphy between these spaces, the simple linear mappings of the DLM are able to generate accurate predictions not only for word forms and meanings, but also for unprimed and primed reaction times and spoken word durations.
Although the DLM does not work with form units for morphemes, it does implement semantic operations for inflectional and derivational morphology, using vector addition in semantic space. The DLM implements the conceptualization of, for instance, a Finnish partitive plural noun by adding the semantic vectors of the lexeme, the noun plural, and the partitive case. To this sum, an interaction term for number by case has to be added, given that an analysis of empirical Finnish word embeddings indicates that in Finnish, the meaning of the plural varies systematically with case. Figure 1 illustrates that in English, the meaning of noun plurals is also conditional, but instead of depending on case, plural meaning varies with the semantic class of the noun: each dot in this figure represents the change from singular to plural in a high-dimensional semantic space that was projected onto two dimensions using t-distributed stochastic neighbor embedding. Selected semantic classes are highlighted using color coding. Importantly, when modeling with empirical embeddings for inflected words, the embeddings of the constituent semantic primitives and interactions are imputed, so that the model is able to conceptualize and produce novel inflected words.
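The compositional operation described above is plain vector addition. The sketch below uses random stand-in vectors (not empirical embeddings) to show how the semantic vector of a Finnish partitive plural would be assembled from the lexeme, number, and case vectors plus the number-by-case interaction term.

```python
import numpy as np

# All vectors are random stand-ins for illustration, not empirical embeddings.
rng = np.random.default_rng(1)
dim = 50

lexeme    = rng.normal(size=dim)   # semantic vector of the lexeme
plural    = rng.normal(size=dim)   # semantic vector of noun plural
partitive = rng.normal(size=dim)   # semantic vector of partitive case
plural_x_partitive = rng.normal(size=dim)  # interaction: plural meaning shifts with case

# Conceptualization of the partitive plural by vector addition:
s_hat = lexeme + plural + partitive + plural_x_partitive
```

Because the operation is additive, the same primitive vectors can be reused to conceptualize inflected forms the model has never encountered, which is what makes the imputation strategy for novel inflected words possible.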
The insights obtained with the WIDE project will be applied in a new project funded by the ERC, SUBLIMINAL. This project aims to improve smartphone apps for learning Mandarin as a second language, using the DLM model (in trial-to-trial learning mode) in combination with enhanced feedback on the phonetics and semantics of Mandarin words.
A complementary line of research addressed the modeling of speech production. A model for articulation, PAULE, was developed that calculates the time series of articulatory parameters that jointly control a physical model of the vocal tract. This model starts from a high-dimensional semantic representation, and in resonance with acoustic information learned previously from words' audio files, optimizes the time series of articulatory control parameters. In order to effectively capture temporal correlations, PAULE makes use of deep learning with relatively shallow networks.
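The optimization idea, stripped to its bare essentials, can be illustrated as follows. This is a minimal analogue only: a linear matrix stands in for the learned forward model from articulation to acoustics (in PAULE this is a neural network and a physical vocal tract model), and gradient descent adjusts a time series of control parameters so that the predicted acoustics approach a target trajectory.

```python
import numpy as np

# Minimal analogue of optimizing articulatory control parameters against an
# acoustic target. W is a random linear stand-in for the forward model; the
# target trajectory is random too. Neither reflects the actual PAULE setup.
rng = np.random.default_rng(2)
T, n_ctrl, n_acoustic = 20, 5, 12

W = rng.normal(size=(n_ctrl, n_acoustic)) / np.sqrt(n_ctrl)  # forward model stand-in
target = rng.normal(size=(T, n_acoustic))                    # target acoustics

ctrl = np.zeros((T, n_ctrl))   # articulatory control parameters over time
lr = 0.1
for _ in range(500):           # gradient descent on the squared acoustic error
    err = ctrl @ W - target
    ctrl -= lr * err @ W.T

final_loss = float(np.mean((ctrl @ W - target) ** 2))
```

The point of the sketch is the direction of inference: the control parameters themselves are the optimization variable, and the acoustic match drives their adjustment.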
An important line of research pitted the DLM against empirical data. The model was found to work very well for typologically different languages. Production tended to lag somewhat behind comprehension, reflecting the fact that humans tend to understand more words than they themselves actually use. A study reporting the successful application of the DLM model to the complex morphology of Maltese nouns has been accepted by Language, the flagship journal of linguistics.
A limitation of the DLM model is that it accounts only for words in isolation, without taking any context into account. However, how words are spoken, and what they mean, depends on the context in which they are used. For instance, English cut can denote actions carried out with chainsaws, knives, or scissors, actions for which Dutch and Mandarin use three different verbs. English cut displays a wide range of other meanings, across derivations (cutter, a type of ship), compounds (cutworm, a moth larva), lexicalized expressions (cut across), and idioms (to cut classes). Lexical knowledge does not consist of just simple and complex words, but of tens of thousands of multi-word expressions. Furthermore, what in one language is expressed with a single, richly inflected, morphologically complex word may require a phrase in other languages. These considerations have led to an extension of the DLM in which the meanings of simple sentences are represented as points in semantic space. Algorithms were developed for the conceptualization of syntactic roles such as agent and patient, and pragmatic functions such as honorifics (as found in Korean and Japanese). They were set up in such a way that entities with different syntactic or pragmatic functions are properly distinguished, while maintaining lexical similarities.
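The design constraint named in the last sentence can be made concrete with a small sketch. Using random stand-in vectors (not the project's actual algorithms or embeddings), adding a role vector to a lexical vector yields one simple way to obtain tokens that are distinct across roles yet remain closer to their own lexeme than to an unrelated one.

```python
import numpy as np

# Illustrative sketch with random stand-in vectors: binding a lexical vector
# to a syntactic-role vector by addition. This is one simple scheme that
# satisfies the constraint described in the text, not the project's algorithm.
rng = np.random.default_rng(3)
dim = 100

def cos(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

dog, cat = rng.normal(size=dim), rng.normal(size=dim)      # lexical vectors
agent, patient = rng.normal(size=dim), rng.normal(size=dim)  # role vectors

dog_agent = dog + agent      # 'dog' in agent role
dog_patient = dog + patient  # 'dog' in patient role

sim_roles = cos(dog_agent, dog_patient)   # below 1: the two roles are distinguished
sim_lex = cos(dog_agent, dog)             # high: lexical similarity is retained
sim_unrelated = cos(dog_agent, cat)       # low: unrelated lexeme stays distant
```

In high-dimensional spaces, random role vectors are nearly orthogonal to the lexical vectors, which is what lets a single addition both separate the roles and preserve the underlying lexical geometry.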