Periodic Reporting for period 4 - WIDE (Wide Incremental learning with Discrimination nEtworks)
Reporting period: 2022-03-01 to 2022-08-31
One important challenge for the WIDE project has been to clarify whether the simple and mathematically transparent linear networks of the DLM are up to the task of predicting meanings from forms, and forms from meanings. Application of the DLM to a series of typologically unrelated languages, including Estonian, Finnish, English, German, Russian, Navajo, Pame, Korean, Kinyarwanda, and Mandarin Chinese, revealed that the model performs very well on training data, and generalizes well to unseen data provided that (1) the morphology of the language is relatively regular and (2) sufficient training data are available. As expected, generalization degrades for irregular morphology.
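The core idea of such a linear network can be sketched in a few lines. The following is a toy illustration only, with random stand-in matrices rather than the project's actual form and meaning representations; it shows a comprehension mapping from form vectors to semantic vectors estimated in closed form with regularized least squares, which is one standard way such linear mappings are computed.

```python
import numpy as np

# Toy sketch: a linear mapping from form vectors to meaning vectors.
# C and S are random stand-ins for a form (cue) matrix and a semantic
# matrix, one row per word; they are NOT empirical data.
rng = np.random.default_rng(0)
n_words, n_form, n_sem = 6, 10, 8

C = rng.normal(size=(n_words, n_form))   # form representations
S = rng.normal(size=(n_words, n_sem))    # meaning representations

lam = 0.1                                # small ridge penalty for stability
# Closed-form regularized least squares: F maps form -> meaning.
F = np.linalg.solve(C.T @ C + lam * np.eye(n_form), C.T @ S)

S_hat = C @ F                            # predicted meanings for training words
```

A production network runs the same estimation in the opposite direction, mapping the semantic matrix onto the form matrix. The transparency of the approach lies in F being a single linear transformation whose weights can be inspected directly.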
These results lead to the novel insight that the high-dimensional spaces of form and meaning are aligned to a much greater extent than previously thought. Thanks to the isomorphy between these spaces, the simple linear mappings of the DLM are able to generate accurate predictions not only for word forms and meanings, but also for unprimed and primed reaction times and spoken word durations.
Although the DLM does not work with form units for morphemes, it does implement semantic operations for inflectional and derivational morphology, using vector addition in semantic space. The DLM implements the conceptualization of, for instance, a Finnish partitive plural noun by adding the semantic vectors of the lexeme, the noun plural, and the partitive case. To this sum, an interaction term for number by case has to be added, given that an analysis of empirical Finnish word embeddings indicates that in Finnish, the meaning of the plural varies systematically with case. Figure 1 illustrates that in English, the meaning of noun plurals is also conditional, but instead of depending on case, plural meaning varies with the semantic class of the noun: each dot in this figure represents the change from singular to plural in a high-dimensional semantic space that was projected onto two dimensions using t-distributed stochastic neighbor embedding. Selected semantic classes are highlighted using color coding. Importantly, when modeling with empirical embeddings for inflected words, the embeddings of the constituent semantic primitives and interactions are imputed, so that the model is able to conceptualize and produce novel inflected words.
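The compositional operation described above is plain vector addition. The sketch below uses random stand-in vectors (not empirical embeddings) to show how the semantic vector of a Finnish partitive plural would be assembled from the lexeme, number, and case vectors plus the number-by-case interaction term.

```python
import numpy as np

# All vectors are random stand-ins for illustration, not empirical embeddings.
rng = np.random.default_rng(1)
dim = 50

lexeme    = rng.normal(size=dim)   # semantic vector of the lexeme
plural    = rng.normal(size=dim)   # semantic vector of noun plural
partitive = rng.normal(size=dim)   # semantic vector of partitive case
plural_x_partitive = rng.normal(size=dim)  # interaction: plural meaning shifts with case

# Conceptualization of the partitive plural by vector addition:
s_hat = lexeme + plural + partitive + plural_x_partitive
```

Because the operation is additive, the same primitive vectors can be reused to conceptualize inflected forms the model has never encountered, which is what makes the imputation strategy for novel inflected words possible.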
The insights obtained with the WIDE project will be applied in a new project funded by the ERC, SUBLIMINAL. This project aims to improve smartphone apps for learning Mandarin as a second language, using the DLM model (in trial-to-trial learning mode) in combination with enhanced feedback on the phonetics and semantics of Mandarin words.
A complementary line of research addressed the modeling of speech production. A model for articulation, PAULE, was developed that calculates the time series of articulatory parameters that jointly control a physical model of the vocal tract. This model starts from a high-dimensional semantic representation, and in resonance with acoustic information learned previously from words' audio files, optimizes the time series of articulatory control parameters. In order to effectively capture temporal correlations, PAULE makes use of deep learning with relatively shallow networks.
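The optimization idea, stripped to its bare essentials, can be illustrated as follows. This is a minimal analogue only: a linear matrix stands in for the learned forward model from articulation to acoustics (in PAULE this is a neural network and a physical vocal tract model), and gradient descent adjusts a time series of control parameters so that the predicted acoustics approach a target trajectory.

```python
import numpy as np

# Minimal analogue of optimizing articulatory control parameters against an
# acoustic target. W is a random linear stand-in for the forward model; the
# target trajectory is random too. Neither reflects the actual PAULE setup.
rng = np.random.default_rng(2)
T, n_ctrl, n_acoustic = 20, 5, 12

W = rng.normal(size=(n_ctrl, n_acoustic)) / np.sqrt(n_ctrl)  # forward model stand-in
target = rng.normal(size=(T, n_acoustic))                    # target acoustics

ctrl = np.zeros((T, n_ctrl))   # articulatory control parameters over time
lr = 0.1
for _ in range(500):           # gradient descent on the squared acoustic error
    err = ctrl @ W - target
    ctrl -= lr * err @ W.T

final_loss = float(np.mean((ctrl @ W - target) ** 2))
```

The point of the sketch is the direction of inference: the control parameters themselves are the optimization variable, and the acoustic match drives their adjustment.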
An important line of research pitted the DLM against empirical data. The model was found to work very well for typologically different languages. Production tended to lag somewhat behind comprehension, reflecting the fact that humans tend to understand more words than they themselves actually use. A study reporting the successful application of the DLM model to the complex morphology of Maltese nouns has been accepted by Language, the flagship journal of linguistics.
A limitation of the DLM model is that it accounts only for words in isolation, without taking any context into account. However, how words are spoken, and what they mean, depends on the context in which they are used. For instance, English cut can denote actions carried out with chainsaws, knives, or scissors, actions for which Dutch and Mandarin use three different verbs. English cut displays a wide range of other meanings, across derivations (cutter, a type of ship), compounds (cutworm, a moth larva), lexicalized expressions (cut across), and idioms (to cut classes). Lexical knowledge does not consist of just simple and complex words, but of tens of thousands of multi-word expressions. Furthermore, what in one language is expressed with a single, richly inflected, morphologically complex word may require a phrase in other languages. These considerations have led to an extension of the DLM in which the meanings of simple sentences are represented as points in semantic space. Algorithms were developed for the conceptualization of syntactic roles such as agent and patient, and pragmatic functions such as honorifics (as found in Korean and Japanese). They were set up in such a way that entities with different syntactic or pragmatic functions are properly distinguished, while maintaining lexical similarities.
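The design constraint named in the last sentence can be made concrete with a small sketch. Using random stand-in vectors (not the project's actual algorithms or embeddings), adding a role vector to a lexical vector yields one simple way to obtain tokens that are distinct across roles yet remain closer to their own lexeme than to an unrelated one.

```python
import numpy as np

# Illustrative sketch with random stand-in vectors: binding a lexical vector
# to a syntactic-role vector by addition. This is one simple scheme that
# satisfies the constraint described in the text, not the project's algorithm.
rng = np.random.default_rng(3)
dim = 100

def cos(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

dog, cat = rng.normal(size=dim), rng.normal(size=dim)      # lexical vectors
agent, patient = rng.normal(size=dim), rng.normal(size=dim)  # role vectors

dog_agent = dog + agent      # 'dog' in agent role
dog_patient = dog + patient  # 'dog' in patient role

sim_roles = cos(dog_agent, dog_patient)   # below 1: the two roles are distinguished
sim_lex = cos(dog_agent, dog)             # high: lexical similarity is retained
sim_unrelated = cos(dog_agent, cat)       # low: unrelated lexeme stays distant
```

In high-dimensional spaces, random role vectors are nearly orthogonal to the lexical vectors, which is what lets a single addition both separate the roles and preserve the underlying lexical geometry.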