Cross-Linguistic Acquisition of Sentence Structure: Integrating Experimental and Computational Approaches

Project Information

CLASS

Grant agreement ID: 681296

DOI

10.3030/681296

Closed project

EC signature date 26 July 2016

Start date 1 September 2016

End date 28 February 2022

Funded under

EXCELLENT SCIENCE - European Research Council (ERC)

Total cost

€ 1 600 000.00

EU contribution

€ 1 600 000.00

Coordinated by

THE UNIVERSITY OF LIVERPOOL
United Kingdom

Periodic Reporting for period 4 - CLASS (Cross-Linguistic Acquisition of Sentence Structure: Integrating Experimental and Computational Approaches)

Reporting period: 2021-03-01 to 2022-02-28

English speakers can say both “The window broke” and “Somebody broke the window”. Yet although they can say “The boy danced”, they cannot say “*Somebody danced the boy”. (The “*” means that English speakers generally regard the sentence as “ungrammatical”.) This isn’t just a quirk of English: many, probably all, languages have these kinds of partial regularities. And children are often misled by them, saying things like “*Somebody danced the boy”. The overall objective of the project was to investigate how, when learning their native language, children avoid these types of errors (or, having started to make them, stop).

Because this was a curiosity-driven project, its short-term impacts are implications for fundamental science. In the long term, understanding the basic science of language acquisition is crucial for developing interventions for learners with language disorders, and for building useful digital tools: everything from digital assistants such as Apple’s Siri or Amazon’s Alexa to the navigation systems of self-driving cars. Indeed, the question of whether and how real-world meaning can be incorporated into such systems – as the models in the present work do – is currently attracting a great deal of attention in computer science (https://dl.acm.org/doi/10.1145/3442188.3445922) and among the general public (https://www.bbc.co.uk/news/technology-61784011).

In order to check that our answers to the question of how learners build productive but exception-filled generalizations are not just specific to English (or any other single language), we studied five languages: English, Hindi, Hebrew, Japanese and K'iche'.

WP1 investigated whether children are helped by some cognitive-semantic universals. We tested the idea that sentences of the form “Somebody ACTIONed the THING” are less acceptable when (amongst other factors) the “THING” has some say over what’s going on. For example, “*Somebody danced the boy” is worse than “Somebody broke the window” because the boy could in principle resist being “danced”, but a window can’t resist being broken. The studies from this project (Ambridge et al., 2020; 2022) provided support for this possibility – across all five languages – using grammatical acceptability rating studies and elicited production.

WP2 addressed the question of how children form these generalizations in the first place, focussing on more idiosyncratic language-specific constructions. In Hindi, some past-tense sentences require a special type of marker – the “ergative” marker (“-ne”) – on the subject (e.g. “Raam-ne pulled the rope”). Yet for other, similar sentences, the marker is optional (e.g. “Raam[-ne] smelled the cheese”) or even ungrammatical (e.g. one would say “Raam knew the answer”, not “*Raam-ne knew the answer”). Across a set of judgment and production studies (Maitreyee, Saxena, Narasimhan & Ambridge, in prep) we found evidence that adults and children are learning probabilistically from the input which particular verbs (e.g. pull/smell/know) tend to occur with and without the “-ne” marker. But they are also sensitive to fine-grained meaning, such that the ergative marker is more strongly linked with deliberate than with accidental actions.

In Japanese, basic two-participant sentences mark the one doing the action with “-ga” and the one having the action done to it with “-wo” (e.g. “Dog-ga cat-wo chased”), but children often struggle to do this correctly, particularly when – as is perfectly grammatical in Japanese – the order of the participants is switched around, but the intended meaning is the same (e.g. “Cat-wo dog-ga chased”). In elicited production studies with young children (Ambridge, Saito, Jones, Tatsumi, Fukumura & Kawakami, accepted in principle; in prep) we found that children are sensitive to the frequency with which particular nouns (e.g. dog, cat) appear with and without ga/wo in the input.

In Hebrew (as in English), passive sentences are more acceptable when the first-mentioned person or thing is highly affected by the action described. For example, “The boy was pushed by the girl” generally receives the maximum possible acceptability rating, while “?The boy was seen/liked/understood by the girl” is generally rated as sounding at least somewhat ungrammatical. In a grammaticality judgment study, we found that across verbs (e.g. push/see/like/understand) the greater the extent to which a particular verb is rated as affecting the undergoer (with ratings provided by a separate group of adult participants), the more acceptable the passive form. We are currently finalizing a paper which reports not just the results of this study, but also a meta-analytic synthesis that incorporates similar studies conducted in English, Balinese, Indonesian and Mandarin Chinese (Ambridge, Arnon & Bekman, in prep).

WP3 investigated the question of why languages are the way they are, and whether other types of systems would be difficult or impossible to learn. This work package consisted of five separate studies (Samara, Wonnacott, Saxena & Ambridge, in prep) in which adult or child participants were taught artificial languages with and without the various properties of natural (“real”) languages discussed above. For both children and adults, we found that speakers are capable of learning generalizations with a semantic basis (as in WP1) and with – to some extent – arbitrary exceptions (as in WP2), but only when different forms are competing for the same meaning (e.g. “*Somebody danced the boy” can be out-competed by “Somebody made the boy dance”, but not by “The boy danced”).

WP4 investigated the question of development: Speakers form these generalizations, make overgeneralization errors (e.g. “*Somebody danced the boy”) and then stop; but how? In WP4 we addressed this question by building computational models of the phenomena studied in WPs1-3. We found that the human data – including verb-by-verb acceptability judgment and production data from both adults and children – are very well simulated (i.e. model-human correlations of around r=0.8) by simple discriminative learning models that map from a combination of a verb+speaker’s intended meaning (e.g. dance+caused-event) to output constructions (e.g. “Somebody VERBed X” vs. “Somebody made X VERB”). These findings suggest that the apparent “paradox” (Pinker, 1989) of how children form productive generalizations with exceptions arises only when we frame the child’s task as one of “hypothesis testing”. Our modelling suggests that productive generalizations with exceptions do not in fact constitute a paradox at all, but instead fall naturally out of a simple psychologically-realistic learning mechanism.
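The kind of model described above can be illustrated with a minimal sketch. The report does not specify the learning rule, so this example assumes a Rescorla-Wagner-style delta rule; the toy input counts, the construction labels, the learning rate and the function names `train` and `activation` are all hypothetical and are not taken from the project's models or data.

```python
import random
from collections import defaultdict

# Hypothetical output constructions the learner chooses between.
CONSTRUCTIONS = ["transitive", "periphrastic", "intransitive"]

# Hypothetical toy input: (verb, intended meaning, construction heard).
TOY_INPUT = (
    [("break", "caused-event", "transitive")] * 8        # Somebody broke X
    + [("break", "caused-event", "periphrastic")] * 2    # Somebody made X break
    + [("break", "simple-event", "intransitive")] * 10   # X broke
    + [("dance", "caused-event", "periphrastic")] * 10   # Somebody made X dance
    + [("dance", "simple-event", "intransitive")] * 10   # X danced
)


def cues_for(verb, meaning):
    """Input cues: the verb's identity plus the speaker's intended meaning."""
    return (f"verb:{verb}", f"meaning:{meaning}")


def train(events, epochs=200, rate=0.02, seed=0):
    """Learn cue-to-construction weights with an error-driven delta rule."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    events = list(events)
    weights = defaultdict(float)  # (cue, construction) -> association strength
    for _ in range(epochs):
        rng.shuffle(events)
        for verb, meaning, heard in events:
            cues = cues_for(verb, meaning)
            for construction in CONSTRUCTIONS:
                predicted = sum(weights[(cue, construction)] for cue in cues)
                target = 1.0 if construction == heard else 0.0
                error = target - predicted
                # Every active cue shares the prediction error.
                for cue in cues:
                    weights[(cue, construction)] += rate * error
    return weights


def activation(weights, verb, meaning, construction):
    """Summed support for a construction given a verb and an intended meaning."""
    return sum(weights.get((cue, construction), 0.0)
               for cue in cues_for(verb, meaning))


if __name__ == "__main__":
    w = train(TOY_INPUT)
    for verb in ("break", "dance"):
        for construction in ("transitive", "periphrastic"):
            score = activation(w, verb, "caused-event", construction)
            print(verb, construction, round(score, 2))
```

In this toy setting, “dance” with a caused-event meaning still receives some support for the transitive construction, because the meaning cue is shared with “break”. That residual support is the analogue of an overgeneralization error. The support is weaker than for “break”, and weaker than the support for the competing “Somebody made X dance” form, because “dance” is repeatedly heard in that competing construction. Activations of this kind are what one would compare, verb by verb, against human acceptability ratings.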

Before the project, the state of the art consisted of studies conducted solely on English (which we summarized in a 2018 meta-analysis paper). We have progressed beyond the state of the art by testing both old theories and the new theory developed as part of the current work on four more languages (Hindi, Hebrew, K'iche' and Japanese), and on artificial lab-created languages. This latter development also represents progress beyond the state of the art, as we have been able, for the first time, to create a novel language with semantic distinctions that are readily learnable by children.
