Periodic Reporting for period 4 - CLASS (Cross-Linguistic Acquisition of Sentence Structure: Integrating Experimental and Computational Approaches)
Reporting period: 2021-03-01 to 2022-02-28
As this is a curiosity-driven project, its short-term impacts are implications for fundamental science. In the long term, understanding the basic science of language acquisition is crucial for developing interventions for learners with language disorders, and for building useful digital tools: everything from digital assistants such as Apple’s Siri or Amazon’s Alexa to the navigation systems of self-driving cars. Indeed, the question of whether and how real-world meaning can be incorporated into such systems – as it is in the models in the present work – is currently attracting a great deal of attention in computer science (https://dl.acm.org/doi/10.1145/3442188.3445922) and among the general public (https://www.bbc.co.uk/news/technology-61784011).
In order to check that our answers to the question of how learners build productive but exception-filled generalizations are not just specific to English (or any other single language), we studied five languages: English, Hindi, Hebrew, Japanese and K'iche'.
WP2 addressed the question of how children form these generalizations in the first place, focussing on more idiosyncratic, language-specific constructions. In Hindi, some past-tense sentences require a special type of marker – the “ergative” marker (“-ne”) – on the subject (e.g. “Raam-ne pulled the rope”). Yet for other, similar sentences, the marker is optional (e.g. “Raam[-ne] smelled the cheese”) or even ungrammatical (e.g. one would say “Raam knew the answer”, not “*Raam-ne knew the answer”). Across a set of judgment and production studies (Maitreyee, Saxena, Narasimhan & Ambridge, in prep) we found evidence that both adults and children learn probabilistically from the input which particular verbs (e.g. pull/smell/know) tend to occur with and without the “-ne” marker. But they are also sensitive to fine-grained meaning, such that the ergative marker is associated more strongly with deliberate than with accidental actions.
In Japanese, basic two-participant sentences mark the one doing the action with “-ga” and the one having the action done to it with “-wo” (e.g. “Dog-ga cat-wo chased”), but children often struggle to do this correctly, particularly when – as is perfectly grammatical in Japanese – the order of the participants is switched around but the intended meaning stays the same (e.g. “Cat-wo dog-ga chased”). In elicited production studies with young children (Ambridge, Saito, Jones, Tatsumi, Fukumura & Kawakami, accepted in principle; in prep) we found that children are sensitive to the frequency with which particular nouns (e.g. dog, cat) appear with and without ga/wo in the input.
In Hebrew (as in English), passive sentences are more acceptable when the first-mentioned person or thing is highly affected by the action described. For example, “The boy was pushed by the girl” generally receives the maximum possible acceptability rating, while “?The boy was seen/liked/understood by the girl” is generally rated as sounding at least somewhat ungrammatical. In a grammaticality judgment study, we found that across verbs (e.g. push/see/like/understand), the more strongly a particular verb is rated as affecting the undergoer (with ratings provided by a separate group of adult participants), the more acceptable its passive form. We are currently finalizing a paper which reports not just the results of this study, but also a meta-analytic synthesis that incorporates similar studies conducted in English, Balinese, Indonesian and Mandarin Chinese (Ambridge, Arnon & Bekman, in prep).
WP3 investigated the question of why languages are the way they are, and whether other types of systems would be difficult or impossible to learn. This work package consisted of five separate studies (Samara, Wonnacott, Saxena & Ambridge, in prep) in which adult or child participants were taught artificial languages with and without the various properties of natural (“real”) languages discussed above. For both children and adults, we found that speakers are capable of learning generalizations with a semantic basis (as in WP1) and with – to some extent – arbitrary exceptions (as in WP2), but only when different forms are competing for the same meaning (e.g. “*Somebody danced the boy” can be out-competed by “Somebody made the boy dance”, but not by “The boy danced”).
WP4 investigated the question of development: speakers form these generalizations, make overgeneralization errors (e.g. “*Somebody danced the boy”) and then stop; but how? In WP4 we addressed this question by building computational models of the phenomena studied in WPs 1-3. We found that the human data – including verb-by-verb acceptability judgment and production data from both adults and children – are very well simulated (i.e. model-human correlations of around r=0.8) by simple discriminative learning models that map from a combination of a verb and the speaker’s intended meaning (e.g. dance+caused-event) to output constructions (e.g. “Somebody VERBed X” vs. “Somebody made X VERB”). These findings suggest that the apparent “paradox” (Pinker, 1989) of how children form productive generalizations with exceptions arises only when we frame the child’s task as one of “hypothesis testing”. Our modelling suggests that productive generalizations with exceptions do not in fact constitute a paradox at all, but instead fall naturally out of a simple, psychologically realistic learning mechanism.
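To make the idea of a discriminative learning model concrete, the following is a minimal sketch, not the project's actual model. It assumes a delta-rule (Rescorla-Wagner style) update, which is one common way to implement discriminative learning; the report does not specify the update rule. The cue names, learning rate, number of epochs and the four toy input events are all invented for illustration and are not the project's data.

```python
from collections import defaultdict

# Output constructions named in the text above.
DIRECT = "Somebody VERBed X"
MADE = "Somebody made X VERB"
OUTCOMES = [DIRECT, MADE]

# Hypothetical toy input: (cues, construction heard). Invented for illustration.
TOY_EVENTS = [
    ({"verb:break", "meaning:caused-event"}, DIRECT),
    ({"verb:roll", "meaning:caused-event"}, DIRECT),
    ({"verb:open", "meaning:caused-event"}, DIRECT),
    ({"verb:dance", "meaning:caused-event"}, MADE),
]


def train(events, outcomes=OUTCOMES, rate=0.1, epochs=50):
    """Learn cue-to-outcome weights with a delta-rule update (an assumption)."""
    weights = defaultdict(float)  # (cue, outcome) -> association strength
    for _ in range(epochs):
        for cues, observed in events:
            for outcome in outcomes:
                # Prediction is the summed weight of all cues that are present.
                predicted = sum(weights[(cue, outcome)] for cue in cues)
                target = 1.0 if outcome == observed else 0.0
                error = target - predicted
                # Every present cue shares the same error-driven adjustment.
                for cue in cues:
                    weights[(cue, outcome)] += rate * error
    return weights


def activation(weights, cues, outcome):
    """Summed support that a set of cues gives to one construction."""
    return sum(weights.get((cue, outcome), 0.0) for cue in cues)


weights = train(TOY_EVENTS)
for verb in ("dance", "novel"):
    cues = {f"verb:{verb}", "meaning:caused-event"}
    for outcome in OUTCOMES:
        print(verb, "|", outcome, "|", round(activation(weights, cues, outcome), 2))
```

In this toy setting the shared meaning cue carries the productive generalization, so an unseen verb leans towards “Somebody VERBed X”, while the verb-specific cue for “dance” learns to override it in favour of “Somebody made X VERB”. Error-driven learning of this kind yields a generalization with an exception without any explicit hypothesis testing.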