Modeling the acquisition of phonotactics

Final Report Summary - MODACQUPHON (Modeling the acquisition of phonotactics)

Nine-month-old infants react differently to licit versus illicit sound combinations (Jusczyk et al. 1993), thus displaying knowledge of the target adult phonotactics already at a very early developmental stage. How can this early stage of the acquisition of phonotactics be modeled? Assume that the learner entertains a current hypothesis of the target adult grammar; that it is trained on a string of phonotactically licit forms; and that the current grammar is slightly updated whenever it makes a mistake on the current piece of data. This model is known as error-driven learning (EDL). Does EDL provide a proper model of the child’s early acquisition of phonotactics? My MC project has addressed this question within two constraint-based phonological frameworks: Optimality Theory (OT; Prince and Smolensky 2004) and Harmonic Grammar (HG; Legendre et al. 1998b,a). This report summarizes the main research results obtained during my MC.
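
As a minimal sketch of the error-driven scheme just described, consider a toy learner within HG. Everything specific here is an assumption made for illustration and is not taken from the project: the two constraints, the two candidates, the initial weights, the learning rate, and the perceptron-style update rule. The learner posits the licit training form as its own underlying form, predicts the candidate with the lowest weighted violation cost, and slightly adjusts the weights only when the prediction differs from the observed form.

```python
# Toy error-driven learner in Harmonic Grammar (illustrative assumptions only).
# Index 0: a markedness constraint against voiced codas (hypothetical name).
# Index 1: a faithfulness constraint on voicing (hypothetical name).
CONSTRAINTS = ["*VoicedCoda", "Ident(voice)"]

# Violation counts of each candidate surface form for the underlying form /dog/.
CANDIDATES = {
    "dog": [1, 0],  # faithful candidate: keeps the voiced coda
    "dok": [0, 1],  # unfaithful candidate: devoices the coda
}


def cost(weights, violations):
    """Weighted sum of violations; the lowest cost wins."""
    return sum(w * v for w, v in zip(weights, violations))


def predict(weights, candidates):
    """Return the candidate that the current grammar selects."""
    return min(candidates, key=lambda c: cost(weights, candidates[c]))


def train(observed, candidates, weights, rate=1.0, max_steps=100):
    """Update the weights only when the current grammar makes an error."""
    weights = list(weights)
    errors = 0
    for _ in range(max_steps):
        guess = predict(weights, candidates)
        if guess == observed:
            break  # no error on this piece of data, so no update
        errors += 1
        for i in range(len(weights)):
            # Raise constraints violated more by the wrong guess,
            # lower constraints violated more by the observed form.
            weights[i] += rate * (candidates[guess][i] - candidates[observed][i])
    return weights, errors


# Start with markedness high and faithfulness low (an assumed initial state).
weights, errors = train("dog", CANDIDATES, [9.0, 0.0])
print(weights, errors)
```

In this toy run the learner makes a finite number of errors and then stops updating, which is the convergence behavior discussed in the next section. The sketch says nothing about restrictiveness.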

=======================================================
First set of results: concerning consistency and convergence
=======================================================

Is it possible to ensure that the phonotactics learned by the EDL is consistent with the training data, namely that it never fails to classify the licit training data as indeed phonotactically licit? Consistency is equivalent to convergence, namely the requirement that the EDL makes only a finite number of errors on the licit data it is trained on. During my MC, I have completed my research thread on consistency/convergence in OT and HG, which I had started in 2009. These results have been consolidated into three journal papers accepted at Phonology, the Journal of Language Modeling, and the Journal of Logic and Computation. They have also been presented at OCP 11 (the 11th Old World Conference in Phonology) in Leiden and Amsterdam in January 2014. Some of them have been presented in the class "Phonology: Typology and Acquisition", co-taught with Aoju Chen and Rene Kager at Utrecht University (LIMV13010) in spring 2014.

=======================================================
Second set of results: concerning restrictiveness on F-irrelevant languages
=======================================================

A consistent EDL could still incorrectly declare too many forms licit, namely it could fail to recognize as illicit those forms which are indeed illicit according to the target phonotactics. This is the issue of restrictiveness, which turns out to be more challenging than consistency. It is thus convenient to start from the following question: is it possible to isolate a large class of simple cases, namely cases that allow for strong guarantees on restrictiveness? In OT, constraints come in two varieties: faithfulness and markedness constraints. Intuitively, the relative ranking of the faithfulness constraints is crucial in determining the specific way in which illicit forms are repaired, but only rarely does it contribute to the distinction between licit and illicit forms. In other words, it is irrelevant for phonotactics in the vast majority of cases. Thus, let me say that a certain phonotactics is F-irrelevant provided the relative ranking of the faithfulness constraints does not matter for the distinction between licit and illicit forms. During my MC, I have obtained the first guarantees of restrictiveness on F-irrelevant languages. These results have been consolidated into three papers (a manuscript; a paper in the proceedings of MORPHFSM 2014; and a paper in the Supplemental Proceedings of Phonology 2013). Some of these results have been presented at MFM 22 (the 22nd Manchester Phonology Meeting) in May 2014.
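
The difference between consistency and restrictiveness can be stated as two set inclusions. The sketch below is only an illustration over finite sets of forms. The forms and function names are hypothetical, and real phonotactic languages are not given as explicit finite lists.

```python
def is_consistent(predicted_licit, training_data):
    """Every licit training form is classified as licit by the learned grammar."""
    return set(training_data) <= set(predicted_licit)


def is_restrictive(predicted_licit, target_licit):
    """The learned grammar declares licit only forms that the target also allows."""
    return set(predicted_licit) <= set(target_licit)


# Hypothetical target language and training sample.
target_licit = {"ta", "pa", "tap"}
training_data = ["ta", "pa"]

# A hypothetical learned grammar that overgenerates: it also accepts "tpa".
overgenerating = {"ta", "pa", "tap", "tpa"}

print(is_consistent(overgenerating, training_data))  # consistent with the data
print(is_restrictive(overgenerating, target_licit))  # but not restrictive
```

The overgenerating grammar passes the first check and fails the second, which is the situation a consistent but non-restrictive learner ends up in.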

=======================================================
Third set of results: concerning restrictiveness on F-relevant languages
=======================================================

What about the remaining F-relevant cases, where the relative ranking of the faithfulness constraints does matter for the distinction between licit and illicit forms? In earlier work of mine (Magri 2013), I have shown that the problem of the acquisition of phonotactics in OT is intractable without proper assumptions on the underlying OT constraint set. This complexity result says that no algorithm can succeed on an arbitrary F-relevant language unless we make assumptions on the constraint set. This result motivates the following question: do phonologically plausible restrictions on OT constraint sets suffice to ensure EDL’s restrictiveness on F-relevant languages? If this conjecture turns out to be correct, it will provide strong support for the hypothesis that OT EDLs provide a proper model of the child’s acquisition of phonotactics. My research on these issues during my MC has led to four papers (a manuscript; a paper in the volume "Short ’schrift for Alan Prince"; a paper in the proceedings of MOL 14; and a paper in the proceedings of NELS 45; the last two papers are joint work with Rene Kager).

I am currently extending these results along the following line. OT models segment inventories through rankings of feature co-occurrence constraints (FCCs), which penalize certain combinations of feature values. Towards the goal of developing a comprehensive formal theory of FCCs, I am currently exploring the Tree Hypothesis (TH). It maintains that the FCCs define universal entailments among the features which are representable through a feature interaction graph which crucially has no loops, namely it is a tree. I want to explore the TH from the perspectives of learnability and typology. My first goal is to show that the TH provides enough structure to derive restrictiveness guarantees for a properly designed EDL within OT, whereas OT phonotactics is provably unlearnable without a substantive markedness theory, as recalled above. My second goal is to develop a system of FCCs which complies with the TH and has good typological coverage relative to various available databases of segment inventories. A substantial part of my MC has been spent in gaining the phonological expertise required by the latter typological analysis, under Kager’s supervision.
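
The structural condition behind the TH, a graph with no loops, can be checked mechanically. The sketch below assumes an undirected graph over feature names and uses a union-find pass. The features and edges are hypothetical placeholders, not the feature interaction graph proposed in the project.

```python
def is_tree(nodes, edges):
    """Return True if the undirected graph is connected and has no loops."""
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent links to the representative of x's component,
        # shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return False  # this edge would close a loop
        parent[root_a] = root_b

    # A tree must also be connected: exactly one component remains.
    return len({find(n) for n in nodes}) == 1


# Hypothetical feature interaction graph (placeholder features and edges).
features = ["voice", "sonorant", "continuant", "nasal"]
tree_edges = [("voice", "sonorant"), ("sonorant", "nasal"), ("sonorant", "continuant")]
loop_edges = tree_edges + [("voice", "nasal")]

print(is_tree(features, tree_edges))
print(is_tree(features, loop_edges))
```

The first graph satisfies the condition. The second fails it because the extra edge closes a loop among voice, sonorant, and nasal.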

=======================================================
Fourth set of results: idempotency and the lack of underlying forms
=======================================================

Alternations map target sounds (underlying forms) to the corresponding pronunciations (surface forms). For instance, English maps /dog+s/ unfaithfully to [dogz] because [gs] is illicit in English. As recalled above, nine-month-olds already react differently to licit and illicit sound combinations. They thus display knowledge of the native phonotactics at a developmental stage when morphological decomposition (say, of a noun and a plural affix) is plausibly still beyond reach (Hayes 2004). How can phonotactics be acquired without the information on underlying forms provided by morphological alternations? A common assumption in the literature is that a phonotactic learner contends with the lack of information on underlying forms by systematically positing a faithful underlying form (FUF) for each licit training surface form. Is this strategy computationally sound? A sufficient condition for the soundness of FUFs is that the target typology consists of grammars which are all idempotent, namely map any phonotactically licit form faithfully to itself. The research I have done during the second year of my MC has thus focused on the issue of idempotency, leading to four papers (one in the proceedings of WCCFL 33; the other three under submission/revision at the Journal of Logic, Language, and Information, the Journal of Linguistics, and Linguistic Inquiry). These results have also been presented at OCP 12 (the 12th Old World Conference in Phonology) in Barcelona in January 2015; at the Rutgers Optimality Research Group (RORG) of the Department of Linguistics at Rutgers University in September 2015; and in a course I have taught at the LSA Summer Linguistic Institute in July 2015.
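
Idempotency can be illustrated by treating a grammar as a function from underlying to surface forms and checking that applying it twice gives the same result as applying it once. The two toy grammars below are hypothetical string rewrites chosen for illustration, not OT or HG grammars, and the check only covers the finite list of forms it is given.

```python
def is_idempotent(grammar, forms):
    """Check that every output of the grammar is mapped faithfully to itself."""
    return all(grammar(grammar(f)) == grammar(f) for f in forms)


def final_devoicing(form):
    """Toy grammar: devoice a word-final b, d, or g."""
    devoice = {"b": "p", "d": "t", "g": "k"}
    if form and form[-1] in devoice:
        return form[:-1] + devoice[form[-1]]
    return form


def chain_shift(form):
    """Toy grammar: a surfaces as e, and e surfaces as i."""
    shift = {"a": "e", "e": "i"}
    return "".join(shift.get(ch, ch) for ch in form)


print(is_idempotent(final_devoicing, ["dog", "dok", "bed", "kat"]))
print(is_idempotent(chain_shift, ["pat", "pet", "pit"]))
```

The devoicing grammar is idempotent on these forms: its outputs, such as "dok", are mapped to themselves. The chain shift is not, because "pet" is a possible output (from "pat") and yet is itself mapped to "pit". A learner positing faithful underlying forms would therefore be misled by "pet" under the second grammar.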

=======================================================
Additional results
=======================================================

My work on EDL for phonotactics instantiates a more general computational approach to generative linguistics. The core assumption of this approach is that, in order for the learner to succeed at the language learning problem, the linguistic typology must have some non-trivial structure that the learner can exploit, such as that provided by the OT logic of transitive rankings, by the TH on FCCs for segment inventories, or by idempotency. This core assumption of a tight connection between learnability and typology of course extends beyond phonology, to other domains of linguistic theory such as semantics and syntax. During my MC, I have started to explore the implications of this approach for semantics, obtaining the results reported in a paper in the Proceedings of the 37th Annual Conference of the Cognitive Science Society.