Machine learning models of babies learning phonology
Babies start learning phonotactics at a very early age: within a few months of birth, they are already sensitive to some of the properties that distinguish languages from one another. Nine-month-olds can tell sequences of sounds that obey the phonotactics of their language from those that do not. They can also tell frequent sequences from sequences that are infrequent but phonotactically legal. According to the error-driven learning model, the learner maintains a current hypothesis about the adult phonotactics and slightly updates that hypothesis whenever it makes a mistake on the incoming stream of data. Although this learning model has been endorsed within Optimality Theory because of its cognitive plausibility, a computational model that accurately describes error-driven learning is still lacking.
The EU-funded project MODACQUPHON (Modeling the acquisition of phonotactics) produced significant results supporting the research hypothesis that error-driven learning in constraint-based phonology provides a proper model of the child's acquisition of phonotactics. The work touched on different issues in computational phonology. The first concerned consistency and convergence: scientists investigated whether the phonotactics learned by the error-driven model is consistent with the training data, and whether the model eventually stops making mistakes and settles on a final grammar.
Another set of results addressed a more challenging issue than consistency, namely correctness. The focus was on whether the model distinguishes between licit and illicit sounds and sound combinations and eventually learns to rule out illicit forms as in the target adult language; in other words, whether the final grammar captures the adult phonotactics. Relevant phonological properties are extracted by a set of universal constraints, the markedness and faithfulness constraints, which measure how phonological structures deviate from the ideal.
Scientists established the first set of sufficient conditions for correctness for a class of target phonotactic patterns characterized by the property that the relative ranking of the faithfulness constraints does contribute to the distinction between licit and illicit forms. The team also argued that the error-driven learning model will not adequately describe the child's acquisition of phonotactics unless there are also restrictions on the faithfulness constraints. Infants acquire phonotactics at a stage when they do not yet have access to phonological alternations. The scientists claimed that the strength of the error-driven model is that it is trained on surface forms only and so does not require information about underlying forms, which helps explain the puzzle of how phonotactics can be acquired prior to morphological awareness. MODACQUPHON was devoted to proving that error-driven learning is a proper model of the child's acquisition of phonotactics, from both a computational and a modelling perspective. Project results were disseminated in numerous academic publications.
Keywords
Machine learning, babies, language acquisition, phonotactics, error-driven learning, MODACQUPHON