
Human interaction and the evolution of spoken accent

Periodic Reporting for period 2 - InterAccent (Human interaction and the evolution of spoken accent)

Reporting period: 2019-04-01 to 2020-09-30

The project's main concern is with developing a cognitive-computational model of sound change to predict the very first stages of spoken accent development that come about when individuals in close contact and/or isolation interact with each other over a period of time. The high-risk aspect of the project derives from a combination of (a) tracking minute pronunciation changes in individuals over periods ranging from several months to four years and then (b) testing whether these changes can be predicted using an agent-based computational model that is initialised with some of the spoken attributes of the speakers before they meet/interact with each other.

The project's contribution to society is at many levels. (1) The project is important historically because it provides information about the origins of accents (e.g. why American English sounds different from British English). (2) A further social benefit lies in understanding how migration influences language change. (3) There are benefits to security, given that identifying the characteristics of a person's spoken accent by machine is important in forensic investigations of speech and language, e.g. in assessing the likelihood that a person's voice corresponds to a recording made of the person. (4) There are social benefits to speech pathology, since the acquisition of child speech data over a long period of time provides unique normative data against which pathological speech can be assessed.

The project's primary objective is to build a cognitive-computational model to explain how a spoken accent evolves. There are three associated secondary objectives. One of these is to explain how the interaction between individuals leads to innovations in spoken accent. Another is to explain how the accent of a community influences the spoken accent of an individual who belongs to, or comes into contact with, the community. Finally, another secondary objective is to understand how the characteristics of a spoken accent evolve from the ubiquitous phonetic variation in producing speech.
Six inter-related work strands have been undertaken.
1. Recordings from two schools in Bavaria. Acoustic recordings were made of 21 Bavarian-speaking children (13 female, 8 male) from their first year of attendance at primary school (average age 6.5 years). The child recordings were obtained in two primary schools in a rural area around 60 km from Munich that differed in the extent to which children with a Bavarian accent are exposed to migrant children with a non-German L1 background (heterogeneous vs. homogeneous composition of accents). The school with heterogeneous classes was located in Burgkirchen a.d. Alz, the homogeneous school 13 km away in Wald a.d. Alz. In the homogeneous school, recordings of 9 Bavarian-speaking children were obtained, and 12 Bavarian-speaking children were recorded in the heterogeneous school. Three of the children from the heterogeneous school moved away in the course of the first and second school years, so that 10 Bavarian-speaking children remained for the second year of recordings and 9 for the third. For the homogeneous school, the number of recorded children stayed the same throughout all recording timepoints. Ultrasound recordings were also obtained from 9 children per school type and per year (so far for the first and second school years).

2. Recordings from Albania. We recorded data in Albania in both 2018 and 2019. During 2018 we recorded 46 children and 22 adults. Of the 46 children, 20 were recorded in the city of Tirana and 26 in the villages of Petrelë, Krrabë and Bërzhitë near Tirana. All children participated in these acoustic recordings. Additionally, 6 children in Tirana and 8 children in the village of Bërzhitë participated in both an acoustic and an ultrasound task in which recordings were made of tongue movement during vowel production. During 2019, we recorded data from children only (37 in total). Eighteen children were recorded in Tirana and 19 in the villages of Krrabë and Bërzhitë. Again, all children participated in the acoustic task. Nine Tirana children and 9 children in the village of Bërzhitë participated in both the acoustic and the ultrasound task.

3. Agent-based model. Extensive research has been carried out in building the agent-based computational model, which has featured in publications [4, 6]. The further innovation in the last 18 months has been to add an algorithm that uses a technique called functional principal components analysis to map multi-dimensional time-varying trajectories onto a low-dimensional vector space that captures the most relevant shape variations. The background to developing this technique is explained in publication [8]. The software with extensive documentation has been made available to the public at
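The core idea of a discretized functional principal components analysis, i.e. mapping time-varying trajectories onto a small number of shape coordinates, can be sketched as follows. This is an illustrative sketch only, not the project's published algorithm: the function name `fpca_scores` and the toy sine-shaped trajectories are assumptions for the example.

```python
import numpy as np

def fpca_scores(trajectories, n_components=2):
    """Discretized functional PCA (illustrative sketch).

    trajectories: array of shape (n_items, n_samples); each row is a
    trajectory (e.g. a formant contour) sampled at equal time points.
    Returns per-item scores in a low-dimensional space, the mean curve,
    and the principal component curves (the main shape variations).
    """
    X = np.asarray(trajectories, dtype=float)
    mean_curve = X.mean(axis=0)
    centred = X - mean_curve
    # SVD of the centred data yields the principal component curves
    U, s, Vt = np.linalg.svd(centred, full_matrices=False)
    components = Vt[:n_components]       # dominant shape variations
    scores = centred @ components.T      # low-dimensional coordinates
    return scores, mean_curve, components

# toy example: 5 noisy sine-shaped trajectories sampled at 20 points
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 20)
base = np.sin(np.pi * t)
trajs = np.stack([base + rng.normal(0, 0.05, 20) for _ in range(5)])
scores, mean_curve, comps = fpca_scores(trajs)
```

Each trajectory is thus summarised by two numbers (its scores), and an approximation of the original curve can be recovered as `mean_curve + scores @ comps`.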

4. Recordings were made using real-time magnetic resonance imaging (MRI) from 27 British and 18 American English speakers with the aim of determining in an agent-based model whether the British English speakers' more oral vowel productions are attracted towards the greater degree of nasalization that typifies vowel production before nasal consonants in American English speakers of certain dialects. All participants produced an extensive corpus of about 150 target words in two prosodic conditions, designed to quantify how nasalized vowels become when they precede a nasal consonant (e.g. the degree of nasalisation of /a/ in 'ban'). The MRI recordings had excellent temporal and spatial resolution (50 frames/s, pixel size 1.4 × 1.4 mm). The MRI equipment was extended with an optical microphone and synchronous noise-suppression of the audio signal. The associated background publications for quantifying nasalisation in vowels from MRI data are in [1, 2, 7].
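One simple way such frame-by-frame MRI data can be reduced to a per-frame nasalization index is to track pixel intensity in a region of interest over the velopharyngeal port. The sketch below is an assumption for illustration (the function name `nasalance_trace` and the intensity-based measure are hypothetical), not the method of publications [1, 2, 7]; in practice, whether higher intensity indicates a more open port depends on the imaging protocol.

```python
import numpy as np

def nasalance_trace(frames, velum_roi):
    """Per-frame nasalization index from real-time MRI (sketch).

    frames: array (n_frames, height, width) of grey-level MRI images.
    velum_roi: (row_slice, col_slice) covering the velopharyngeal port.
    Returns the mean pixel intensity in the ROI per frame, rescaled to
    [0, 1] over the utterance: higher values = more open port (under
    the assumed imaging contrast), hence more nasal coupling.
    """
    r, c = velum_roi
    trace = frames[:, r, c].mean(axis=(1, 2))
    lo, hi = trace.min(), trace.max()
    if hi == lo:
        return np.zeros_like(trace)
    return (trace - lo) / (hi - lo)

# toy example: 4 synthetic frames whose ROI brightens over time
frames = np.zeros((4, 64, 64))
for i in range(4):
    frames[i, 10:20, 30:40] = i * 50.0
trace = nasalance_trace(frames, (slice(10, 20), slice(30, 40)))
```

A trace like this, time-aligned with the audio, would let the degree of vowel nasalisation before a nasal consonant be quantified per token.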

5. Ultrasound and video recordings. Ultrasound recordings were made of tongue movement during speech (sagittal view of the tongue) combined with synchronized audio recordings, and video recordings in profile and frontal view. Processing to date has focussed on extracting head movement from marker tracking in the video data. This is quite time-consuming, but has the major advantages that firstly the ultrasound data can then be mapped to an anatomically defined (skull-based) coordinate system, and that secondly lip-movement information can be readily extracted from the video, making analysis of lip-tongue coordination possible. A further advantage is that the experimental setup is comfortable for the participants (particularly important for work with children) since no head-restraining devices are required to attach the probe to the speaker. This has recently been further enhanced by adapting a 3D-printable probe-holder specifically for use with children (IPS member Dr. C. Carignan was involved in the original design published in Derrick et al., 2018, Journal of the Acoustical Society of America, 144(5), EL392–EL398, DOI: 10.1121/1.5066350). To streamline analysis of the ultrasound data itself, recent work has focussed on making our data compatible with the open-source Matlab project GetContours developed by M. Tiede and D. Whalen at Haskins Labs, which gives flexible access to state-of-the-art tracking algorithms such as deformable active contour models ('snakes') and the multi-hypothesis tracking procedure ('SLURP') of Laporte & Menard (2018).

6. Recordings from Antarctic 'winterers'. This was one of the major high-risk parts of the project, concerned with establishing whether scientists and staff who spend several months isolated together over an Antarctic winter begin to develop their own characteristic spoken accent - and whether such changes could be predicted by the agent-based model sketched in 3. above. This led to a publication [6] and attracted media attention, which is listed in the dissemination and outputs section of the present report.
There are at least six ways in which the research has been innovative and progress so far is beyond the state of the art.

1. We have acquired and processed nasalization data using magnetic resonance imaging. With the software developed in collaboration with our partners at Göttingen University, we have refined the recording technique to obtain unusually sharp MRI images at high sampling frequencies. No-one has previously obtained and processed this type of data on coarticulatory nasalization from such a large number of speakers and word types.

2. We are, to our knowledge, the first group to have obtained longitudinal speech recordings from primary school children (three re-recordings over 18 months) that combine ultrasound, acoustic and video recordings.

3. We have collected acoustic and ultrasound tongue data from both children and adults in Albania (Albanian being a vastly under-studied language). This is now the largest body of recorded speech data of its kind for Albanian in existence.

4. We have pioneered the development of an agent-based computational model of sound change in which agents exchange time-varying speech data based on actual speech recordings (as opposed to making use of artificial data).
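The exemplar-exchange principle behind such an agent-based model can be illustrated with a highly simplified sketch: each agent stores remembered tokens of a single phonetic parameter (such as one functional-PCA score), produces new tokens from its memory, and memorizes a heard token only if it is close enough to its own distribution. The class and function names (`Agent`, `interact`), the scalar parameter, and all numeric settings are assumptions for this toy example; the project's published model operates on actual time-varying speech data.

```python
import random

class Agent:
    """One speaker: a memory of exemplar values for a single phonetic
    parameter.  Production averages over memory; perception memorizes
    a heard token only if it is close to the agent's own mean."""

    def __init__(self, initial_exemplars, memory_size=20):
        self.memory = list(initial_exemplars)
        self.memory_size = memory_size

    def produce(self):
        return sum(self.memory) / len(self.memory)

    def perceive(self, token, tolerance=2.0):
        # accept the token only if it falls near the agent's own
        # distribution; the oldest exemplar is then discarded
        if abs(token - self.produce()) < tolerance:
            self.memory.append(token)
            if len(self.memory) > self.memory_size:
                self.memory.pop(0)

def interact(agents, steps=1000, seed=1):
    """Repeatedly pair a random speaker with a random listener."""
    rng = random.Random(seed)
    for _ in range(steps):
        speaker, listener = rng.sample(agents, 2)
        listener.perceive(speaker.produce())

# two groups initialised with different accent values drift together
group = ([Agent([0.0] * 5) for _ in range(5)]
         + [Agent([1.8] * 5) for _ in range(5)])
interact(group)
```

After the interaction phase, the two groups' productions have converged towards a shared value, which is the kind of accent accommodation the model is designed to predict from real initial recordings.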

5. We have also pioneered the application of the technique of functional principal components analysis for modelling how sound change develops from time-varying speech data.

6. We are the first to have recorded speech data from a group of speakers in isolation in Antarctica and to have predicted the evolving accent using the agent-based computational model.

We expect results in at least the following areas:
- A further development of the agent-based model that incorporates knowledge from functional principal components analysis in order to predict the direction of different types of sound change, including mergers and splits.

- Advances in understanding how the physiological changes due to vocal tract maturation in children can be distinguished from changes to spoken accent due to inter-personal interaction.

- A new understanding, based on the real-time MRI data, of how time-varying speech data can lead to the sound change in which vowels develop contrastive nasalization.

- An application of the agent-based computational model of sound change to predict the spoken accent development of children over 4 years in the primary schools in Bavaria and/or Albania.

- We will have obtained a more detailed understanding than ever before of how the vowels of Albanian vary between dialects and between adults and children.