CORPUS OF MALTESE & MALTESE ENGLISH SPEECH

Final Report Summary - COMMES (Corpus of Maltese & Maltese English Speech)

Introduction

Compared to other languages, Maltese remains under-studied and under-resourced (see the recent META-NET White Paper Series, which reports on the level of digital support through language technology for a number of languages: http://www.meta-net.eu/whitepapers/press-release). The same holds for its prosody and intonation, the aspects of the language studied in this project. To complement the study of Maltese, the project also investigates the prosody and intonation of Maltese English, the variety of English spoken in Malta.

Objectives

The COMMES project aimed to provide a phonological analysis of the prosody and intonation of Maltese and Maltese English (MaltE), and to develop a Tones and Break Indices (ToBI)-style annotation system. The system was intended primarily for Maltese but was extended to Maltese English, to allow direct comparison both with Maltese and with other varieties of English for which a ToBI system exists. Prosodically annotated corpora are useful not only for theoretical research but also as a resource for those developing more natural-sounding spoken output in speech and language technology applications. The project therefore aimed to produce sample prosodically annotated materials for Maltese and Maltese English, and to provide detailed guidelines for the annotation of further data.

Results and conclusions

This project has contributed to a more in-depth understanding of aspects of the prosody and intonation of Maltese and MaltE. Annotated data were used as the basis for formulating and testing a number of hypotheses. The COMMES project has given the researcher the skills to extract elements automatically from the annotations, together with their timestamps, using scripts written specifically for the task. It has also enabled her to carry out statistical analysis using up-to-date techniques.
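As an illustration of the kind of extraction script described above, the following is a minimal sketch only, not the project's actual code. It assumes that an annotation tier has already been loaded as a list of labelled time intervals; the Interval type, the extract function and all time values are hypothetical, and only the labels (mhm, ehe, orrajt) are taken from this report.

```python
from typing import Iterable, List, NamedTuple


class Interval(NamedTuple):
    """One labelled stretch of an annotation tier (hypothetical structure)."""
    start: float  # seconds
    end: float    # seconds
    label: str


def extract(tier: Iterable[Interval], wanted: Iterable[str]) -> List[Interval]:
    """Return the intervals whose label is in `wanted`, ordered by start time."""
    wanted_set = set(wanted)
    hits = [iv for iv in tier if iv.label in wanted_set]
    return sorted(hits, key=lambda iv: iv.start)


# Hypothetical tier: the time values are invented for illustration.
tier = [
    Interval(0.00, 0.42, "mhm"),
    Interval(0.42, 1.10, ""),
    Interval(1.10, 1.55, "orrajt"),
    Interval(1.55, 2.30, ""),
    Interval(2.30, 2.61, "ehe"),
]

# Pull out the non-lexical feedback elements with their timestamps.
backchannels = extract(tier, {"ee", "ehe", "mhm"})
for iv in backchannels:
    print(f"{iv.label}\t{iv.start:.2f}\t{iv.end:.2f}")
```

Keeping the start and end times alongside each label means the extracted elements can be passed directly to later steps, such as duration measurements or statistical analysis.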

In-depth corpus-based analyses were carried out on the prosody and intonation of Maltese and MaltE. One of these involved an examination of question forms and associated functions in both Maltese and MaltE.

Another of these in-depth analyses looked at the mechanisms that speakers engaged in dialogue use to give each other feedback, with specific attention to a phenomenon known as 'backchannelling'. The Maltese data were analysed with a view to establishing the correlation between intonation patterns and the discourse functions of elements in the data: lexical elements such as orrajt 'alright', owkey 'okay', and sewwa and tajjeb, both meaning 'good', as well as non-lexical elements such as ee, ehe and mhm. The occurrence and distribution of these elements within the dialogues were compared with data from languages such as German, Italian and Vietnamese. Such comparison has important implications for our understanding of communication in intercultural contexts. As a by-product of this analysis, an investigation was also begun into overlapping speech and the degree to which speakers overlap with their interlocutor depending on which language they are speaking. Analysis of this aspect, together with an analysis of accompanying gestures, is expected to be pursued further.
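The degree of overlap between two speakers can be quantified in several ways; the sketch below shows one simple possibility and is not the measure used in the project. It assumes that each speaker's talk is available as a list of (start, end) times in seconds, sorted by start time and non-overlapping within a speaker. The function name total_overlap and the example values are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start, end) in seconds


def total_overlap(a: List[Span], b: List[Span]) -> float:
    """Total time during which a span in `a` and a span in `b` coincide.

    Both lists must be sorted by start time and free of internal overlaps.
    """
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        # The shared stretch, if any, runs from the later start to the earlier end.
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if end > start:
            total += end - start
        # Advance whichever span finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


# Hypothetical talk spans for two speakers in one dialogue.
speaker_a = [(0.0, 2.0), (3.0, 5.0)]
speaker_b = [(1.5, 3.5), (4.5, 6.0)]

overlap = total_overlap(speaker_a, speaker_b)
talk_time_a = sum(end - start for start, end in speaker_a)
print(f"overlap: {overlap:.2f} s, share of A's talk: {overlap / talk_time_a:.2%}")
```

Normalising the overlap by a speaker's total talk time, as in the last lines, gives a proportion that can be compared across dialogues of different lengths and across languages.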

A third investigation involved a detailed examination of some of the annotated data, with a view to establishing which kinds of silent interval (breaks vs pauses) occur in Maltese. The results of this analysis feed into the development of one aspect of the ToBI-style standards mentioned above, namely a b(reak) i(ndex) tier.

A fourth in-depth analysis, this time of wh-questions, was carried out using experimental techniques from laboratory phonology. This study involved a field trip to Malta for data collection. Material was carefully designed to investigate a number of specific research questions. The results have an impact on intonation theory in general, in that they provide evidence for a previously unattested word-edge initial tone in pragmatically induced complementary distribution with a pitch accent. That is, the same word can have an edge tone in one sentence type and a pitch accent in another.

The various analyses carried out have allowed for the fine-tuning of elements of the phonological analysis of intonation in Maltese and MaltE, as well as for further development of the ToBI-style standards and conventions for the annotation of data from both varieties. These are documented on a website for Maltese and MaltE prosody and intonation. The website includes examples of prosodically annotated data from the COMMES project as well as guidelines on the use of the annotation standards and conventions (http://www.phonetik.phil-fak.uni-koeln.de/221.htm).

Impacts

The standards and conventions for the annotation of the prosody and intonation of spoken Maltese and MaltE developed in this project will be useful for training young researchers in prosodic analysis. A resource that includes prosodic annotations will not only further research on the prosody and intonation of Maltese and MaltE in itself, but will also be instrumental for those developing intelligent applications that involve the use or processing of natural language. This is especially the case for multilingual human language technologies in areas such as text-to-speech, dialogue systems and spoken language translation involving both Maltese and MaltE.