Natural language processing (NLP) is a technology that we encounter frequently in the digital world: for example, it is involved when we use an automatic translation service, or when typing a question into a search engine and getting back an answer extracted from the web. While this often works remarkably well for languages like English, the performance of such systems is significantly worse when it involves less-researched languages like Basque, Finnish, or Polish. This is an important societal issue, as it contributes to a "digital language divide" where speakers who are not proficient in English are put at a disadvantage. The MorphIRe project worked on closing this gap by developing techniques that perform better on a broader range of the world's languages.
Today's NLP models mostly work with artificial intelligence and machine learning: techniques that require large amounts of training data---e.g. sets of questions with their correct answers---which are then fed into an algorithm that "learns" to perform the task. Importantly, the techniques that are widely used today are indifferent to which language is being used---whether the task is performed on English or on Basque, the algorithms work exactly the same. In particular, they do not take into account the word-internal (i.e. "morphological") structure of these languages: whereas English tends to use separate words to express different grammatical and semantic concepts, morphologically richer languages like Basque can express these concepts within a single word form (compare English "because of the rain" with Basque "euriagatik").
The MorphIRe project provides direct evidence that we shouldn't ignore the morphological structure of languages when building NLP models, as it contributes to errors that current state-of-the-art NLP models make. It also proposes a new algorithm for word segmentation that better corresponds to morphological structure. By highlighting these problems in today's NLP models and working towards concrete solutions that can be integrated into these models, the MorphIRe project makes an important contribution towards improving NLP technology for a wider range of languages.