CORDIS - EU research results

High Performance Language Technologies

Project description

Innovative technology added to the panoply of the European Language Grid

The EU-funded HPLT project applies high-performance computing to scale and advance language technologies. Taking advantage of recent advances in machine learning and storage capacity, it will create and process huge language datasets and produce language and translation models for a large number of languages. The resulting models will be tested from various angles to ensure smooth integration, high accuracy, and regulatory compliance with respect to privacy, unwanted biases and ethical issues. The models and datasets are expected to be a game changer in the language service market in the EU and beyond. The resulting models will be open, free and available from established language repositories to anyone interested in pursuing research or innovation projects.

Objective

High Performance Language Technologies (HPLT) is a space combining petabytes of natural language data with large-scale model training. With trillions of words of text, the space will be the largest open text collection. Cleaning and privacy-protecting services improve the quality and ethical properties of the text. Going beyond static repositories that require the user to analyze each dataset individually, the project will rate datasets by how much they improve end-to-end language models and machine translation systems. Continuous integration of models and data will result in free, downloadable, high-quality models for all official European Union languages and beyond. The models will be reproducible, with information and evaluation metrics shown in a publicly available dashboard. By focusing on training at scale, the project complements the inference-focused European Language Grid, which in turn will be used for model deployment. Datasets, models and information about them will be published in recognized FAIR data repositories, aggregation catalogues and marketplaces for easy discovery, access, replication, and exploitation.

Coordinator

UNIVERZITA KARLOVA
Net EU contribution
€ 641 812,50
Address
OVOCNY TRH 560/5
116 36 Praha 1
Czechia

Region
Česko Praha Hlavní město Praha
Activity type
Higher or Secondary Education Establishments
Total cost
€ 641 812,50

Participants (6)

Partners (1)