European Lexicographic Infrastructure

Periodic Reporting for period 2 - ELEXIS (European Lexicographic Infrastructure)

Reporting period: 2019-08-01 to 2021-01-31

Reliable and accurate information on word meaning and usage is of crucial importance in today’s information society. The most consolidated and refined knowledge on word meanings can traditionally be found in dictionaries – monolingual, bilingual or multilingual.

Dictionaries are not only vast, systematic inventories of information on words; they are also important as cultural and historical artefacts. In each and every European country, elaborate efforts are put into the development of lexicographic resources describing the language(s) of the community. Although these efforts face similar problems relating to the technologies for producing these resources and making them available, cooperation on a larger European scale has long been limited.

Consequently, the lexicographic landscape in Europe is currently rather heterogeneous. On the one hand, it is characterised by stand-alone lexicographic resources, which are typically encoded in incompatible data structures due to the isolation of efforts. This prevents reuse of this valuable data in other fields, such as natural language processing, linked open data and the Semantic Web, as well as in the context of digital humanities. On the other hand, there is significant variation in the level of expertise and resources available to lexicographers across Europe. This forms a major obstacle to more ambitious, innovative, transnational, data-driven approaches to dictionaries, both as tools and as objects of research.

ELEXIS aims to develop an infrastructure which will:

Objective 1: foster cooperation and knowledge exchange between different research communities in lexicography in order to bridge the gap between lesser-resourced languages and those with advanced e-lexicographic experience;
Objective 2: establish common standards and solutions for the development of lexicographic resources;
Objective 3: develop strategies, tools and standards for extracting, structuring and linking lexicographic resources;
Objective 4: enable access to standards, methods, lexicographic data and tools for scientific communities, industries and other stakeholders;
Objective 5: promote an open access culture in lexicography, in line with the European Commission Recommendation on access to and preservation of scientific information.
One of the aims of the ELEXIS project is to provide cost-free access for academic institutions in the EU to various infrastructures provided by the project partners. The number of available infrastructures will grow in the course of the project, and accessing them has no financial implications for the institutions.

Already available
The Sketch Engine corpus query, corpus building and corpus management system allows users to build and work with 300+ text corpora in over 90 languages and 20 scripts. Sketch Engine contains a number of unique tools to analyse large corpora of up to 30 billion words.

Lexonomy is a cloud-based system for writing dictionaries and publishing them online. It is highly scalable, adapting to large dictionary projects as well as small lexicographic works such as editing and online publishing of domain-specific glossaries or terminology resources.

ELEXIS impact

I. Providing efficient access to quality lexicographic data

Computational linguistics and language resources communities will gain access to currently inaccessible data from quality lexicographic resources and interlinked semantic data, as well as extracted data from corpora and multimodal resources;
Digital humanities communities will gain user-friendly and efficient access to modern and historical lexicographic resources as cultural and historical artefacts, supporting research in a wide area of humanities disciplines such as history, religion, gender studies, literature and education.

II. Establishing inter-infrastructure synergies and optimisation

Currently isolated European language infrastructures working on lexical description of individual languages in national language institutes and standardisation bodies will be joined in one pan-European infrastructure.
Close links and synergies will be established between CLARIN and DARIAH, with ELEXIS working on top of existing services as a new user community.

III. Enabling the use of new technology and data in industry

Industrial partners in ELEXIS will be able to take the role of intermediaries between research and industry in language technology and language learning, as well as lexicography and lexical content publishing in general. Industry interest is evident from the participating partners and from the letters of interest from important stakeholders in the field;
Information from quality lexicographic resources and interlinked semantic data will be opened up and made available for use in commercial scenarios, based on ELEXIS work on IPR issues currently hindering the accessibility of the data.
Lexicographic data will be evaluated by an industry-supported data seal of compliance.

IV. Facilitating inclusion of innovative lexicography in research and education

Online training courses on innovative e-lexicography, with suggested ECTS credits, produced by education partners (from universities) will be incorporated into existing curricula.
Language teaching and language learning communities will be able to develop and use new improved training materials, based on the (open) access to lexica interlinked on a large scale.
Previously inaccessible lexicographic data will be made available for research through virtual access platforms and through visiting grants in trans-national access.

V. Encouraging cross-disciplinary fertilisations in academia and industry

Both computational linguistics and lexicography will be able to achieve a higher level of language description and text processing in a virtuous cycle of cross-disciplinary exchange of knowledge and data;
Research or study of lexica in linguistic studies and related disciplines will be enabled by massive interlinking of previously isolated lexicographic resources, which can lead to new discoveries, particularly in the semantic domain.
In humanities disciplines, such as history, religion, gender studies, literature and education, new resources and services can be used for cross-lingual studies, based on interlinked and integrated semantic data;
Artificial intelligence systems will be able to make use of lexicographic data in repositories, interlinked semantic data and extracted data from multilingual and multimodal resources.

VI. Enabling massive integration of knowledge-based resources

Stand-alone modern and historical lexicographic resources available as isolated, incompatible data will be linked, integrated and enriched on different levels. A scalable, multilingual and multifunctional language resource will be created by:

- linking resources
- integrating resources
- enriching resources with multimodal data (image, sound, video) and unstructured text (corpora, news feeds, social media, etc.)

The ultimate goal is the creation of a universal (integrated and enriched) registry/network of semantic relations used as a semantic intermediary language for global knowledge exchange, focused on difficult polysemous vocabulary (single-word and multi-word), modern and historical; the realisation of a universal lexicographic metastructure; a matrix dictionary spanning languages and time.