ELEXIS impact
I. Providing efficient access to quality lexicographic data
Computational linguistics and language resources communities will gain access to currently inaccessible data from quality lexicographic resources and interlinked semantic data, as well as to data extracted from corpora and multimodal resources.
Digital humanities communities will gain user-friendly and efficient access to modern and historical lexicographic resources as cultural and historical artefacts, supporting research across a wide range of humanities disciplines such as history, religion, gender studies, literature and education.
II. Establishing inter-infrastructure synergies and optimisation
European language infrastructures that currently work in isolation on the lexical description of individual languages, in national language institutes and standardisation bodies, will be brought together in a single pan-European infrastructure.
Close links and synergies will be established between CLARIN and DARIAH, with ELEXIS working on top of existing services as a new user community.
III. Enabling the use of new technology and data in industry
Industrial partners in ELEXIS will be able to act as intermediaries between research and industry in language technology and language learning, as well as in lexicography and lexical content publishing in general. Industry interest is evident from the participating partners and from the letters of interest from important stakeholders in the field.
Information from quality lexicographic resources and interlinked semantic data will be opened up and made available for use in commercial scenarios, building on ELEXIS work on the IPR issues that currently hinder access to the data.
Lexicographic data will be evaluated against an industry-supported data seal of compliance.
IV. Facilitating inclusion of innovative lexicography in research and education
Online training courses on innovative e-lexicography, produced by education partners (universities) with suggested ECTS credits, will be incorporated into existing curricula.
Language teaching and language learning communities will be able to develop and use new and improved training materials, based on (open) access to lexica interlinked on a large scale.
Previously inaccessible lexicographic data will be made available for research through virtual access platforms and through visiting grants for trans-national access.
V. Encouraging cross-disciplinary fertilisations in academia and industry
Both computational linguistics and lexicography will be able to achieve a higher level of language description and text processing in a virtuous cycle of cross-disciplinary exchange of knowledge and data.
Research on lexica in linguistics and related disciplines will be enabled by massive interlinking of previously isolated lexicographic resources, which can lead to new discoveries, particularly in the semantic domain.
In humanities disciplines such as history, religion, gender studies, literature and education, new resources and services can be used for cross-lingual studies, based on interlinked and integrated semantic data.
Artificial intelligence systems will be able to make use of lexicographic data in repositories, interlinked semantic data, and data extracted from multilingual and multimodal resources.
VI. Enabling massive integration of knowledge-based resources
Stand-alone modern and historical lexicographic resources, currently available only as isolated and incompatible data, will be linked, integrated and enriched on different levels. A scalable, multilingual and multifunctional language resource will be created by:
- linking resources
- integrating resources
- enriching resources with multimodal data (image, sound, video) and unstructured text (corpora, news feeds, social media, etc.)
The ultimate goal is the creation of a universal (integrated and enriched) registry/network of semantic relations, used as a semantic intermediary language for global knowledge exchange and focused on difficult polysemous vocabulary (single-word and multi-word), both modern and historical. This amounts to the realisation of a universal lexicographic metastructure: a matrix dictionary spanning languages and time.