CORDIS - EU research results

Cross-lingual Event-centric Open Analytics Research Academy

Periodic Reporting for period 2 - Cleopatra (Cross-lingual Event-centric Open Analytics Research Academy)

Reporting period: 2021-01-01 to 2023-06-30

In the early decades of the 21st century, transnational and even ostensibly national events, for example the Paris terrorist attacks, Brexit and the COVID-19 pandemic, have strongly impacted the European community and the European digital economy across languages and borders. This has resulted in the generation of a vast amount of event-centric multilingual information available from different communities – in the news, on the Web and on social media. Cross-lingual technologies to efficiently access, analyse and interact with this information are of the utmost importance for various stakeholder groups across Europe, including data scientists, digital humanities researchers, memory institutions, publishers, media monitoring companies, journalists and policy-makers.
The Cleopatra ITN offers a unique interdisciplinary and intersectoral research and training programme addressing these challenges in order to educate a new generation of data scientists in Europe, with a specific focus on cross-lingual models, methods and technologies.

The main objectives of the ITN are to: 1) facilitate advanced cross-lingual processing of event-centric textual and visual information on a large scale; 2) develop innovative methods for efficient and intuitive user access to and interaction with multilingual information; 3) facilitate large-scale analytics of multilingual event-centric information and cross-cultural studies; 4) educate a group of top-level scientists with unique interdisciplinary and intersectoral expertise in multilingual information science who will be able to take on leading roles in research and industry in the future; and 5) establish an interdisciplinary curriculum for cross-lingual information analytics.

The main outcomes of Cleopatra include: 1) novel methods for event-centric cross-lingual data processing; 2) highly innovative user interaction paradigms for multilingual information; 3) large-scale, open data sets and software components for a variety of EU languages; and 4) an interdisciplinary curriculum and educational materials.
Overall, Cleopatra will contribute to the European digital economy in several application domains and strengthen the European position in multilingual information science.
All activities planned as part of the Cleopatra training and research programme in the reporting period (from 01/01/2019 to 30/06/2023) have been successfully conducted.

17 Early Stage Researchers (ESRs) were recruited in total. Two members of the original cohort left the project before the end of their contracts, for personal reasons, and consequently two new ESRs were recruited in the third year of Cleopatra.
The project extension made it possible to extend the contracts of ESRs who joined the project at a later stage, thereby guaranteeing 36 months on the project for all ESRs. All of the recruited researchers have completed their career and development plans and produced innovative research proposals. 14 ESRs have completed their contracts and three have already submitted their PhD theses, all of whom (ESR 6, ESR 14, ESR 15) have successfully defended and published their doctoral dissertations.

All ESRs have benefited from a wide range of training, delivered both centrally – at the Kick-Off Workshop and during the Learning Weeks – and locally by the beneficiaries. Courses offered as part of the Cleopatra training programme and online events (R&D weeks and Hackathons) have been recorded and training materials are available online on the Cleopatra website at http://cleopatra-project.eu/index.php/video-lectures-and-training-materials/ and on http://videolectures.net/cleopatra2019/.

All ESRs have completed their secondment periods, which has exposed them to additional training opportunities and to different working practices and opportunities for collaboration.
The results of the research projects were presented during the R&D weeks and at several research conferences, both in person and online. All source code and documentation for the demonstrator projects have been published on GitHub (https://github.com/cleopatra-itn). The work on demonstrators and R&D projects is meant to continue after the end of Cleopatra and will hopefully result in long-term collaborations.
Cleopatra ESRs have published 50 joint and individual papers in relevant journals and conferences; five additional contributions have been accepted for publication and 11 contributions have been submitted and are under review. The beneficiaries submitted a number of additional scientific publications related to the project that are relevant as a reference for the work of the ESRs and for dissemination purposes.

Communication activities have been conducted according to the plan in the Grant Agreement, with key points of dissemination being the project website (http://cleopatra-project.eu/) and the Cleopatra Twitter account (https://twitter.com/Cleopatra_ITN). The project website and social media channels remain central to the promotion and dissemination of news and information about the project and wider programme.

In order to synthesise and showcase the overall findings of the Cleopatra project, an open-access monograph with contributions from ESRs, beneficiaries and partners was submitted to Springer and accepted in February 2023. The current working title for the book is ‘Event Analytics across Languages and Communities’, and it will be edited by Dr. Ivana Marenzi, Professor Jane Winters, Professor Marko Tadić, Dr. Simon Gottschalk and Dr. Eric Müller-Budack.
The research results from Cleopatra include open data sets, increased NLP support for the EU languages and a reusable analytics pipeline for multilingual data. Cleopatra also contributed to enhanced cooperation and better transfer of knowledge between public and private sectors, in particular through its interdisciplinary research and training programme and researcher exchange. This has supported enhanced excellence in the research capacity of project partners and increased the interdisciplinary and intersectoral mobility of researchers in Europe as a whole.

Within the beneficiary institutions, the Cleopatra network has highlighted the value and importance of conducting multilingual data analysis in order to gain rich insights into European society and culture. The datasets curated by the ESRs have been published as open data and will facilitate further multilingual research beyond the Cleopatra network. Their publications will disseminate methods to a wide range of research communities, from digital humanities researchers to computer and data scientists.

On these bases, the Cleopatra ITN has had a substantial impact on several dimensions of the research landscape. In addition to the positive impact on individual ESRs’ employability and careers, it has contributed to the European digital economy by providing new business opportunities in several domains, for example archiving, publishing, media monitoring, semantic services and journalism, through advances in cross-lingual technologies. In so doing, it has helped to open up the world of multilingual data analytics to stakeholders in various fields of expertise. Since it provides technology that facilitates cross-lingual and cross-cultural analytics on an unprecedented scale, Cleopatra can significantly impact European society as a whole.