Improving scientific excellence and creativity in combating disinformation with artificial intelligence and language technologies

Project information

DisAI

Grant agreement ID: 101079164

DOI

10.3030/101079164

Project closed

EC signature date 8 July 2022

Start date 1 December 2022

End date 30 November 2025

Funded under

Widening participation and spreading excellence

Total cost

€ 1 499 750,00

EU contribution

€ 1 499 750,00

Coordinated by

KEMPELENOV INSTITUT INTELIGENTNYCH TECHNOLOGII
Slovakia

CORDIS provides links to public deliverables and publications of HORIZON projects.

Links to deliverables and publications of FP7 projects, as well as links to some specific result types such as datasets and software, are retrieved dynamically from OpenAIRE.

Deliverables

Data management plan – Version 1

The first version of the data management plan (DMP) will be prepared by M6. It outlines how research data are collected or generated and how they will be handled during the project and after its completion: what data will be collected or generated, following what methodology and standards, whether and how the data will be shared or made open, and how they will be curated and preserved. The DMP is a living document and will be updated by the end of each reporting period. The DMP covers: (1) what data will be collected or generated; (2) what standards will be used; (3) what types and formats of data will be created; (4) how metadata will be generated; (5) how the data will be documented; (6) what data will be exploited, shared, or made open; (7) how and by whom the data will be curated and preserved.

Data management plan – Version 3

The final version of the data management plan (DMP) will be prepared by M30. It outlines how the research data have been collected or generated up to this point in the project, how they will be handled after the project is completed, whether and how the data will be shared or made open, and how they will be curated and preserved.

Data management plan – Version 2

The second, updated version of the data management plan (DMP) will be prepared by M17. It outlines how the research data have been collected or generated up to this point in the project and how they will be handled during the rest of the project and after its completion: what data will be collected or generated, following what methodology and standards, whether and how the data will be shared or made open, and how they will be curated and preserved.

Plan for dissemination and exploitation including communication – Version 1

The document will outline the initial version of the detailed plan for dissemination and communication activities in a systematic manner, with the aim of carrying out actions and campaigns that reach specific groups and audiences for specific purposes. The plan will also address exploitation and sustainability, ensuring the timely promotion of the project’s outcomes and the engagement of parties outside the Consortium that are interested in using or adopting them.

Shared task report

The report will describe the implementation and results of the shared task, which uses a dataset created within the project. The task will be open to individual researchers (with a focus on doctoral and postdoctoral researchers) as well as small teams, including external researchers. Shared task participants will be invited to submit their results, and the best-performing ones will also be invited to submit a paper. Individual and overall results of the participants will also be included in the report.

Plan for dissemination and exploitation including communication – Version 2

The document will outline the final detailed plan for dissemination and communication activities in a systematic manner, with the aim of carrying out actions and campaigns that reach specific groups and audiences for specific purposes. The plan will also address exploitation and sustainability based on the project's results at the time, ensuring the timely promotion of the project’s outcomes and the engagement of parties outside the Consortium that are interested in using or adopting them.

Summer school report

The report will describe the organization and results of the Summer School on Trustworthy AI and NLP, which will take place in Slovakia. The summer school will be open to doctoral students, master's students in the last year of their studies, and postdoctoral researchers within 3 years of receiving their Ph.D. degree. The study programme and the summer school results will also be included in the report.

Report on CLARIN integration and other liaison activities

The report will outline the results of the implementation of the measures aimed at improving the visibility and integration of KInIT in major European LT initiatives and bodies, which include: active engagement in major LT bodies and projects, and linking with major LT industry associations.

Report on dissemination, communication and synergy activities – Version 1

The report will outline the mid-term results of dissemination and communication activities in a systematic manner, listing the actions and campaigns carried out to reach specific groups and audiences for specific purposes. The report will also address the mid-term exploitation and sustainability results, focusing on the promotion of the project’s outcomes and the engagement of parties outside the Consortium that are interested in using or adopting them.

Ethics, innovation and IPR management plan

The document will describe the planned measures for IPR management, as well as for ethics and innovation management. The policies followed, such as the Open Science policy, will also be mentioned.

Scientific webinar series report

The report will include information about the implementation of four webinars (online expert lectures followed by a discussion and sharing of knowledge sources) and their results (list of participants, presenters, topics discussed, feedback, and lessons learned). The webinars will cover the topics of combating disinformation, multimodal AI, multilingual AI, and trustworthy AI.

Training report on research management support and operation

The report will describe the physical workshops or virtual webinars on research management support and operation and their results, including an overall description of the topic, organizational matters, the list of participants, outcomes, and lessons learned.

Training report on opportunity seeking and scientific proposal writing

The report will describe the physical workshops or virtual webinars on opportunity seeking and proposal writing and their results, including an overall description of the topic, organizational matters, the list of participants, outcomes, and lessons learned.

Replication challenge report

The report will describe the organization and results of the replication challenge targeted at doctoral students. Each participating student will be assigned a mentor from the leading partners. Together they will select a scientific paper related to the project topic and replicate the research described in it. The process and results of the individual replication projects will also be included in the report.

Report on dissemination, communication and synergy activities – Version 2

The report will outline the final results of dissemination and communication activities in a systematic manner, listing the actions and campaigns carried out to reach specific groups and audiences for specific purposes. The report will also address the exploitation and sustainability mid-term results, focusing on the promotion of the project’s outcomes and the engagement of parties outside the Consortium that are interested in using or adopting them.

Training report on innovation, technology transfer and networking

The report will describe the physical workshops or virtual webinars on innovation, technology transfer, and networking and their results, including an overall description of the topic, organizational matters, the list of participants, outcomes, and lessons learned.

Collected labelled dataset

The dataset will come with an accompanying description, including the labelling scheme and selected quantitative characteristics. Data collection will combine several approaches, e.g. automatic content crawling, crowdsourcing, or annotation by human experts, depending on the requirements of the various tasks. The collected data will be used to answer the research questions posed in tasks 2.2-2.4 and will be treated following the FAIR principles.

AI methods for claim matching

The consortium will develop novel machine learning methods for multilingual claim matching that use data from source languages to perform claim matching in target languages. The methods will build mainly on multilingual foundation language models.

Visuals and branding materials

The plan will describe the brand identity and visuals; the website, including a splash page; promotional material (brochure/leaflet, poster, roll-up, slides, promo video); communication materials (press releases, newsletters, additional mass mailings, and “exploitation booster materials” for our target groups/stakeholders, aiming to contribute to the use of the project's results and to maximize its impact); and social media (SNS) management.

Publications

Investigating Language and Retrieval Bias in Multilingual Previously Fact-Checked Claim Detection

Authors: Ivan Vykopal, Antonia Karamolegkou, Jaroslav Kopčan, Qiwei Peng, Tomáš Javůrek, Michal Gregor, Marián Šimko
Published in: 2025
Publisher: Association for Computational Linguistics

o-MEGA: Optimized Methods for Explanation Generation and Analysis

Authors: Ľuboš Kriš, Jaroslav Kopčan, Qiwei Peng, Andrej Ridzik, Marcel Veselý, Martin Tamajka
Published in: 2025
Publisher: Association for Computational Linguistics

HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization

Authors: Peng, Qiwei; Chai, Yekun; Li, Xuhong
Published in: 2024
Publisher: International Committee on Computational Linguistics
DOI: 10.48550/arXiv.2402.16694

Tokenization Falling Short: On Subword Robustness in Large Language Models

Authors: Chai, Yekun; Fang, Yewei; Peng, Qiwei; Li, Xuhong
Published in: Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
Publisher: Association for Computational Linguistics
DOI: 10.48550/ARXIV.2406.11687

Large Language Models for Multilingual Previously Fact-Checked Claim Detection

Authors: Ivan Vykopal, Matúš Pikuliak, Simon Ostermann, Tatiana Anikina, Michal Gregor, Marian Simko
Published in: Findings of the Association for Computational Linguistics: EMNLP 2025, 2025
Publisher: Association for Computational Linguistics
DOI: 10.18653/V1/2025.FINDINGS-EMNLP.852

Women Are Beautiful, Men Are Leaders: Gender Stereotypes in Machine Translation and Language Modeling

Authors: Matúš Pikuliak, Stefan Oresko, Andrea Hrckova, Marian Simko
Published in: Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
Publisher: Association for Computational Linguistics
DOI: 10.18653/V1/2024.FINDINGS-EMNLP.173

Concept Space Alignment in Multilingual LLMs

Authors: Peng, Qiwei; Søgaard, Anders
Published in: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
Publisher: Association for Computational Linguistics
DOI: 10.48550/ARXIV.2410.01079

Multimodal and Multilingual Fact-Checked Article Retrieval

Authors: Stefanos-Iordanis Papadopoulos, Ivana Beňová, Sebastian Kula, Michal Gregor, George Karantaidis, Tomáš Javůrek, Marián Šimko, Symeon Papadopoulos
Published in: Proceedings of the 2025 International Conference on Multimedia Retrieval, 2025
Publisher: ACM
DOI: 10.1145/3731715.3733402

GrEmLIn: A Repository of Green Baseline Embeddings for 87 Low-Resource Languages Injected with Multilingual Graph Knowledge

Authors: Daniil Gurgurov; Rishu Kumar; Simon Ostermann
Published in: Findings of the Association for Computational Linguistics: NAACL 2025, 2025
Publisher: Association for Computational Linguistics
DOI: 10.48550/ARXIV.2409.18193

Comparing Specialised Small and General Large Language Models on Text Classification: 100 Labelled Samples to Achieve Break-Even Performance

Authors: Pecher, Branislav; Srba, Ivan; Bielikova, Maria
Published in: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025
Publisher: Association for Computational Linguistics
DOI: 10.48550/ARXIV.2402.12819

Only for the Unseen Languages, Say the Llamas: On the Efficacy of Language Adapters for Cross-lingual Transfer in English-centric LLMs

Authors: Julian Schlenker, Jenny Kunz, Tatiana Anikina, Günter Neumann, Simon Ostermann
Published in: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), 2025
Publisher: Association for Computational Linguistics
DOI: 10.18653/V1/2025.ACL-SRW.62

Multilingual Previously Fact-Checked Claim Retrieval

Authors: Pikuliak, Matúš; Srba, Ivan; Moro, Robert; Hromadka, Timo; Smolen, Timotej; Melisek, Martin; Vykopal, Ivan; Simko, Jakub; Podrouzek, Juraj; Bielikova, Maria
Published in: 2023
Publisher: Association for Computational Linguistics
DOI: 10.48550/ARXIV.2305.07991

A Rigorous Evaluation of LLM Data Generation Strategies for Low-Resource Languages

Authors: Tatiana Anikina, Jan Cegin, Jakub Simko, Simon Ostermann
Published in: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, 2025
Publisher: Association for Computational Linguistics
DOI: 10.18653/V1/2025.EMNLP-MAIN.418

SemEval-2025 Task 7: Multilingual and Crosslingual Fact-Checked Claim Retrieval

Authors: Qiwei Peng, Robert Moro, Michal Gregor, Ivan Srba, Simon Ostermann, Marian Simko, Juraj Podrouzek, Matúš Mesarčík, Jaroslav Kopčan, Anders Søgaard
Published in: 2025
Publisher: Association for Computational Linguistics

Understanding Subword Compositionality of Large Language Models

Authors: Qiwei Peng, Yekun Chai, Anders Søgaard
Published in: 2025
Publisher: Association for Computational Linguistics

Beyond Image-Text Matching: Verb Understanding in Multimodal Transformers Using Guided Masking

Authors: Ivana Benová; Jana Kosecka; Michal Gregor; Martin Tamajka; Marcel Veselý; Marián Simko
Published in: Lecture Notes in Computer Science ISBN: 9783031826696, 2025
DOI: 10.1007/978-3-031-82670-2_7

Credible, Unreliable or Leaked?: Evidence verification for enhanced automated fact checking

Authors: Chrysidis Z.; Papadopoulos S. I.; Papadopoulos S.; Petrantonakis P.
Publisher: The Association for Computing Machinery
DOI: 10.1145/3643491.3660278

LLMs vs Established Text Augmentation Techniques for Classification: When do the Benefits Outweight the Costs?

Authors: Jan Cegin, Jakub Simko, Peter Brusilovsky
Published in: 2025
Publisher: Association for Computational Linguistics

Soft Language Prompts for Language Transfer

Authors: Ivan Vykopal; Simon Ostermann; Marián Simko
Published in: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2025
Publisher: Association for Computational Linguistics
DOI: 10.48550/ARXIV.2407.02317

Assessing Web Search Credibility and Response Groundedness in Chat Assistants

Authors: Ivan Vykopal, Matúš Pikuliak, Simon Ostermann, Marián Šimko
Published in: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics, 2025
Publisher: Association for Computational Linguistics

skLEP: A Slovak General Language Understanding Benchmark

Authors: Marek Suppa, Andrej Ridzik, Daniel Hládek, Tomáš Javůrek, Viktória Ondrejová, Kristína Sásiková, Martin Tamajka, Marian Simko
Published in: 2025
Publisher: Association for Computational Linguistics

Pessimistic Off-Policy Optimization for Learning to Rank

Authors: Matej Cief, Branislav Kveton, Michal Kompan
Published in: Frontiers in Artificial Intelligence and Applications, ECAI 2024, 2024
Publisher: IOS Press
DOI: 10.3233/FAIA240703

In-Depth Look at Word Filling Societal Bias Measures

Authors: Pikuliak, Matúš; Beňová, Ivana; Bachratý, Viktor
Published in: 2023
Publisher: Association for Computational Linguistics
DOI: 10.48550/ARXIV.2302.12640

Small Models, Big Impact: Efficient Corpus and Graph-Based Adaptation of Small Multilingual Language Models for Low-Resource Languages

Authors: Daniil Gurgurov, Ivan Vykopal, Josef Van Genabith, Simon Ostermann
Published in: 2025
Publisher: Association for Computational Linguistics

On Multilingual Encoder Language Model Compression for Low-Resource Languages

Authors: Daniil Gurgurov, Michal Gregor, Josef Van Genabith, Simon Ostermann
Published in: 2025
Publisher: Association for Computational Linguistics

Debiasing Multilingual LLMs in Cross-lingual Latent Space

Authors: Qiwei Peng, Guimin Hu, Yekun Chai, Anders Søgaard
Published in: 2025
Publisher: Association for Computational Linguistics
DOI: 10.48550/ARXIV.2508.17948

Adapting Multilingual LLMs to Low-Resource Languages with Knowledge Graphs via Adapters

Authors: Daniil Gurgurov; Mareike Hartmann; Simon Ostermann
Published in: Proceedings of the 1st Workshop on Knowledge Graphs and Large Language Models (KaLLM 2024), 2024
Publisher: Association for Computational Linguistics
DOI: 10.48550/ARXIV.2407.01406

Disinformation Capabilities of Large Language Models

Authors: Ivan Vykopal, Matúš Pikuliak, Ivan Srba, Robert Moro, Dominik Macko, Maria Bielikova
Published in: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
Publisher: Association for Computational Linguistics
DOI: 10.18653/V1/2024.ACL-LONG.793

CV-Probes: Studying the interplay of lexical and world knowledge in visually grounded verb understanding

Authors: Benova, Ivana; Gregor, Michal; Gatt, Albert
Published in: CoRR, 2024
Publisher: CogSci 2025
DOI: 10.48550/ARXIV.2409.01389

On Sensitivity of Learning with Limited Labelled Data to the Effects of Randomness: Impact of Interactions and Systematic Choices

Authors: Branislav Pecher; Ivan Srba; Mária Bieliková
Published in: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
Publisher: Association for Computational Linguistics
DOI: 10.48550/ARXIV.2402.12817

Similarity Over Factuality: Are we Making Progress on Multimodal Out-of-Context Misinformation Detection?

Authors: Stefanos-Iordanis Papadopoulos, Christos Koutlis, Symeon Papadopoulos, Panagiotis C. Petrantonakis
Published in: 2025
Publisher: IEEE/CVF

Cross-Validated Off-Policy Evaluation

Authors: Matej Cief; Branislav Kveton; Michal Kompan
Published in: Proceedings of the AAAI Conference on Artificial Intelligence, 2025
Publisher: AAAI Press
DOI: 10.48550/ARXIV.2405.15332

Average Is Not Enough: Caveats of Multilingual Evaluation

Authors: Pikuliak, Matúš; Šimko, Marián
Published in: 2023
Publisher: Association for Computational Linguistics
DOI: 10.48550/ARXIV.2301.01269

On Training Data Influence of GPT Models

Authors: Qingyi Liu; Yekun Chai; Shuohuan Wang; Yu Sun; Qiwei Peng; Keze Wang; Hua Wu
Published in: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
Publisher: EMNLP 2024
DOI: 10.48550/ARXIV.2404.07840

dfkinit2b at CheckThat! 2025: Leveraging LLMs and Ensemble of Methods for Multilingual Claim Normalization

Authors: Tatiana Anikina, Ivan Vykopal, Sebastian Kula, Ravi Kiran Chikkala, Natalia Skachkova, Jing Yang, Veronika Solopova, Vera Schmitt, Simon Ostermann
Published in: 2025
Publisher: CEUR

Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation

Authors: Jan Cegin, Branislav Pecher, Jakub Simko, Ivan Srba, Maria Bielikova, Peter Brusilovsky
Published in: Findings of the Association for Computational Linguistics: EMNLP 2025, 2025
Publisher: Association for Computational Linguistics
DOI: 10.18653/V1/2025.FINDINGS-EMNLP.296

Automatic Fact-checking in English and Telugu

Authors: Ravikiran Chikkala, Tatiana Anikina, Natalia Skachkova, Ivan Vykopal, Rodrigo Agerri, Josef van Genabith
Published in: 2025
Publisher: INCOMA Ltd., Shoumen, Bulgaria

'Humor, Art, or Misinformation?': A Multimodal Dataset for Intent-Aware Synthetic Image Detection

Authors: Anastasios Skoularikis, Stefanos-Iordanis Papadopoulos, Symeon Papadopoulos, Panagiotis C. Petrantonakis
Published in: Proceedings of the 2nd International Workshop on Diffusion of Harmful Content on Online Web, 2025
Publisher: ACM
DOI: 10.1145/3746275.3762215

Task Prompt Vectors: Effective Initialization Through Multi-task Soft Prompt Transfer

Authors: Róbert Belanec; Simon Ostermann; Ivan Srba; Mária Bieliková
Published in: Lecture Notes in Computer Science ISBN 9783662722428, 2025
Publisher: Springer-Verlag
DOI: 10.48550/ARXIV.2408.01119

Multilingual Political Views of Large Language Models: Identification and Steering

Authors: Daniil Gurgurov, Katharina Trinley, Ivan Vykopal, Josef Van Genabith, Simon Ostermann, Roberto Zamparelli
Published in: 2025
Publisher: Association for Computational Linguistics

Language Arithmetics: Towards Systematic Language Neuron Identification and Manipulation

Authors: Daniil Gurgurov, Katharina Trinley, Yusser Al Ghussin, Tanja Baeumel, Josef Van Genabith, Simon Ostermann
Published in: 2025
Publisher: Association for Computational Linguistics

ACM Computing Surveys

Authors: Branislav Pecher; Ivan Srba; Maria Bielikova
Published in: ACM Computing Surveys, 2024, ISSN 0360-0300
Publisher: Association for Computing Machinery, Inc.
DOI: 10.48550/ARXIV.2312.01082