During the first reporting period, AVALANCHE established its analytical, technical and ethical foundations. A comprehensive State-of-the-Art review and gap analysis were conducted, followed by structured requirements elicitation through questionnaires and interviews with the end-user (SPP). These inputs were consolidated into validated scenarios, KPIs and system requirements.
Based on these results, the consortium produced the AVALANCHE Reference Architecture, which describes the system components, data flows, actors, interaction diagrams and integration points, aligned with the workflows of law enforcement agencies (LEAs).
Technical progress advanced in four key areas:
Web data collection: A first version of the MEDUSA crawler was released, capable of collecting textual and multimedia content from surface and dark web sources. A configurable forum parser and an initial disinformation detection prototype were also developed.
Behavioural analysis: The foundation for identifying the “origin of spread” of online content was defined. A dedicated OCR tool for extracting text from images was implemented, and its extension to video is planned. Deepfake detection models were benchmarked.
Sentiment and hate-speech detection: Benchmarking of BERT-based models, locally hosted LLMs and commercial LLM services led to the decision to develop an in-house NLP solution aligned with LEA requirements.
Secure information exchange: Initial work on a federated data schema and a secure, encrypted exchange mechanism was completed, including concepts for hashing, signing, integrity verification and one-time retrieval links.
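The report names the building blocks of the secure exchange mechanism (hashing, signing, integrity verification, one-time retrieval links) but not their concrete design. The sketch below shows, under stated assumptions, how these pieces can fit together; it is not the AVALANCHE implementation. Assumptions: SHA-256 is used for hashing; HMAC-SHA256 stands in for signing (a deployed system would more likely use asymmetric signatures such as Ed25519, so recipients can verify without holding a shared key); the one-time link is modelled as a random token in an in-memory dictionary that is deleted on first use; and encryption is omitted because the Python standard library has no symmetric cipher. The names `OneTimeExchange`, `publish` and `retrieve` are hypothetical.

```python
import hashlib
import hmac
import secrets
import time
from dataclasses import dataclass


@dataclass
class SharedPackage:
    payload: bytes
    digest: str        # SHA-256 hex digest of the payload (hashing)
    tag: str           # HMAC-SHA256 over the digest; stand-in for a real signature
    expires_at: float  # Unix time after which the link is no longer valid


class OneTimeExchange:
    """Hypothetical in-memory store that issues single-use retrieval tokens."""

    def __init__(self, signing_key: bytes, ttl_seconds: float = 3600.0):
        self._key = signing_key
        self._ttl = ttl_seconds
        self._store: dict[str, SharedPackage] = {}

    def _sign(self, digest: str) -> str:
        return hmac.new(self._key, digest.encode(), hashlib.sha256).hexdigest()

    def publish(self, payload: bytes) -> str:
        """Hash and 'sign' the payload, then return a random one-time token."""
        digest = hashlib.sha256(payload).hexdigest()
        token = secrets.token_urlsafe(32)
        self._store[token] = SharedPackage(
            payload, digest, self._sign(digest), time.time() + self._ttl
        )
        return token

    def retrieve(self, token: str) -> bytes:
        """Return the payload once; the token is consumed even if checks fail."""
        package = self._store.pop(token, None)  # pop() makes the link single-use
        if package is None:
            raise KeyError("unknown or already used token")
        if time.time() > package.expires_at:
            raise KeyError("token expired")
        # Integrity verification: recompute digest and tag, compare in constant time.
        digest = hashlib.sha256(package.payload).hexdigest()
        if not (
            hmac.compare_digest(digest, package.digest)
            and hmac.compare_digest(self._sign(digest), package.tag)
        ):
            raise ValueError("integrity check failed")
        return package.payload


if __name__ == "__main__":
    exchange = OneTimeExchange(signing_key=secrets.token_bytes(32))
    link_token = exchange.publish(b"example evidence file")
    print(exchange.retrieve(link_token))  # first retrieval succeeds
    try:
        exchange.retrieve(link_token)     # second retrieval is rejected
    except KeyError as err:
        print("rejected:", err)
```

Removing the entry before any check means a link cannot be replayed even after a failed verification; a real deployment would also need persistent storage, transport encryption and access control around the token.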
Legal, ethical and data governance work progressed through the establishment of the Ethical Advisory Board and the preparation of a FAIR-aligned Data Management Plan, which together support compliance with data protection and ethical requirements.