Periodic Reporting for period 1 - Respond2Hate (Responsive classifiers against hate speech in low-resource settings)
Reporting period: 2023-11-01 to 2025-04-30
The Respond2Hate project addresses a critical gap in online safety by creating innovative, client-side solutions that empower individual users, especially in low-resource linguistic contexts, to proactively detect and filter out hate speech. The project's objective is to leverage state-of-the-art NLP techniques, particularly low-resource transfer learning and few-shot learning, to build adaptive, personalized models that users can easily deploy and adjust to their specific cultural and social contexts. This approach democratizes hate speech detection, enabling broader and more inclusive online safety.
- Market Report: An in-depth market and competitor analysis was conducted, identifying key differentiators and strategic entry points. Unlike existing English-focused, server-based hate speech solutions, Respond2Hate uniquely emphasizes client-side processing, ensuring privacy and customization. A user survey conducted in the targeted markets revealed significant demand for such a tool, with respondents reporting daily exposure to hate speech. Limited consultations with selected NGOs indicated that user-driven customization, regulatory compliance, and scalability could be important advantages. Initial target markets include Ukrainian- and Afrikaans-speaking regions, with plans for expansion leveraging linguistic and cultural proximity.
- High-Quality Hate Speech Dataset (REACT): We developed a comprehensive, culturally-informed dataset curated by expert native or highly proficient speakers. The dataset categorizes data by sentiment polarity (positive, neutral, hateful) and distinguishes between profanity and hate speech. It was primarily inspired by social media posts and other content from regional online platforms, covering marginalized groups, including LGBTQ+ communities and war victims, across various low-resource languages.
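The annotation scheme described above can be sketched as a simple record type. This is an illustrative reconstruction, not the project's actual schema: the field names, the `ReactRecord` class, and the `is_hate_speech` helper are assumptions chosen to show how sentiment polarity and the profanity/hate distinction could be encoded separately.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Polarity(Enum):
    """Sentiment polarity categories used by the REACT dataset."""
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    HATEFUL = "hateful"

@dataclass
class ReactRecord:
    """One annotated example (field names are illustrative, not REACT's)."""
    text: str
    language: str              # e.g. "uk" (Ukrainian) or "af" (Afrikaans)
    polarity: Polarity         # positive, neutral, or hateful
    contains_profanity: bool   # profanity is annotated separately from hate
    target_group: Optional[str] = None  # marginalized group targeted, if any

def is_hate_speech(record: ReactRecord) -> bool:
    # The dataset distinguishes profanity from hate speech: a profane but
    # neutral post is not hateful; only the hateful polarity class counts.
    return record.polarity is Polarity.HATEFUL
```

Keeping profanity as an independent flag rather than folding it into the polarity label is what lets a classifier trained on such data avoid over-blocking merely vulgar, non-hateful posts.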
- Responsive Hate Speech Classifier Models: We created a simulated federated learning (FL) framework enabling privacy-preserving, on-device hate speech detection. The project innovatively combined general federated learning with personalization strategies (FedPer) and client-specific adapter layers. Models were rigorously tested in zero-shot and few-shot scenarios to assess adaptability. Additionally, a multilingual language model fine-tuned on the REACT datasets provided the foundation for effective real-time online hate speech classification.
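The FedPer-style personalization described above can be illustrated with a toy aggregation round: shared base layers are averaged across clients by the server, while each client's personal head layers never leave the device. This is a minimal sketch with parameter vectors standing in for real model weights; the dict layout and function name are assumptions, not the project's implementation.

```python
def fedper_round(clients):
    """One FedPer-style round (toy sketch, pure Python, no real training).

    Each client model is a dict with a shared "base" vector, which the
    server averages across clients (federated averaging), and a personal
    "head" vector, which stays on the client for personalization.
    """
    n = len(clients)
    dim = len(clients[0]["base"])
    # Server step: average only the shared base parameters.
    avg_base = [sum(c["base"][i] for c in clients) / n for i in range(dim)]
    for c in clients:
        c["base"] = list(avg_base)  # broadcast shared layers back
        # c["head"] is deliberately untouched: personalization is local.
    return clients

clients = [
    {"base": [1.0, 2.0], "head": [0.1]},
    {"base": [3.0, 4.0], "head": [0.9]},
]
fedper_round(clients)
# Both clients now share base [2.0, 3.0] but keep their distinct heads.
```

In the real system the "base" would be the multilingual encoder (or its adapter layers) and the "head" a client-specific classification layer, which is what lets each user's filter adapt to their own cultural context without sharing raw text.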
- Client-side Hate Speech Filtering Prototype: A robust prototype was developed comprising a browser extension (Manifest V3 compliant) and a Python-based local hate speech checker using a fine-tuned multilingual BERT model. The extension intelligently parses and classifies text, highlighting or blocking hateful content according to user preferences. Specific parsers for platforms like YouTube, Reddit, and Twitter/X enhance accuracy. User feedback mechanisms allow ongoing improvement of both individual and global detection models via federated learning.
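The highlight-or-block behaviour driven by user preferences can be sketched as a small decision function. The classifier itself is stubbed out here: in the prototype the score would come from the local fine-tuned multilingual BERT checker, and the threshold names and action labels below are illustrative assumptions, not the extension's actual API.

```python
def filter_decision(score: float, prefs: dict) -> str:
    """Map a hate-probability score to a UI action (illustrative sketch).

    `score` is assumed to be the local checker's probability that a text
    span is hateful; `prefs` holds per-user thresholds (hypothetical keys).
    """
    if score >= prefs["block_threshold"]:
        return "block"      # remove the content from view entirely
    if score >= prefs["highlight_threshold"]:
        return "highlight"  # flag/blur the content but keep it visible
    return "show"           # leave the content untouched

# Example user preferences: flag borderline content, block only high-confidence hate.
prefs = {"highlight_threshold": 0.5, "block_threshold": 0.85}
```

Separating the two thresholds is one way the user preferences mentioned above could work: a cautious user lowers `block_threshold`, while a user who prefers context keeps borderline posts visible but highlighted.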
- Pre-market Testing: Extensive pre-market testing involving student assistants with diverse language and cultural backgrounds was successfully conducted across major platforms, including Reddit and YouTube. This testing demonstrated the real-world efficacy, usability, and robustness of the prototype under realistic usage conditions.
The release of the REACT datasets significantly enriches multilingual NLP resources, providing invaluable assets for ongoing academic research and practical applications globally. The browser extension and integrated hate speech checker offer practical solutions for end-users, empowering marginalized communities to independently moderate their online environments.
Further uptake and success of Respond2Hate's approach will depend on continued technical refinement, in particular user-friendly model updates and broader platform adaptability, on sustained engagement with local anti-hate organizations, and on exploration of commercialization pathways, potentially aided by a favourable regulatory framework and collaborative industry standards. Future research should also extend the federated learning methodology to additional marginalized languages and to broader categories of online toxicity, amplifying the positive societal impact.