Periodic Reporting for period 1 - Respond2Hate (Responsive classifiers against hate speech in low-resource settings)
Reporting period: 2023-11-01 to 2025-04-30
The Respond2Hate project addresses a critical gap in online safety by creating innovative, client-side solutions that empower individual users, especially in low-resource linguistic contexts, to proactively detect and filter out hate speech. The project's objective is to leverage state-of-the-art NLP techniques, particularly low-resource transfer learning and few-shot learning, to build adaptive, personalized models that users can easily deploy and adjust to their specific cultural and social contexts. This approach democratizes hate speech detection, enabling broader and more inclusive online safety.
- Market Report: An in-depth market and competitor analysis was conducted, identifying key differentiators and strategic entry points. Unlike existing English-focused, server-based hate speech solutions, Respond2Hate uniquely emphasizes client-side processing, ensuring privacy and customization. A user survey conducted in the targeted markets revealed significant demand for such a tool, with respondents reporting daily exposure to hate speech. Limited consultations with selected NGOs indicated that user-driven customization, regulatory compliance, and scalability could be important advantages. Initial target markets include Ukrainian- and Afrikaans-speaking regions, with plans for expansion leveraging linguistic and cultural proximity.
- High-Quality Hate Speech Dataset (REACT): We developed a comprehensive, culturally-informed dataset curated by expert native or highly proficient speakers. The dataset categorizes data by sentiment polarity (positive, neutral, hateful) and distinguishes between profanity and hate speech. It was primarily inspired by social media posts and other content from regional online platforms, covering marginalized groups, including LGBTQ+ communities and war victims, across various low-resource languages.
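The annotation scheme described above can be sketched as a simple record type. This is an illustrative reconstruction, not the project's actual schema: the field names, the `ReactRecord` class, and the `is_hate_speech` helper are assumptions chosen to show how sentiment polarity and the profanity/hate distinction could be encoded separately.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Polarity(Enum):
    """Sentiment polarity categories used by the REACT dataset."""
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    HATEFUL = "hateful"

@dataclass
class ReactRecord:
    """One annotated example (field names are illustrative, not REACT's)."""
    text: str
    language: str              # e.g. "uk" (Ukrainian) or "af" (Afrikaans)
    polarity: Polarity         # positive, neutral, or hateful
    contains_profanity: bool   # profanity is annotated separately from hate
    target_group: Optional[str] = None  # marginalized group targeted, if any

def is_hate_speech(record: ReactRecord) -> bool:
    # The dataset distinguishes profanity from hate speech: a profane but
    # neutral post is not hateful; only the hateful polarity class counts.
    return record.polarity is Polarity.HATEFUL
```

Keeping profanity as an independent flag rather than folding it into the polarity label is what lets a classifier trained on such data avoid over-blocking merely vulgar, non-hateful posts.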
- Responsive Hate Speech Classifier Models: We created a simulated federated learning (FL) framework enabling privacy-preserving, on-device hate speech detection. The project innovatively combined general federated learning with personalization strategies (FedPer) and client-specific adapter layers. Models were rigorously tested in zero-shot and few-shot scenarios to assess adaptability. Additionally, a multilingual language model fine-tuned on the REACT datasets provided the foundation for effective real-time online hate speech classification.
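The FedPer-style personalization described above can be illustrated with a toy aggregation round: shared base layers are averaged across clients by the server, while each client's personal head layers never leave the device. This is a minimal sketch with parameter vectors standing in for real model weights; the dict layout and function name are assumptions, not the project's implementation.

```python
def fedper_round(clients):
    """One FedPer-style round (toy sketch, pure Python, no real training).

    Each client model is a dict with a shared "base" vector, which the
    server averages across clients (federated averaging), and a personal
    "head" vector, which stays on the client for personalization.
    """
    n = len(clients)
    dim = len(clients[0]["base"])
    # Server step: average only the shared base parameters.
    avg_base = [sum(c["base"][i] for c in clients) / n for i in range(dim)]
    for c in clients:
        c["base"] = list(avg_base)  # broadcast shared layers back
        # c["head"] is deliberately untouched: personalization is local.
    return clients

clients = [
    {"base": [1.0, 2.0], "head": [0.1]},
    {"base": [3.0, 4.0], "head": [0.9]},
]
fedper_round(clients)
# Both clients now share base [2.0, 3.0] but keep their distinct heads.
```

In the real system the "base" would be the multilingual encoder (or its adapter layers) and the "head" a client-specific classification layer, which is what lets each user's filter adapt to their own cultural context without sharing raw text.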
- Client-side Hate Speech Filtering Prototype: A robust prototype was developed comprising a browser extension (Manifest V3 compliant) and a Python-based local hate speech checker using a fine-tuned multilingual BERT model. The extension intelligently parses and classifies text, highlighting or blocking hateful content according to user preferences. Specific parsers for platforms like YouTube, Reddit, and Twitter/X enhance accuracy. User feedback mechanisms allow ongoing improvement of both individual and global detection models via federated learning.
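The highlight-or-block behaviour driven by user preferences can be sketched as a small decision function. The classifier itself is stubbed out here: in the prototype the score would come from the local fine-tuned multilingual BERT checker, and the threshold names and action labels below are illustrative assumptions, not the extension's actual API.

```python
def filter_decision(score: float, prefs: dict) -> str:
    """Map a hate-probability score to a UI action (illustrative sketch).

    `score` is assumed to be the local checker's probability that a text
    span is hateful; `prefs` holds per-user thresholds (hypothetical keys).
    """
    if score >= prefs["block_threshold"]:
        return "block"      # remove the content from view entirely
    if score >= prefs["highlight_threshold"]:
        return "highlight"  # flag/blur the content but keep it visible
    return "show"           # leave the content untouched

# Example user preferences: flag borderline content, block only high-confidence hate.
prefs = {"highlight_threshold": 0.5, "block_threshold": 0.85}
```

Separating the two thresholds is one way the user preferences mentioned above could work: a cautious user lowers `block_threshold`, while a user who prefers context keeps borderline posts visible but highlighted.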
- Pre-market Testing: Extensive pre-market testing involving student assistants with diverse language and cultural backgrounds was successfully conducted across major platforms, including Reddit and YouTube. This testing demonstrated the real-world efficacy, usability, and robustness of the prototype under realistic usage conditions.
The release of the REACT datasets significantly enriches multilingual NLP resources, providing invaluable assets for ongoing academic research and practical applications globally. The browser extension and integrated hate speech checker offer practical solutions for end-users, empowering marginalized communities to independently moderate their online environments.
Further uptake and success of Respond2Hate's approach will depend on continued technical refinement, in particular user-friendly model updates and broader platform adaptability, on sustained engagement with local anti-hate organizations, and on exploration of commercialization pathways, potentially aided by a favourable regulatory framework and collaborative industry standards. Future research should also extend the federated learning methodology to additional marginalized languages and to broader categories of online toxicity, amplifying the positive societal impact.