Periodic Reporting for period 6 - SOCSEMICS (Socio-Semantic Bubbles of Internet Communities)
Reporting period: 2023-09-01 to 2024-08-31
- Socio-Semantic Network Modeling: developing formal models to analyze social networks where nodes have semantic attributes, assessing the extent of socio-semantic clustering across different online communities.
- Computational Linguistics and Opinion Attribution: creating tools to extract and categorize opinions from textual content, focusing on the sentence level and enabling the classification of user beliefs in a more nuanced manner than approaches based on word distributions.
- Qualitative-Quantitative Integration: conducting interviews augmented with network analysis and visualizations to compare computational findings with real-world user perceptions. A major goal of this pillar was also to develop innovative visualization methods to represent socio-semantic networks within a single space, offering new ways to analyze and interpret socio-semantic coevolution.
1. Advances in Socio-Semantic Network Analysis
Focusing on online communities, the project contributed significantly to socio-semantic network analysis, a field that examines the interaction between social structures and semantic meaning, both:
- empirically: SOCSEMICS analyzed how social groups align with semantic categories, particularly in relation to ideological divides (e.g. left- vs. right-wing), stance on controversial issues (e.g. climate skeptics vs. non-skeptics), and geographical location (e.g. intra-EU vs. intercontinental). Contrary to common assumptions, the research found that not all online communities function as echo chambers. While affiliation-based networks (e.g. Twitter retweets) tend to resemble echo chambers, interaction-based networks (e.g. Twitter quotes) often display greater ideological diversity. This suggests that the nature of online engagement plays a crucial role in shaping discourse.
- methodologically: the project extended stochastic block modeling (SBM) to analyze social and semantic clustering beyond a single metadata category (e.g. political affiliation or topic preferences).
A habilitation manuscript and two book chapters further helped establish the sociological and computational relevance of combining social and semantic network analysis.
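The contrast between affiliation-based and interaction-based networks reported above can be illustrated with a very simple homophily measure: the share of edges whose two endpoints carry the same label. The following is a minimal sketch under stated assumptions, not the project's actual method (which relies on stochastic block modeling). The function name `same_group_share` and the toy retweet and quote edge lists are hypothetical and invented for illustration only.

```python
def same_group_share(edges, group):
    """Fraction of edges whose two endpoints carry the same group label.

    Edges with an unlabelled endpoint are ignored.
    """
    labelled = [(u, v) for u, v in edges if u in group and v in group]
    if not labelled:
        raise ValueError("no edge has both endpoints labelled")
    same = sum(1 for u, v in labelled if group[u] == group[v])
    return same / len(labelled)


# Toy data, invented for illustration (not project data).
group = {"a": "left", "b": "left", "c": "right", "d": "right"}

# Hypothetical affiliation-style edges (who retweets whom).
retweets = [("a", "b"), ("b", "a"), ("c", "d"), ("d", "c"), ("a", "c")]
# Hypothetical interaction-style edges (who quotes whom).
quotes = [("a", "c"), ("b", "d"), ("c", "a"), ("a", "b"), ("d", "b")]

retweet_share = same_group_share(retweets, group)
quote_share = same_group_share(quotes, group)

print(f"same-group share, retweets: {retweet_share:.2f}")
print(f"same-group share, quotes:   {quote_share:.2f}")
```

In this toy setting, a share close to 1 would correspond to an echo-chamber-like pattern and a lower share to more cross-group exchange. A real analysis would also need a baseline for group sizes, which this sketch omits.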
2. Advances in Computational Linguistics and Opinion Representation
To analyze user opinions in digital spaces, the project introduced Semantic Hypergraphs (SHs), a novel framework for representing sentence-level meaning through directed, recursive hyperedges. Originally developed to categorize user stances on social media, SHs proved more relevant for semi-supervised information retrieval, particularly in contexts requiring rigorous, transparent and interpretable methods, where they offer an alternative to large language models (LLMs), which often lack transparency. The framework allows efficient pattern-based extraction of structured information, enabling human operators to refine queries with minimal effort. It also constitutes an alternative to traditional semantic graphs, with potential commercial applications, particularly for companies requiring cost-effective, customizable information retrieval solutions. Recognition of this potential has led to an early-stage €150k application grant (starting in 2025) and an ERC Proof-of-Concept resubmission after earning a Seal of Excellence.
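The notions of recursive hyperedges and pattern-based extraction can be sketched in a few lines of plain Python. This is an illustrative toy only: it does not use the graphbrain library and does not reproduce its notation or API. The nested-tuple encoding, the `?name` variable convention, the helper names `match` and `subedges`, and the two example sentences are all assumptions made for this sketch.

```python
def match(pattern, edge, bindings=None):
    """Return variable bindings if `edge` matches `pattern`, else None.

    A hyperedge is a tuple (connector, arg1, arg2, ...) whose arguments
    may themselves be hyperedges. Strings starting with '?' are variables.
    """
    bindings = dict(bindings or {})
    if isinstance(pattern, str):
        if pattern.startswith("?"):
            # A variable binds to any atom or nested hyperedge.
            if pattern in bindings and bindings[pattern] != edge:
                return None
            bindings[pattern] = edge
            return bindings
        return bindings if pattern == edge else None
    # The pattern is itself a hyperedge: sizes must agree.
    if not isinstance(edge, tuple) or len(edge) != len(pattern):
        return None
    for sub_pattern, sub_edge in zip(pattern, edge):
        bindings = match(sub_pattern, sub_edge, bindings)
        if bindings is None:
            return None
    return bindings


def subedges(edge):
    """Yield the edge itself and everything nested inside it."""
    yield edge
    if isinstance(edge, tuple):
        for item in edge:
            yield from subedges(item)


# Toy corpus, invented for illustration.
corpus = [
    ("says", "alice", ("harm", "wind_farms", "birds")),
    ("denies", "bob", ("says", "carol", ("is", "climate_change", "real"))),
]

# One human-readable pattern retrieves claims at any nesting depth.
pattern = ("says", "?who", "?claim")
claims = []
for edge in corpus:
    for sub in subedges(edge):
        found = match(pattern, sub)
        if found is not None:
            claims.append(found)

for claim in claims:
    print(claim["?who"], "->", claim["?claim"])
```

The point of the sketch is that the pattern remains readable and editable by a human operator, and that recursion lets the same pattern reach a claim embedded inside another statement.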
3. Advances in Socio-Semantic Network Visualization
The project developed novel ways to visualize social network structures alongside semantic properties: representing social and semantic elements in a unified hybrid visualization; mapping social clusters while considering semantic properties, which reveals whether or not cohesive groups share similar semantics; and developing an interactive platform to assess socio-semantic fragmentation, showing how connections (or their absence) reflect structural and semantic patterns. Additionally, a new user sampling method based on structural modeling, rather than demographic quotas, was implemented. Combined with an augmented interview protocol (where users engage with visualizations of their network position), this approach represents a novel method in socio-semantic research.
In addition, three key software developments can be noted:
- graphbrain (github.com/graphbrain): an open-source Python library for semantic hypergraph analysis, used primarily within the project to extract structured claims from online discussions.
- metablox (github.com/lenafm/metablox): a stochastic block modeling tool for analyzing how categorical metadata (e.g. political affiliations) shape social networks.
- chronoblox (github.com/lobbeque/chronoblox): a visualization tool integrating structural and semantic properties over time, introducing "network chronophotography" for tracking dynamic changes.
- Semantic Hypergraphs (SHs) challenge the ubiquitous use of semantic graphs in knowledge representation and extraction (in academia and in industry) by proposing a framework that both permits arbitrary complexity (thanks to the recursiveness of SHs) and remains amenable to human interpretation and processing, making it a viable alternative to opaque AI models like LLMs.
- Metadata-Informed Stochastic Block Modeling, which first accommodates metadata and semantic features and, second, takes into account the non-exclusive contribution of various categories to the observed structure. This enables a more nuanced analysis of how multiple semantic factors shape social structures, by empirically operationalizing the notion that many semantic dimensions may concurrently contribute to a network structure; a key development for network sociology.
- Joint Socio-Semantic Network Visualization, proposing a unified approach for tracking structural and semantic changes in networks over time. This "network chronophotography" opens new avenues in dynamic network visualization, based on the novel idea that nodes, across various periods of time, should be placed in the same two-dimensional space according to the similarity of their semantic features.
- Refining the Echo Chamber Debate, by demonstrating that echo chambers exist in some online interactions but not others, resolving prior contradictions in research; in particular by showing the co-existence of phenomena akin and not akin to echo chambers around the same content and on the same platform.
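The idea of placing nodes from different periods in one shared two-dimensional space, as described for the joint socio-semantic visualization above, can be illustrated with a deliberately simple stand-in technique. The sketch below is not chronoblox's algorithm: it uses a barycentric layout (in the spirit of RadViz), where each topic gets a fixed anchor on a circle and each node is placed at the weighted average of the anchors of the topics it uses. The topic names, node names, weights and the functions `topic_anchors` and `place` are hypothetical.

```python
import math


def topic_anchors(topics):
    """Place each topic at a fixed point on the unit circle."""
    n = len(topics)
    return {
        topic: (math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
        for k, topic in enumerate(topics)
    }


def place(weights, anchors):
    """Weighted barycentre of the topic anchors for one node in one period."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("node has no semantic weight")
    x = sum(w * anchors[t][0] for t, w in weights.items()) / total
    y = sum(w * anchors[t][1] for t, w in weights.items()) / total
    return (x, y)


# The anchors are computed once and reused for every period,
# so positions are comparable across time.
topics = ["climate", "economy", "migration", "health"]
anchors = topic_anchors(topics)

# Toy semantic profiles per period (invented values).
periods = {
    "2022": {"u1": {"climate": 3, "economy": 1}, "u2": {"migration": 2}},
    "2023": {"u1": {"climate": 1, "economy": 3}, "u2": {"migration": 5}},
}

layout = {
    period: {node: place(weights, anchors) for node, weights in nodes.items()}
    for period, nodes in periods.items()
}

for period, positions in layout.items():
    for node, (x, y) in positions.items():
        print(period, node, round(x, 2), round(y, 2))
```

Because the coordinate system is fixed, a node whose semantic profile is stable (here `u2`) stays in place from one period to the next, while a node whose profile shifts (here `u1`) visibly moves. That comparability across periods is the property the shared-space idea is meant to provide.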