Evaluating Online Moderation Via LLM-Powered Counterfactual Simulations
- URL: http://arxiv.org/abs/2511.07204v1
- Date: Mon, 10 Nov 2025 15:31:59 GMT
- Title: Evaluating Online Moderation Via LLM-Powered Counterfactual Simulations
- Authors: Giacomo Fidone, Lucia Passaro, Riccardo Guidotti
- Abstract summary: Large Language Models (LLMs) can be successfully leveraged to enhance Agent-Based Modeling. We design a simulator of online social networks enabling a counterfactual simulation where toxic behavior is influenced by moderation interventions. We conduct extensive experiments, unveiling the psychological realism of OSN agents and the superior effectiveness of personalized moderation strategies.
- Score: 2.429376470369691
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Online Social Networks (OSNs) widely adopt content moderation to mitigate the spread of abusive and toxic discourse. Nonetheless, the real effectiveness of moderation interventions remains unclear due to the high cost of data collection and limited experimental control. The latest developments in Natural Language Processing pave the way for a new evaluation approach. Large Language Models (LLMs) can be successfully leveraged to enhance Agent-Based Modeling and simulate human-like social behavior with an unprecedented degree of believability. Yet, existing tools do not support simulation-based evaluation of moderation strategies. We fill this gap by designing an LLM-powered simulator of OSN conversations that enables a parallel, counterfactual simulation in which toxic behavior is influenced by moderation interventions while keeping all else equal. We conduct extensive experiments, unveiling the psychological realism of OSN agents, the emergence of social contagion phenomena, and the superior effectiveness of personalized moderation strategies.
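The parallel, counterfactual design can be pictured as two runs of the same simulation that share every source of randomness and diverge only where moderation intervenes. The sketch below is a minimal, self-contained illustration of that idea, with a simple probabilistic stand-in for the LLM-driven agents; all names, parameters, and the contagion/intervention rules are illustrative assumptions, not the authors' implementation.

```python
import random

def simulate(seed: int, moderate: bool, n_agents: int = 20, n_rounds: int = 10):
    """One branch of the paired simulation. The shared seed keeps initial
    conditions and the random stream identical until moderation diverges."""
    rng = random.Random(seed)
    # Each agent's propensity to post toxic content (stand-in for an LLM persona).
    toxicity = [rng.uniform(0.1, 0.6) for _ in range(n_agents)]
    avg_per_round = []
    for _ in range(n_rounds):
        for i in range(n_agents):
            posts_toxic = rng.random() < toxicity[i]
            if posts_toxic:
                if moderate:
                    # Counterfactual branch: a (here, personalized) intervention
                    # dampens the offending agent's propensity.
                    toxicity[i] *= 0.8
                else:
                    # Factual branch: unmoderated toxicity spreads by social
                    # contagion, nudging a random bystander upward.
                    j = rng.randrange(n_agents)
                    toxicity[j] = min(1.0, toxicity[j] + 0.05)
        avg_per_round.append(sum(toxicity) / n_agents)
    return avg_per_round

seed = 42  # same seed for both branches: "keeping all else equal"
factual = simulate(seed, moderate=False)
counterfactual = simulate(seed, moderate=True)
effect = [round(f - c, 3) for f, c in zip(factual, counterfactual)]
print("per-round toxicity reduction attributable to moderation:", effect)
```

Because both branches start from the same seed, any gap between the two trajectories can be read as the causal effect of the intervention, which is the quantity the simulator is designed to expose.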
Related papers
- Towards Simulating Social Media Users with LLMs: Evaluating the Operational Validity of Conditioned Comment Prediction [2.5450067638785945]
This study introduces Conditioned Comment Prediction (CCP), a task in which a model predicts how a user would comment on a given stimulus. We evaluate open-weight 8B models (Llama3.1, Qwen3, Ministral) in English, German, and Luxembourgish language scenarios.
arXiv Detail & Related papers (2026-02-26T08:40:21Z)
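As a rough illustration of the CCP task above, the snippet below assembles a prompt that conditions a model on a user's comment history before asking it to predict a comment on a new stimulus. The prompt wording and function names are assumptions for illustration; the paper's exact prompt format is not reproduced here.

```python
def build_ccp_prompt(user_history: list[str], stimulus: str) -> str:
    """Conditioned Comment Prediction: condition on a user's past comments,
    then ask for the comment that user would leave on a new post."""
    history = "\n".join(f"- {c}" for c in user_history)
    return (
        "You are role-playing a specific social media user.\n"
        f"Past comments by this user:\n{history}\n\n"
        f"New post:\n{stimulus}\n\n"
        "Reply with the single comment this user would most plausibly write."
    )

prompt = build_ccp_prompt(
    user_history=["Totally agree, well said.", "Source? I doubt this is true."],
    stimulus="City council approves new bike lanes downtown.",
)
print(prompt)  # send to an open-weight 8B chat model of your choice
```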
- ARTIS: Agentic Risk-Aware Test-Time Scaling via Iterative Simulation [72.78362530982109]
ARTIS, Agentic Risk-Aware Test-Time Scaling via Iterative Simulation, is a framework that decouples exploration from commitment. We show that naive LLM-based simulators struggle to capture rare but high-impact failure modes. We introduce a risk-aware tool simulator that emphasizes fidelity on failure-inducing actions.
arXiv Detail & Related papers (2026-02-02T06:33:22Z)
- Social Simulations with Large Language Model Risk Utopian Illusion [61.358959720048354]
We introduce a systematic framework for analyzing large language models' behavior in social simulation. Our approach simulates multi-agent interactions through chatroom-style conversations and analyzes them across five linguistic dimensions. Our findings reveal that LLMs do not faithfully reproduce genuine human behavior but instead reflect overly idealized versions of it.
arXiv Detail & Related papers (2025-10-24T06:08:41Z)
- Integrating LLM and Diffusion-Based Agents for Social Simulation [28.21329943306884]
We propose a hybrid simulation framework that strategically integrates large language model (LLM)-driven agents with diffusion model-based agents. Our framework outperforms existing methods in prediction accuracy, validating the effectiveness of its modular design.
arXiv Detail & Related papers (2025-10-18T06:23:22Z)
- Implicit Behavioral Alignment of Language Agents in High-Stakes Crowd Simulations [3.0112218223206173]
Language-driven generative agents have enabled social simulations with transformative uses, from interpersonal training to aiding global policy-making. Recent studies indicate that generative agent behaviors often deviate from expert expectations and real-world data, a phenomenon we term the Behavior-Realism Gap. We introduce a theoretical framework called Persona-Environment Behavioral Alignment (PEBA), formulated as a distribution matching problem grounded in Lewin's behavior equation. We propose PersonaEvolve (PEvo), an LLM-based optimization algorithm that iteratively refines agent personas, implicitly aligning their collective behaviors with realistic expert benchmarks within a specified environmental context.
arXiv Detail & Related papers (2025-09-19T22:35:13Z)
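The PEvo loop described above can be read as a simple simulate-compare-rewrite cycle. Below is a hedged sketch of that cycle with toy stand-ins for the crowd simulation and the LLM-driven persona rewrite; `behavior_gap`, the behavior keys, and the update rule are illustrative assumptions rather than the paper's actual objective.

```python
def behavior_gap(simulated: dict, benchmark: dict) -> float:
    """Distribution-matching objective: L1 gap between behavior frequencies."""
    keys = set(simulated) | set(benchmark)
    return sum(abs(simulated.get(k, 0.0) - benchmark.get(k, 0.0)) for k in keys)

def refine_personas(personas, benchmark, simulate, rewrite, max_iters=5, tol=0.05):
    """Iteratively edit personas until collective behavior matches the benchmark."""
    for _ in range(max_iters):
        simulated = simulate(personas)                # run the crowd simulation
        if behavior_gap(simulated, benchmark) < tol:  # close enough to experts
            break
        personas = rewrite(personas, simulated, benchmark)  # LLM edits personas
    return personas

# Toy stand-ins so the loop runs end to end; a real system would call an LLM.
benchmark = {"evacuate": 0.7, "freeze": 0.3}
simulate = lambda ps: (lambda calm: {"evacuate": calm, "freeze": 1 - calm})(
    sum(p["calm"] for p in ps) / len(ps))
rewrite = lambda ps, sim, bench: [
    {**p, "calm": p["calm"] + 0.5 * (bench["evacuate"] - sim["evacuate"])} for p in ps]
print(refine_personas([{"calm": 0.4}, {"calm": 0.5}], benchmark, simulate, rewrite))
```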
- EcoLANG: Efficient and Effective Agent Communication Language Induction for Social Simulation [49.789575209305724]
Large language models (LLMs) have demonstrated an impressive ability to role-play humans and replicate complex social dynamics. Existing solutions, such as distributed mechanisms or hybrid agent-based model (ABM) integrations, either fail to address inference costs or compromise accuracy and generalizability. We propose EcoLANG: Efficient and Effective Agent Communication Language Induction for Social Simulation.
arXiv Detail & Related papers (2025-05-11T08:51:56Z)
- SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users [70.02370111025617]
We introduce SocioVerse, an agent-driven world model for social simulation. Our framework features four powerful alignment components and a user pool of 10 million real individuals. Results demonstrate that SocioVerse can reflect large-scale population dynamics while ensuring diversity, credibility, and representativeness.
arXiv Detail & Related papers (2025-04-14T12:12:52Z)
- Large Language Model Driven Agents for Simulating Echo Chamber Formation [5.6488384323017]
The rise of echo chambers on social media platforms has heightened concerns about polarization and the reinforcement of existing beliefs. Traditional approaches for simulating echo chamber formation have often relied on predefined rules and numerical simulations. We present a novel framework that leverages large language models (LLMs) as generative agents to simulate echo chamber dynamics.
arXiv Detail & Related papers (2025-02-25T12:05:11Z)
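To make the generative-agent framing above concrete, here is a hedged sketch of one round of such a simulation. `llm_generate` is a placeholder for a real chat-model call conditioned on the agent's persona and feed, and the homophilous routing rule is an illustrative assumption, not the paper's design.

```python
def llm_generate(persona: str, feed: list[str]) -> str:
    # Placeholder for an LLM call; a real run would prompt a chat model
    # with the persona and the feed contents.
    return f"As someone who is {persona}, I agree with what I'm seeing."

def step(agents: dict[str, str], feeds: dict[str, list[str]]) -> None:
    """One round: each agent reads its feed, posts a reply, and the post is
    routed only to like-minded agents (recommendation homophily)."""
    for name, persona in agents.items():
        post = llm_generate(persona, feeds[name])
        for other, other_persona in agents.items():
            if other != name and other_persona == persona:  # homophilous routing
                feeds[other].append(post)

agents = {"a": "pro-regulation", "b": "pro-regulation", "c": "anti-regulation"}
feeds = {n: [] for n in agents}
for _ in range(3):
    step(agents, feeds)
print({n: len(f) for n, f in feeds.items()})  # like-minded feeds reinforce each other
```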
- GenSim: A General Social Simulation Platform with Large Language Model based Agents [111.00666003559324]
We propose a novel large language model (LLM)-based simulation platform called GenSim. Our platform supports one hundred thousand agents to better simulate large-scale populations in real-world contexts. To our knowledge, GenSim represents an initial step toward a general, large-scale, and correctable social simulation platform.
arXiv Detail & Related papers (2024-10-06T05:02:23Z)
- Shall We Team Up: Exploring Spontaneous Cooperation of Competing LLM Agents [18.961470450132637]
This paper emphasizes the importance of spontaneous phenomena, wherein agents deeply engage in contexts and make adaptive decisions without explicit directions.
We explored spontaneous cooperation across three competitive scenarios and successfully simulated the gradual emergence of cooperation.
arXiv Detail & Related papers (2024-02-19T18:00:53Z)
- Systematic Biases in LLM Simulations of Debates [12.933509143906141]
We study the limitations of Large Language Models in simulating human interactions. Our findings indicate a tendency for LLM agents to conform to the model's inherent social biases. These results underscore the need for further research to develop methods that help agents overcome these biases.
arXiv Detail & Related papers (2024-02-06T14:51:55Z)
- Training Socially Aligned Language Models on Simulated Social Interactions [99.39979111807388]
Social alignment in AI systems aims to ensure that these models behave according to established societal values.
Current language models (LMs) are trained to rigidly replicate their training corpus in isolation.
This work presents a novel training paradigm that permits LMs to learn from simulated social interactions.
arXiv Detail & Related papers (2023-05-26T14:17:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.