A Taxonomy of Response Strategies to Toxic Online Content: Evaluating the Evidence
- URL: http://arxiv.org/abs/2509.09921v2
- Date: Mon, 13 Oct 2025 19:22:36 GMT
- Title: A Taxonomy of Response Strategies to Toxic Online Content: Evaluating the Evidence
- Authors: Lisa Schirch, Kristina Radivojevic, Cathy Buerger,
- Abstract summary: Toxic Online Content (TOC) includes messages on digital platforms that are harmful, hostile, or damaging to constructive public discourse. There is wide variation in goals, terminology, response strategies, and methods of evaluating impact. This paper identifies a taxonomy of online response strategies encompassing any type of online speech intended to build healthier online public discourse.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Toxic Online Content (TOC) includes messages on digital platforms that are harmful, hostile, or damaging to constructive public discourse. Individuals, organizations, and LLMs respond to TOC through counterspeech or counternarrative initiatives. There is wide variation in their goals, terminology, response strategies, and methods of evaluating impact. This paper identifies a taxonomy of online response strategies, which we call Online Discourse Engagement (ODE), encompassing any type of online speech intended to build healthier online public discourse. The literature on ODE makes contradictory assumptions about ODE goals and rarely distinguishes between them or rigorously evaluates their effectiveness. This paper categorizes 25 distinct ODE strategies, from humor and distraction to empathy, solidarity, and fact-based rebuttals, and groups these into a taxonomy of five response categories: defusing and distracting, engaging the speaker's perspective, identifying shared values, upstanding for victims, and information and fact-building. The paper then systematically reviews the evidence base for each of these categories. By clarifying definitions, cataloging response strategies, and providing a meta-analysis of research papers on these strategies, this article aims to bring coherence to the study of ODE and to strengthen evidence-informed approaches for fostering constructive ODE.
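The five-category taxonomy the abstract describes can be sketched as a simple lookup structure. This is an illustrative assumption, not the paper's exact assignment of all 25 strategies; only the strategies the abstract itself names (humor, distraction, empathy, solidarity, fact-based rebuttals) are grounded, and the remaining example entries are hypothetical placeholders:

```python
# Sketch of the paper's five ODE response categories. Strategy names not
# mentioned in the abstract are illustrative guesses, not the paper's list.
ODE_TAXONOMY = {
    "defusing_and_distracting": ["humor", "distraction"],
    "engaging_the_speakers_perspective": ["empathy", "questioning"],
    "identifying_shared_values": ["solidarity", "shared-values appeals"],
    "upstanding_for_victims": ["support for targets", "bystander intervention"],
    "information_and_fact_building": ["fact-based rebuttals", "source citation"],
}

def categorize(strategy):
    """Return the taxonomy category containing a given strategy, or None."""
    for category, strategies in ODE_TAXONOMY.items():
        if strategy in strategies:
            return category
    return None
```

A structure like this makes the paper's central claim concrete: each of the 25 strategies maps to exactly one of five evaluable response categories.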
Related papers
- Latent Topic Synthesis: Leveraging LLMs for Electoral Ad Analysis [51.95395936342771]
We introduce an end-to-end framework for automatically generating an interpretable topic taxonomy from an unlabeled corpus. We apply this framework to a large corpus of Meta political ads from the month ahead of the 2024 U.S. Presidential election. Our approach uncovers latent discourse structures, synthesizes semantically rich topic labels, and annotates topics with moral framing dimensions.
arXiv Detail & Related papers (2025-10-16T20:30:20Z) - WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research [73.58638285105971]
This paper tackles open-ended deep research (OEDR), a complex challenge where AI agents must synthesize vast web-scale information into insightful reports. We introduce WebWeaver, a novel dual-agent framework that emulates the human research process. Our framework establishes a new state-of-the-art across major OEDR benchmarks, including DeepResearch Bench, DeepConsult, and DeepResearchGym.
arXiv Detail & Related papers (2025-09-16T17:57:21Z) - HatePRISM: Policies, Platforms, and Research Integration. Advancing NLP for Hate Speech Proactive Mitigation [67.69631485036665]
We conduct a comprehensive examination of hate speech regulations and strategies from three perspectives. Our findings reveal significant inconsistencies in hate speech definitions and moderation practices across jurisdictions. We suggest ideas and research directions for further exploration of a unified framework for automated hate speech moderation.
arXiv Detail & Related papers (2025-07-06T11:25:23Z) - Understanding and Analyzing Inappropriately Targeting Language in Online Discourse: A Comparative Annotation Study [1.0923877073891446]
This paper introduces a method for detecting inappropriately targeting language in online conversations by integrating crowd and expert annotations with ChatGPT. We focus on English conversation threads from Reddit, examining comments that target individuals or groups. We perform a comparative analysis of annotations from human experts, crowd annotators, and ChatGPT, revealing strengths and limitations of each method in recognizing both explicit hate speech and subtler discriminatory language.
arXiv Detail & Related papers (2025-05-22T16:10:43Z) - Talking Point based Ideological Discourse Analysis in News Events [62.18747509565779]
We propose a framework motivated by the theory of ideological discourse analysis to analyze news articles related to real-world events. Our framework represents the news articles using a relational structure - talking points - which captures the interaction between entities, their roles, and media frames along with a topic of discussion. We evaluate our framework's ability to generate these perspectives through automated tasks - ideology and partisan classification tasks - supplemented by human validation.
arXiv Detail & Related papers (2025-04-10T02:52:34Z) - Illusions of Relevance: Using Content Injection Attacks to Deceive Retrievers, Rerankers, and LLM Judges [52.96987928118327]
We find that embedding models for retrieval, rerankers, and large language model (LLM) relevance judges are vulnerable to content injection attacks. We identify two primary threats: (1) inserting unrelated or harmful content within passages that still appear deceptively "relevant", and (2) inserting entire queries or key query terms into passages to boost their perceived relevance. Our study systematically examines the factors that influence an attack's success, such as the placement of injected content and the balance between relevant and non-relevant material.
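The second threat that abstract names, inserting query terms into a passage to inflate its perceived relevance, can be illustrated with a toy lexical-overlap scorer. This is a minimal sketch under stated assumptions: the paper attacks embedding models, rerankers, and LLM judges, whereas the scorer below is a deliberately simple bag-of-words stand-in:

```python
def lexical_overlap(query, passage):
    """Toy relevance score: fraction of query terms present in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0

def inject_query_terms(passage, query):
    """Sketch of the attack: prepend the query itself to an unrelated
    passage so that term-overlap scoring rates it as relevant."""
    return f"{query}. {passage}"
```

Even this crude scorer shows why the attack works: an off-topic passage with the query stuffed into it shares almost every query term, so any purely lexical signal is saturated.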
arXiv Detail & Related papers (2025-01-30T18:02:15Z) - Assessing the Human Likeness of AI-Generated Counterspeech [10.434435022492723]
This paper investigates the human likeness of AI-generated counterspeech. We implement and evaluate several LLM-based generation strategies. We reveal differences in linguistic characteristics, politeness, and specificity.
arXiv Detail & Related papers (2024-10-14T18:48:47Z) - Evaluating Copyright Takedown Methods for Language Models [100.38129820325497]
Language models (LMs) derive their capabilities from extensive training on diverse data, including potentially copyrighted material.
This paper introduces the first evaluation of the feasibility and side effects of copyright takedowns for LMs.
We examine several strategies, including adding system prompts, decoding-time filtering interventions, and unlearning approaches.
arXiv Detail & Related papers (2024-06-26T18:09:46Z) - The Unappreciated Role of Intent in Algorithmic Moderation of Social Media Content [2.2618341648062477]
This paper examines the role of intent in content moderation systems.
We review state of the art detection models and benchmark training datasets for online abuse to assess their awareness and ability to capture intent.
arXiv Detail & Related papers (2024-05-17T18:05:13Z) - Discursive objection strategies in online comments: Developing a classification schema and validating its training [2.6603898952678167]
Most Americans agree that misinformation, hate speech and harassment are harmful and inadequately curbed on social media.
We conducted a content analysis of more than 6500 comment replies to trending news videos on YouTube and Twitter.
We identified seven distinct discursive objection strategies.
arXiv Detail & Related papers (2024-05-13T19:39:00Z) - Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage [64.78260098263489]
Twisting and camouflaging keywords are among the most used techniques to evade platform content moderation systems.
This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of content moderation evasion.
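Keyword camouflage of the kind this article simulates can be sketched with a simple character-substitution ("leetspeak") table and its inverse. The substitution table is an illustrative assumption, not the article's actual multilingual tooling, and the naive inverse would also rewrite genuine digits:

```python
# Toy simulation of keyword camouflage via character substitution.
# This table is a hypothetical example, not the article's tool.
LEET = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"})
REVERSE = str.maketrans({"4": "a", "3": "e", "1": "i", "0": "o", "5": "s"})

def camouflage(word):
    """Obfuscate a keyword so a literal blocklist match no longer fires."""
    return word.lower().translate(LEET)

def decamouflage(text):
    """Naively undo the substitutions; note real digits would be rewritten too,
    which is one reason robust detection is harder than a lookup table."""
    return text.translate(REVERSE)
```

Round-tripping a blocked keyword through these two functions illustrates the cat-and-mouse dynamic: each normalization step the platform adds invites a new substitution scheme.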
arXiv Detail & Related papers (2022-12-27T16:08:49Z) - Handling Bias in Toxic Speech Detection: A Survey [26.176340438312376]
We look at proposed methods for evaluating and mitigating bias in toxic speech detection.
A case study introduces the concept of bias shift due to knowledge-based bias mitigation.
The survey concludes with an overview of the critical challenges, research gaps, and future directions.
arXiv Detail & Related papers (2022-01-26T10:38:36Z) - RESPER: Computationally Modelling Resisting Strategies in Persuasive Conversations [0.7505101297221454]
We propose a generalised framework for identifying resisting strategies in persuasive conversations.
Our experiments reveal the asymmetry of power roles in non-collaborative goal-directed conversations.
We also investigate the role of different resisting strategies on the conversation outcome.
arXiv Detail & Related papers (2021-01-26T03:44:17Z) - Detecting and Classifying Malevolent Dialogue Responses: Taxonomy, Data and Methodology [68.8836704199096]
Corpus-based conversational interfaces are able to generate more diverse and natural responses than template-based or retrieval-based agents.
With the increased generative capacity of corpus-based conversational agents comes the need to classify and filter out malevolent responses.
Previous studies on the topic of recognizing and classifying inappropriate content are mostly focused on a certain category of malevolence.
arXiv Detail & Related papers (2020-08-21T22:43:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.