Toxicity in Online Platforms and AI Systems: A Survey of Needs, Challenges, Mitigations, and Future Directions
- URL: http://arxiv.org/abs/2509.25539v1
- Date: Mon, 29 Sep 2025 21:55:23 GMT
- Title: Toxicity in Online Platforms and AI Systems: A Survey of Needs, Challenges, Mitigations, and Future Directions
- Authors: Smita Khapre, Melkamu Abay Mersha, Hassan Shakil, Jonali Baruah, Jugal Kalita
- Abstract summary: The evolution of digital communication systems and the designs of online platforms have inadvertently facilitated the subconscious propagation of toxic behavior. This survey attempts to generate a comprehensive taxonomy of toxicity from various perspectives. It presents a holistic approach to explaining toxicity by understanding the context and environment that society faces in the Artificial Intelligence era.
- Score: 12.73085307172367
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The evolution of digital communication systems and the designs of online platforms have inadvertently facilitated the subconscious propagation of toxic behavior, giving rise to reactive responses to it. Toxicity in online content and Artificial Intelligence systems has become a serious challenge to individual and collective well-being around the world, and it is more detrimental to society than we realize. Toxicity, expressed in language, images, and video, can be interpreted in various ways depending on the context of usage. A comprehensive taxonomy is therefore crucial for proactively detecting and mitigating toxicity in online content, Artificial Intelligence systems, and Large Language Models. A comprehensive understanding of toxicity is likely to facilitate the design of practical solutions for toxicity detection and mitigation. The classifications in published literature have focused on only a limited number of aspects of this very complex issue, with a pattern of reactive strategies in response to toxicity. This survey attempts to generate a comprehensive taxonomy of toxicity from various perspectives. It presents a holistic approach to explaining toxicity by understanding the context and environment that society faces in the Artificial Intelligence era. The survey summarizes toxicity-related datasets and research on toxicity detection and mitigation for Large Language Models, social media platforms, and other online platforms, detailing their attributes in textual form, with a focus on the English language. Finally, we identify research gaps in toxicity mitigation related to datasets, mitigation strategies, Large Language Models, adaptability, explainability, and evaluation.
Related papers
- Unveiling Covert Toxicity in Multimodal Data via Toxicity Association Graphs: A Graph-Based Metric and Interpretable Detection Framework [58.01529356381494]
We propose a novel detection framework based on Toxicity Association Graphs (TAGs). We introduce the first quantifiable metric for hidden toxicity, the Multimodal Toxicity Covertness (MTC). Our approach enables precise identification of covert toxicity while preserving full interpretability of the decision-making process.
arXiv Detail & Related papers (2026-02-03T08:54:25Z)
- Defining, Understanding, and Detecting Online Toxicity: Challenges and Machine Learning Approaches [4.1824815480811806]
This study represents a synthesis of 140 publications on different types of toxic content on digital platforms. The dataset encompasses content in 32 languages, covering topics such as elections, spontaneous events, and crises. We present recommendations and guidelines for new research on online toxic content and the use of content moderation for mitigation.
arXiv Detail & Related papers (2025-09-14T00:16:53Z)
- Something Just Like TRuST: Toxicity Recognition of Span and Target [2.4169078025984825]
This paper introduces TRuST, a comprehensive dataset designed to improve toxicity detection. We benchmark state-of-the-art large language models (LLMs) on toxicity detection, target group identification, and toxic span extraction. We find that fine-tuned models consistently outperform zero-shot and few-shot prompting, though performance remains low for certain social groups.
arXiv Detail & Related papers (2025-06-02T23:48:16Z)
- GloSS over Toxicity: Understanding and Mitigating Toxicity in LLMs via Global Toxic Subspace [62.68664365246247]
This paper investigates the underlying mechanisms of toxicity generation in Large Language Models (LLMs). We propose GloSS (Global Toxic Subspace Suppression), a lightweight, four-stage method that mitigates toxicity by identifying and removing the global toxic subspace from the model's FFN parameters.
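The core idea of suppressing a toxic subspace in weight space can be illustrated with a toy projection. The following is a minimal numpy sketch, not the authors' four-stage GloSS method: the function name `suppress_subspace` and the random matrices `W` and `V` are hypothetical stand-ins for an FFN weight matrix and an estimated toxic direction.

```python
import numpy as np

def suppress_subspace(W: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Return W with the component lying in span(V) removed.

    W: (d_out, d_in) weight matrix of an FFN layer (toy example).
    V: (d_out, k) orthonormal basis of the estimated toxic subspace.
    """
    P = V @ V.T          # orthogonal projector onto the toxic subspace
    return W - P @ W     # remove that component from every column of W

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))      # stand-in FFN weights
v = rng.standard_normal((8, 1))      # stand-in toxic direction
V, _ = np.linalg.qr(v)               # orthonormalise it
W_clean = suppress_subspace(W, V)

# The edited weights carry no component along the suppressed direction.
print(np.allclose(V.T @ W_clean, 0.0))   # True
```

The same projection applies for a k-dimensional subspace by passing a wider orthonormal `V`; the paper's contribution lies in how that subspace is estimated globally across layers, which this sketch does not attempt.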
arXiv Detail & Related papers (2025-05-20T08:29:11Z)
- ShieldVLM: Safeguarding the Multimodal Implicit Toxicity via Deliberative Reasoning with LVLMs [72.8646625127485]
Multimodal implicit toxicity appears not only in formal statements on social platforms but also in prompts that can lead to toxic dialogs. Despite success in unimodal text or image moderation, toxicity detection for multimodal content, particularly multimodal implicit toxicity, remains underexplored. To advance its detection, we build ShieldVLM, a model that identifies implicit toxicity in multimodal statements, prompts, and dialogs via deliberative cross-modal reasoning.
arXiv Detail & Related papers (2025-05-20T07:31:17Z)
- Redefining Toxicity: An Objective and Context-Aware Approach for Stress-Level-Based Detection [1.9424018922013224]
Most toxicity detection models treat toxicity as an intrinsic property of text, overlooking the role of context in shaping its impact. We reconceptualise toxicity as a socially emergent stress signal. We introduce a new framework for toxicity detection, including a formal definition and metric, and validate our approach on a novel dataset.
arXiv Detail & Related papers (2025-03-20T12:09:01Z)
- Aligned Probing: Relating Toxic Behavior and Model Internals [78.20380492883022]
We introduce aligned probing, a novel interpretability framework that aligns the behavior of language models (LMs) with their internal representations. Using this framework, we examine over 20 OLMo, Llama, and Mistral models, bridging behavioral and internal perspectives on toxicity for the first time. Our results show that LMs strongly encode information about the toxicity level of inputs and subsequent outputs, particularly in lower layers.
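The finding that toxicity is linearly decodable from hidden states is the kind of result a simple linear probe can demonstrate. Below is a toy sketch on synthetic "hidden states", not the paper's aligned-probing code: the signal direction, data shapes, and least-squares probe are all illustrative assumptions.

```python
import numpy as np

# Synthetic hidden states: a toxicity signal is injected along one
# fixed direction; a linear probe should recover it well above chance.
rng = np.random.default_rng(1)
n, d = 400, 16
toxic_dir = rng.standard_normal(d)
toxic_dir /= np.linalg.norm(toxic_dir)

y = rng.integers(0, 2, size=n)                    # 0 = non-toxic, 1 = toxic
H = rng.standard_normal((n, d)) + np.outer(2.0 * y - 1.0, toxic_dir) * 2.0

# Fit probe weights by least squares against {-1, +1} targets.
w, *_ = np.linalg.lstsq(H, 2.0 * y - 1.0, rcond=None)
acc = np.mean((H @ w > 0).astype(int) == y)
print(f"probe accuracy: {acc:.2f}")
```

In the real setting, `H` would be hidden states collected from a specific layer of an LM; comparing probe accuracy across layers is what lets one say the signal concentrates in the lower layers.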
arXiv Detail & Related papers (2025-03-17T17:23:50Z)
- Comprehensive Assessment of Toxicity in ChatGPT [49.71090497696024]
We evaluate toxicity in ChatGPT using instruction-tuning datasets.
Prompts in creative writing tasks can be 2x more likely to elicit toxic responses.
Certain deliberately toxic prompts, designed in earlier studies, no longer yield harmful responses.
arXiv Detail & Related papers (2023-11-03T14:37:53Z)
- ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation [43.356758428820626]
We introduce ToxicChat, a novel benchmark based on real user queries to an open-source chatbot.
Our systematic evaluation of models trained on existing toxicity datasets shows their shortcomings when applied to the unique domain of ToxicChat.
In the future, ToxicChat can be a valuable resource to drive further advancements toward building a safe and healthy environment for user-AI interactions.
arXiv Detail & Related papers (2023-10-26T13:35:41Z)
- Handling Bias in Toxic Speech Detection: A Survey [26.176340438312376]
We look at proposed methods for evaluating and mitigating bias in toxic speech detection.
A case study introduces the concept of bias shift due to knowledge-based bias mitigation.
The survey concludes with an overview of critical challenges, research gaps, and future directions.
arXiv Detail & Related papers (2022-01-26T10:38:36Z)
- RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models [93.151822563361]
Pretrained neural language models (LMs) are prone to generating racist, sexist, or otherwise toxic language which hinders their safe deployment.
We investigate the extent to which pretrained LMs can be prompted to generate toxic language, and the effectiveness of controllable text generation algorithms at preventing such toxic degeneration.
arXiv Detail & Related papers (2020-09-24T03:17:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.