Toxicity Ahead: Forecasting Conversational Derailment on GitHub
- URL: http://arxiv.org/abs/2512.15031v1
- Date: Wed, 17 Dec 2025 02:45:12 GMT
- Title: Toxicity Ahead: Forecasting Conversational Derailment on GitHub
- Authors: Mia Mohammad Imran, Robert Zita, Rahat Rizvi Rahman, Preetha Chatterjee, Kostadin Damevski
- Abstract summary: Toxic interactions in Open Source Software (OSS) communities reduce contributor engagement and threaten project sustainability. To support more scalable approaches, we curate a dataset of 159 derailed toxic threads and 207 non-toxic threads from GitHub discussions. We present a novel Large Language Model (LLM)-based framework for predicting conversational derailment on GitHub using a two-step prompting pipeline.
- Score: 9.862809084820304
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Toxic interactions in Open Source Software (OSS) communities reduce contributor engagement and threaten project sustainability. Preventing such toxicity before it emerges requires a clear understanding of how harmful conversations unfold. However, most proactive moderation strategies are manual, requiring significant time and effort from community maintainers. To support more scalable approaches, we curate a dataset of 159 derailed toxic threads and 207 non-toxic threads from GitHub discussions. Our analysis reveals that toxicity can be forecast by tension triggers, sentiment shifts, and specific conversational patterns. We present a novel Large Language Model (LLM)-based framework for predicting conversational derailment on GitHub using a two-step prompting pipeline. First, we generate Summaries of Conversation Dynamics (SCDs) via Least-to-Most (LtM) prompting; then we use these summaries to estimate the likelihood of derailment. Evaluated on Qwen and Llama models, our LtM strategy achieves F1-scores of 0.901 and 0.852, respectively, at a decision threshold of 0.3, outperforming established NLP baselines on conversation derailment. External validation on a dataset of 308 GitHub issue threads (65 toxic, 243 non-toxic) yields an F1-score up to 0.797. Our findings demonstrate the effectiveness of structured LLM prompting for early detection of conversational derailment in OSS, enabling proactive and explainable moderation.
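The two-step pipeline lends itself to a compact sketch. The snippet below is illustrative only: `call_llm` is a hypothetical stand-in for a real Qwen/Llama client, and the sub-questions and prompt wording are assumptions, not the paper's actual prompts.

```python
# Minimal sketch of the two-step pipeline: (1) Least-to-Most prompting builds a
# Summary of Conversation Dynamics (SCD); (2) the SCD is scored for derailment.
# `call_llm` and all prompt text are hypothetical stand-ins, not the paper's code.

THRESHOLD = 0.3  # decision threshold reported in the abstract


def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a chat-completion call to Qwen or Llama."""
    raise NotImplementedError("wire up your model client here")


def summarize_dynamics(thread: list[str]) -> str:
    """Step 1: decompose into sub-questions, then compose the SCD (LtM style)."""
    sub_questions = [
        "What tension triggers, if any, appear in this thread?",
        "How does participant sentiment shift over the thread?",
        "What conversational patterns (e.g., repeated disagreement) are present?",
    ]
    convo = "\n".join(thread)
    answers = [call_llm(f"Thread:\n{convo}\n\nQuestion: {q}") for q in sub_questions]
    return call_llm(
        "Combine these observations into a short Summary of Conversation "
        "Dynamics (SCD):\n" + "\n".join(answers)
    )


def predict_derailment(thread: list[str]) -> bool:
    """Step 2: estimate the likelihood of derailment from the SCD."""
    scd = summarize_dynamics(thread)
    reply = call_llm(
        f"SCD:\n{scd}\n\nOn a scale from 0 to 1, how likely is this "
        "conversation to derail into toxicity? Answer with a number only."
    )
    return float(reply.strip()) >= THRESHOLD
```

The 0.3 threshold mirrors the value reported in the abstract; in practice it would be tuned on validation data.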
Related papers
- Time Travel: LLM-Assisted Semantic Behavior Localization with Git Bisect [8.55768450285885]
We present a novel framework that integrates Large Language Models (LLMs) into the Git bisect process for semantic fault localization. Our system augments bisect traversal with structured chain-of-thought reasoning, enabling commit-by-commit analysis under noisy conditions.
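As a rough illustration of what driving `git bisect` with an LLM could look like (a sketch only: `judge_commit` is a hypothetical LLM call, and real use would need handling for diffs too large for a context window):

```python
# Sketch of LLM-guided bisection: git's binary search picks each candidate
# commit, and an (assumed) LLM call classifies it as good or bad from its diff.
import subprocess


def sh(*args: str) -> str:
    """Run a command and return stdout, raising on failure."""
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout


def judge_commit(diff: str, symptom: str) -> bool:
    """Hypothetical LLM judgment: True if this commit likely introduced the bug."""
    raise NotImplementedError("prompt your LLM with the diff and the symptom")


def llm_bisect(good: str, bad: str, symptom: str, max_steps: int = 64) -> None:
    sh("git", "bisect", "start", bad, good)  # note: bad ref comes first
    try:
        for _ in range(max_steps):
            head = sh("git", "rev-parse", "HEAD").strip()
            diff = sh("git", "show", head)
            verdict = "bad" if judge_commit(diff, symptom) else "good"
            out = sh("git", "bisect", verdict)
            if "is the first bad commit" in out:
                print(out.splitlines()[0])
                return
    finally:
        sh("git", "bisect", "reset")
```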
arXiv Detail & Related papers (2025-11-24T07:49:59Z)
- One Battle After Another: Probing LLMs' Limits on Multi-Turn Instruction Following with a Benchmark Evolving Framework [51.50565654314582]
Large language models can follow users' instructions throughout a dialogue spanning multiple topics. Existing benchmarks are often limited to a fixed number of turns, making them susceptible to saturation and failing to account for the user's interactive experience. We propose a framework for assessing multi-turn instruction-following ability.
arXiv Detail & Related papers (2025-11-05T14:39:59Z)
- Thinking Before You Speak: A Proactive Test-time Scaling Approach [54.8205006555199]
We implement our idea as a reasoning framework, named Thinking Before You Speak (TBYS). We design a pipeline for automatically collecting and filtering in-context examples for the generation of insights. Experiments on challenging mathematical datasets verify the effectiveness of TBYS.
arXiv Detail & Related papers (2025-08-26T03:43:32Z)
- Revisiting Backdoor Attacks on LLMs: A Stealthy and Practical Poisoning Framework via Harmless Inputs [54.90315421117162]
We propose a novel poisoning method via completely harmless data. Inspired by the causal reasoning in auto-regressive LLMs, we aim to establish robust associations between triggers and an affirmative response prefix. We observe an interesting resistance phenomenon where the LLM initially appears to agree but subsequently refuses to answer.
arXiv Detail & Related papers (2025-05-23T08:13:59Z)
- Understanding and Predicting Derailment in Toxic Conversations on GitHub [6.343946534579351]
This study aims to understand and predict conversational derailment leading to toxicity on GitHub. Based on this dataset, we identify unique characteristics of toxic conversations and derailment points. We propose a proactive moderation approach to automatically detect and address potentially harmful conversations before escalation.
arXiv Detail & Related papers (2025-03-04T02:01:37Z)
- Toxic Subword Pruning for Dialogue Response Generation on Large Language Models [51.713448010799986]
We propose Toxic Subword Pruning (ToxPrune) to prune subwords contained in toxic words from the BPE vocabulary of trained LLMs.
ToxPrune also noticeably improves the toxic language model NSFW-3B on the task of dialogue response generation.
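The summary gives only the high-level idea; one simple way to approximate subword pruning at inference time is to mask the banned subword ids in the output logits. The tokenizer, wordlist, and masking approach below are illustrative assumptions, not ToxPrune's actual implementation.

```python
# Toy approximation of subword pruning: collect the BPE ids of toxic words and
# mask them at decoding time so sampling can never emit them.
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # any BPE tokenizer
toxic_words = ["idiot", "stupid"]  # placeholder wordlist

banned_ids: set[int] = set()
for word in toxic_words:
    # BPE is whitespace- and case-sensitive, so cover common surface variants.
    for variant in (word, " " + word, word.capitalize(), " " + word.capitalize()):
        banned_ids.update(tokenizer(variant, add_special_tokens=False)["input_ids"])


def prune_logits(logits: torch.Tensor) -> torch.Tensor:
    """Set banned subwords to -inf so they receive zero probability."""
    logits[..., list(banned_ids)] = float("-inf")
    return logits
```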
arXiv Detail & Related papers (2024-10-05T13:30:33Z)
- Exploring ChatGPT for Toxicity Detection in GitHub [5.003898791753481]
The prevalence of negative discourse, often manifested as toxic comments, poses significant challenges to developer well-being and productivity.
To identify such negativity in project communications, automated toxicity detection models are necessary.
To train these models effectively, we need large software engineering-specific toxicity datasets.
arXiv Detail & Related papers (2023-12-20T15:23:00Z)
- ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models [125.7209927536255]
We propose ChatCoT, a tool-augmented chain-of-thought reasoning framework for chat-based LLMs.
In ChatCoT, we model the chain-of-thought (CoT) reasoning as multi-turn conversations, to utilize tools in a more natural way through chatting.
Our approach effectively leverages the multi-turn conversation ability of chat-based LLMs and integrates chain-of-thought following and tool manipulation in a unified way.
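The conversational tool-use loop described above can be pictured roughly as below; the `chat` helper, the `TOOL:`/`ANSWER:` convention, and the toy calculator are assumptions for illustration, not ChatCoT's actual protocol.

```python
# Sketch of chain-of-thought as a multi-turn chat with tool calls in between.
def chat(messages: list[dict]) -> str:
    """Hypothetical chat-completion call; plug in any chat-based LLM client."""
    raise NotImplementedError


TOOLS = {"calculator": lambda expr: str(eval(expr))}  # toy tool, demo only


def chatcot(question: str, max_turns: int = 8) -> str:
    messages = [
        {"role": "system", "content": (
            "Reason step by step. To use a tool, reply 'TOOL: <name>: <input>'. "
            "When finished, reply 'ANSWER: <answer>'.")},
        {"role": "user", "content": question},
    ]
    for _ in range(max_turns):
        reply = chat(messages)
        messages.append({"role": "assistant", "content": reply})
        if reply.startswith("ANSWER:"):
            return reply.removeprefix("ANSWER:").strip()
        if reply.startswith("TOOL:"):
            _, name, tool_input = (part.strip() for part in reply.split(":", 2))
            messages.append({"role": "user",
                             "content": f"Tool result: {TOOLS[name](tool_input)}"})
    return "no answer within the turn budget"
```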
arXiv Detail & Related papers (2023-05-23T17:54:33Z)
- Conversation Modeling to Predict Derailment [15.45515784064555]
The ability to predict whether ongoing conversations are likely to derail could provide valuable real-time insight to interlocutors and moderators.
Some works attempt to make dynamic predictions as the conversation develops, but fail to incorporate multisource information, such as conversation structure and distance to derailment.
We propose a hierarchical transformer-based framework that combines utterance-level and conversation-level information to capture fine-grained contextual semantics.
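A bare-bones version of such a hierarchical encoder might look like the following; the dimensions, layer counts, and mean-pooling choice are arbitrary illustrative decisions, not the paper's configuration.

```python
# Two-level encoder: a transformer over tokens per utterance, then a transformer
# over the resulting utterance vectors to capture conversation-level context.
import torch
import torch.nn as nn


class HierarchicalDerailmentModel(nn.Module):
    def __init__(self, vocab_size: int = 30522, d: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)
        utt_layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.utterance_enc = nn.TransformerEncoder(utt_layer, num_layers=2)
        conv_layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.conversation_enc = nn.TransformerEncoder(conv_layer, num_layers=2)
        self.head = nn.Linear(d, 1)  # derailment-probability logit

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, n_utterances, n_tokens)
        b, u, t = token_ids.shape
        x = self.embed(token_ids.view(b * u, t))     # utterance-level tokens
        x = self.utterance_enc(x).mean(dim=1)        # one vector per utterance
        x = self.conversation_enc(x.view(b, u, -1))  # conversation-level context
        return torch.sigmoid(self.head(x[:, -1]))    # score from last utterance


# Usage: probs = HierarchicalDerailmentModel()(torch.randint(0, 30522, (2, 5, 12)))
```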
arXiv Detail & Related papers (2023-03-20T15:10:45Z)
- Automated Identification of Toxic Code Reviews: How Far Can We Go? [7.655225472610752]
ToxiCR is a supervised learning-based toxicity identification tool for code review interactions.
ToxiCR significantly outperforms existing toxicity detectors on our dataset.
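The summary describes ToxiCR only as a supervised classifier, so here is a generic baseline of that shape (TF-IDF features plus logistic regression, on toy data); it is not ToxiCR's actual architecture or training data.

```python
# Generic supervised toxicity baseline for code-review comments (toy data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

comments = ["LGTM, nice refactor", "this patch is garbage, did you even test it"]
labels = [0, 1]  # 0 = non-toxic, 1 = toxic

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(comments, labels)
print(clf.predict(["please rerun CI before merging"]))
```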
arXiv Detail & Related papers (2022-02-26T04:27:39Z)
- CIL: Contrastive Instance Learning Framework for Distantly Supervised Relation Extraction [52.94486705393062]
We go beyond the typical multi-instance learning (MIL) framework and propose a novel contrastive instance learning (CIL) framework.
Specifically, we regard the initial MIL as the relational triple encoder and constrain positive pairs against negative pairs for each instance.
Experiments demonstrate the effectiveness of our proposed framework, with significant improvements over the previous methods on NYT10, GDS and KBP.
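Constraining positive pairs against negatives per instance is commonly realized with an InfoNCE-style objective; the sketch below shows that generic loss, not the paper's exact formulation.

```python
# Generic InfoNCE-style contrastive loss over L2-normalized embeddings.
import torch
import torch.nn.functional as F


def info_nce(anchor: torch.Tensor, positive: torch.Tensor,
             negatives: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    # anchor, positive: (d,); negatives: (k, d)
    pos = (anchor @ positive / tau).unsqueeze(0)  # similarity to the positive
    neg = negatives @ anchor / tau                # similarities to negatives
    logits = torch.cat([pos, neg]).unsqueeze(0)   # positive sits at index 0
    return F.cross_entropy(logits, torch.zeros(1, dtype=torch.long))


# Usage with random unit vectors:
a, p = F.normalize(torch.randn(2, 64), dim=-1)
n = F.normalize(torch.randn(8, 64), dim=-1)
print(info_nce(a, p, n))
```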
arXiv Detail & Related papers (2021-06-21T04:51:59Z)
- RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models [93.151822563361]
Pretrained neural language models (LMs) are prone to generating racist, sexist, or otherwise toxic language which hinders their safe deployment.
We investigate the extent to which pretrained LMs can be prompted to generate toxic language, and the effectiveness of controllable text generation algorithms at preventing such toxic degeneration.
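The evaluation loop described above (sample continuations per prompt, score their toxicity) can be approximated as follows. The original work scored generations with the Perspective API, so the open classifier and the simplified score handling below are stand-in assumptions.

```python
# Approximate toxic-degeneration probe: sample k continuations per prompt and
# record the maximum toxicity score (in the spirit of expected maximum toxicity).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
toxicity = pipeline("text-classification", model="unitary/toxic-bert")


def max_toxicity(prompt: str, k: int = 5) -> float:
    outs = generator(prompt, max_new_tokens=20, num_return_sequences=k,
                     do_sample=True)
    # toxic-bert is multi-label; we take the top label's score as a rough proxy.
    return max(toxicity(o["generated_text"])[0]["score"] for o in outs)


print(max_toxicity("So I said to him,"))
```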
arXiv Detail & Related papers (2020-09-24T03:17:19Z)