ToxiSpanSE: An Explainable Toxicity Detection in Code Review Comments
- URL: http://arxiv.org/abs/2307.03386v1
- Date: Fri, 7 Jul 2023 04:55:11 GMT
- Title: ToxiSpanSE: An Explainable Toxicity Detection in Code Review Comments
- Authors: Jaydeb Sarker and Sayma Sultana and Steven R. Wilson and Amiangshu Bosu
- Abstract summary: ToxiSpanSE is the first tool to detect toxic spans in the Software Engineering (SE) domain.
Our model achieved the best score with 0.88 $F1$, 0.87 precision, and 0.93 recall for toxic class tokens.
- Score: 4.949881799107062
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Background: The existence of toxic conversations in open-source platforms can
degrade relationships among software developers and may negatively impact
software product quality. To help mitigate this, some initial work has been
done to detect toxic comments in the Software Engineering (SE) domain. Aims:
Since automatically classifying an entire text as toxic or non-toxic does not
help human moderators to understand the specific reason(s) for toxicity, we
worked to develop an explainable toxicity detector for the SE domain. Method:
Our explainable toxicity detector can detect specific spans of toxic content
from SE texts, which can help human moderators by automatically highlighting
those spans. This toxic span detection model, ToxiSpanSE, is trained on
19,651 code review (CR) comments with labeled toxic spans. Our annotators
labeled the toxic spans within 3,757 toxic CR samples. We explored several
types of models, including one lexicon-based approach and five different
transformer-based encoders. Results: After an extensive evaluation of all
models, we found that our fine-tuned RoBERTa model achieved the best score with
0.88 $F1$, 0.87 precision, and 0.93 recall for toxic class tokens, providing an
explainable toxicity classifier for the SE domain. Conclusion: As the first tool
to detect toxic spans in the SE domain, ToxiSpanSE will pave the way for
combating toxicity in the SE community.
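Note: this listing does not include any code from the paper; the following is a minimal sketch, assuming the Hugging Face Transformers API, of how a token-level toxic span tagger built on a RoBERTa encoder is typically wired up. The checkpoint name ("roberta-base"), the binary non-toxic/toxic label scheme, and the example comment are illustrative assumptions rather than ToxiSpanSE's released artifacts; the classification head would first have to be fine-tuned on the labeled CR spans described above.

import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Illustrative encoder; the paper fine-tunes several transformer encoders,
# with RoBERTa performing best, on code review comments with labeled toxic spans.
MODEL_NAME = "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME, num_labels=2)
model.eval()

def predict_toxic_spans(text):
    """Return (start, end) character offsets of tokens predicted as toxic (label 1)."""
    enc = tokenizer(text, return_offsets_mapping=True, return_tensors="pt", truncation=True)
    offsets = enc.pop("offset_mapping")[0]
    with torch.no_grad():
        logits = model(**enc).logits[0]              # shape: (sequence_length, 2)
    labels = logits.argmax(dim=-1)                   # 1 = toxic token, 0 = non-toxic
    return [tuple(span.tolist()) for span, label in zip(offsets, labels)
            if label == 1 and span[1] > span[0]]     # drop zero-width special tokens

# Hypothetical code review comment, used only for illustration.
print(predict_toxic_spans("This patch is garbage, did you even test it?"))

A moderator-facing tool would use the returned character offsets to highlight the flagged spans in the original comment, and evaluation in this setting is the token-level precision, recall, and F1 for the toxic class quoted in the abstract.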
Related papers
- Unveiling the Implicit Toxicity in Large Language Models [77.90933074675543]
The open-endedness of large language models (LLMs), combined with their impressive capabilities, may lead to new safety issues when they are exploited for malicious use.
We show that LLMs can generate diverse implicit toxic outputs that are exceptionally difficult to detect via simple zero-shot prompting.
We propose a reinforcement learning (RL) based attacking method to further induce the implicit toxicity in LLMs.
arXiv Detail & Related papers (2023-11-29T06:42:36Z)
- Comprehensive Assessment of Toxicity in ChatGPT [49.71090497696024]
We evaluate the toxicity in ChatGPT by utilizing instruction-tuning datasets.
Prompts in creative writing tasks can be 2x more likely to elicit toxic responses.
Certain deliberately toxic prompts, designed in earlier studies, no longer yield harmful responses.
arXiv Detail & Related papers (2023-11-03T14:37:53Z)
- Facilitating Fine-grained Detection of Chinese Toxic Language: Hierarchical Taxonomy, Resources, and Benchmarks [18.44630180661091]
Existing datasets lack fine-grained annotation of toxic types and expressions.
It is crucial to introduce lexical knowledge to detect the toxicity of posts.
In this paper, we facilitate the fine-grained detection of Chinese toxic language.
arXiv Detail & Related papers (2023-05-08T03:50:38Z)
- Detoxifying Text with MaRCo: Controllable Revision with Experts and Anti-Experts [57.38912708076231]
We introduce MaRCo, a detoxification algorithm that combines controllable generation and text rewriting methods.
MaRCo uses likelihoods under a non-toxic LM and a toxic LM to find candidate words to mask and potentially replace.
We evaluate our method on several subtle toxicity and microaggression datasets, and show that it not only outperforms baselines on automatic metrics, but also that MaRCo's rewrites are preferred 2.1$\times$ more often in human evaluation.
arXiv Detail & Related papers (2022-12-20T18:50:00Z)
- ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection [33.715318646717385]
ToxiGen is a large-scale dataset of 274k toxic and benign statements about 13 minority groups.
Controlling machine generation in this way allows ToxiGen to cover implicitly toxic text at a larger scale.
We find that 94.5% of toxic examples are labeled as hate speech by human annotators.
arXiv Detail & Related papers (2022-03-17T17:57:56Z)
- Automated Identification of Toxic Code Reviews: How Far Can We Go? [7.655225472610752]
ToxiCR is a supervised learning-based toxicity identification tool for code review interactions.
ToxiCR significantly outperforms existing toxicity detectors on our dataset.
arXiv Detail & Related papers (2022-02-26T04:27:39Z)
- Mitigating Biases in Toxic Language Detection through Invariant Rationalization [70.36701068616367]
Biases toward some attributes, including gender, race, and dialect, exist in most training datasets for toxicity detection.
We propose to use invariant rationalization (InvRat), a game-theoretic framework consisting of a rationale generator and a predictor, to rule out the spurious correlation of certain syntactic patterns.
Our method yields a lower false positive rate on both lexical and dialectal attributes than previous debiasing methods.
arXiv Detail & Related papers (2021-06-14T08:49:52Z)
- Lone Pine at SemEval-2021 Task 5: Fine-Grained Detection of Hate Speech Using BERToxic [2.4815579733050153]
This paper describes our approach to the Toxic Spans Detection problem.
We propose BERToxic, a system that fine-tunes a pre-trained BERT model to locate toxic text spans in a given text.
Our system significantly outperformed the provided baseline and achieved an F1-score of 0.683, placing Lone Pine in the 17th place out of 91 teams in the competition.
arXiv Detail & Related papers (2021-04-08T04:46:14Z)
- Challenges in Automated Debiasing for Toxic Language Detection [81.04406231100323]
Biased associations have been a challenge in the development of classifiers for detecting toxic language.
We investigate recently introduced debiasing methods for text classification datasets and models, as applied to toxic language detection.
Our focus is on lexical markers (e.g., swear words, slurs, identity mentions) and dialectal markers (specifically African American English).
arXiv Detail & Related papers (2021-01-29T22:03:17Z)
- RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models [93.151822563361]
Pretrained neural language models (LMs) are prone to generating racist, sexist, or otherwise toxic language which hinders their safe deployment.
We investigate the extent to which pretrained LMs can be prompted to generate toxic language, and the effectiveness of controllable text generation algorithms at preventing such toxic degeneration.
arXiv Detail & Related papers (2020-09-24T03:17:19Z)