Automated Identification of Toxic Code Reviews: How Far Can We Go?
- URL: http://arxiv.org/abs/2202.13056v1
- Date: Sat, 26 Feb 2022 04:27:39 GMT
- Title: Automated Identification of Toxic Code Reviews: How Far Can We Go?
- Authors: Jaydeb Sarker, Asif Kamal Turzo, Ming Dong, Amiangshu Bosu
- Abstract summary: ToxiCR is a supervised learning-based toxicity identification tool for code review interactions.
ToxiCR significantly outperforms existing toxicity detectors on our dataset.
- Score: 7.655225472610752
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Toxic conversations during software development interactions may have serious
repercussions on a Free and Open Source Software (FOSS) development project.
For example, victims of toxic conversations may become afraid to express
themselves, therefore get demotivated, and may eventually leave the project.
Automated filtering of toxic conversations may help a FOSS community to
maintain healthy interactions among its members. However, off-the-shelf
toxicity detectors perform poorly on Software Engineering (SE) datasets, such as
one curated from code review comments. To address this challenge, we present
ToxiCR, a supervised learning-based toxicity identification tool for code
review interactions. ToxiCR offers a choice among ten supervised learning
algorithms, an option to select a text vectorization technique, five mandatory
and three optional SE domain-specific preprocessing steps, and a large-scale
labeled dataset of 19,571 code review comments. With
our rigorous evaluation of the models with various combinations of
preprocessing steps and vectorization techniques, we have identified the best
combination for our dataset, which achieves 95.8% accuracy and an 88.9% F1 score.
ToxiCR significantly outperforms existing toxicity detectors on our dataset. We
have publicly released our dataset, pretrained models, evaluation results, and
source code at: https://github.com/WSU-SEAL/ToxiCR.
Related papers
- Towards Realistic Evaluation of Commit Message Generation by Matching Online and Offline Settings [77.20838441870151]
Commit message generation is a crucial task in software engineering that is challenging to evaluate correctly.
We use an online metric - the number of edits users introduce before committing the generated messages to the VCS - to select metrics for offline experiments.
Our results indicate that edit distance exhibits the highest correlation, whereas commonly used similarity metrics such as BLEU and METEOR demonstrate low correlation.
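The offline metric found to correlate best, edit distance, is standard Levenshtein distance; a minimal dynamic-programming sketch (the exact variant and granularity used by the paper may differ):

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between strings a and b, computed
    row by row to keep memory at O(len(b))."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

# One character inserted between a generated and a committed message:
print(edit_distance("fix typo in parser", "fix typos in parser"))  # 1
```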
arXiv Detail & Related papers (2024-10-15T20:32:07Z) - GPT-DETOX: An In-Context Learning-Based Paraphraser for Text Detoxification [1.8295720742100332]
We propose GPT-DETOX as a framework for prompt-based in-context learning for text detoxification using GPT-3.5 Turbo.
To generate few-shot prompts, we propose two methods: word-matching example selection (WMES) and context-matching example selection (CMES).
We take into account ensemble in-context learning (EICL) where the ensemble is shaped by base prompts from zero-shot and all few-shot settings.
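The summary does not spell out how WMES scores candidate examples; one plausible reading is word-overlap (Jaccard) similarity between the query and each stored example. The scorer, the pair format, and the data below are all hypothetical:

```python
def word_match_score(query: str, example: str) -> float:
    """Jaccard overlap between the word sets of two sentences."""
    q, e = set(query.lower().split()), set(example.lower().split())
    return len(q & e) / len(q | e)

def select_examples(query, pool, k=2):
    """Pick the k (toxic, detoxified) pairs whose toxic side best
    word-matches the query, for use as few-shot demonstrations."""
    return sorted(pool, key=lambda ex: word_match_score(query, ex[0]),
                  reverse=True)[:k]

# Hypothetical (toxic, detoxified) example pool.
pool = [
    ("you are a complete idiot", "that was not correct"),
    ("this is wonderful work", "this is wonderful work"),
    ("what an idiot move", "that was an unwise move"),
]
for toxic, clean in select_examples("stop being an idiot", pool, k=2):
    print(toxic, "->", clean)
```

CMES would replace the word-set overlap with a similarity over sentence embeddings, keeping the same top-k selection.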
arXiv Detail & Related papers (2024-04-03T20:35:36Z) - Exploring ChatGPT for Toxicity Detection in GitHub [5.003898791753481]
The prevalence of negative discourse, often manifested as toxic comments, poses significant challenges to developer well-being and productivity.
To identify such negativity in project communications, automated toxicity detection models are necessary.
To train these models effectively, we need large software engineering-specific toxicity datasets.
arXiv Detail & Related papers (2023-12-20T15:23:00Z) - ToxiSpanSE: An Explainable Toxicity Detection in Code Review Comments [4.949881799107062]
ToxiSpanSE is the first tool to detect toxic spans in the Software Engineering (SE) domain.
Our model achieved the best score for toxic-class tokens, with 0.88 $F1$, 0.87 precision, and 0.93 recall.
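Token-level precision, recall, and $F1$ of this kind compare the set of predicted toxic-token positions against the annotated ones; an illustrative computation (hypothetical spans, not the ToxiSpanSE evaluation code):

```python
def token_prf(pred_tokens, gold_tokens):
    """Precision, recall, and F1 over sets of toxic token indices."""
    pred, gold = set(pred_tokens), set(gold_tokens)
    tp = len(pred & gold)                          # correctly flagged tokens
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Predicted toxic span covers tokens 3-6; annotators marked 4-6.
p, r, f1 = token_prf({3, 4, 5, 6}, {4, 5, 6})
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.75 1.0 0.86
```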
arXiv Detail & Related papers (2023-07-07T04:55:11Z) - EnDex: Evaluation of Dialogue Engagingness at Scale [30.15445159524315]
We propose EnDex, the first human-reaction based model to evaluate dialogue engagingness.
We will release code, off-the-shelf EnDex model, and a large-scale dataset upon paper publication.
arXiv Detail & Related papers (2022-10-22T06:09:43Z) - BeCAPTCHA-Type: Biometric Keystroke Data Generation for Improved Bot Detection [63.447493500066045]
This work proposes a data-driven learning model for the synthesis of keystroke biometric data.
The proposed method is compared with two statistical approaches based on Universal and User-dependent models.
Our experimental framework considers a dataset with 136 million keystroke events from 168 thousand subjects.
arXiv Detail & Related papers (2022-07-27T09:26:15Z) - PoisonedEncoder: Poisoning the Unlabeled Pre-training Data in Contrastive Learning [69.70602220716718]
We propose PoisonedEncoder, a data poisoning attack to contrastive learning.
In particular, an attacker injects carefully crafted poisoning inputs into the unlabeled pre-training data.
We evaluate five defenses against PoisonedEncoder: one pre-processing defense, three in-processing defenses, and one post-processing defense.
arXiv Detail & Related papers (2022-05-13T00:15:44Z) - Spotting adversarial samples for speaker verification by neural vocoders [102.1486475058963]
We adopt neural vocoders to spot adversarial samples for automatic speaker verification (ASV).
We find that the difference between the ASV scores for the original and re-synthesized audio is a good indicator for discriminating between genuine and adversarial samples.
Our code will be made open source so that future work can compare against it.
arXiv Detail & Related papers (2021-07-01T08:58:16Z) - Lone Pine at SemEval-2021 Task 5: Fine-Grained Detection of Hate Speech Using BERToxic [2.4815579733050153]
This paper describes our approach to the Toxic Spans Detection problem.
We propose BERToxic, a system that fine-tunes a pre-trained BERT model to locate toxic text spans in a given text.
Our system significantly outperformed the provided baseline and achieved an F1-score of 0.683, placing Lone Pine 17th out of 91 teams in the competition.
arXiv Detail & Related papers (2021-04-08T04:46:14Z) - Exploiting Unsupervised Data for Emotion Recognition in Conversations [76.01690906995286]
Emotion Recognition in Conversations (ERC) aims to predict the emotional state of speakers in conversations.
The available supervised data for the ERC task is limited.
We propose a novel approach to leverage unsupervised conversation data.
arXiv Detail & Related papers (2020-10-02T13:28:47Z) - RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language
Models [93.151822563361]
Pretrained neural language models (LMs) are prone to generating racist, sexist, or otherwise toxic language which hinders their safe deployment.
We investigate the extent to which pretrained LMs can be prompted to generate toxic language, and the effectiveness of controllable text generation algorithms at preventing such toxic degeneration.
arXiv Detail & Related papers (2020-09-24T03:17:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.