Investigating Bias In Automatic Toxic Comment Detection: An Empirical
Study
- URL: http://arxiv.org/abs/2108.06487v1
- Date: Sat, 14 Aug 2021 08:24:13 GMT
- Title: Investigating Bias In Automatic Toxic Comment Detection: An Empirical
Study
- Authors: Ayush Kumar, Pratik Kumar
- Abstract summary: With the surge in online platforms, there has been an upsurge in user engagement on these platforms via comments and reactions.
A large portion of such textual comments are abusive, rude and offensive to the audience.
With machine learning systems in place to check comments coming onto the platform, biases present in the training data get passed on to the classifier, leading to discrimination against certain classes, religions and genders.
- Score: 1.5609988622100528
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the surge in online platforms, there has been an upsurge in user
engagement on these platforms via comments and reactions. A large portion of
such textual comments are abusive, rude and offensive to the audience. With
machine learning systems in place to check comments coming onto the platform,
biases present in the training data get passed on to the classifier, leading to
discrimination against certain classes, religions and genders. In this work, we
evaluate different classifiers and features to estimate the bias in these
classifiers along with their performance on the downstream task of toxicity
classification. Results show that improvement in the performance of automatic
toxic comment detection models is positively correlated with mitigating biases
in these models. In our work, an LSTM with attention mechanism proved to be a
better modelling strategy than a CNN model. Further analysis shows that fasttext
embeddings are marginally preferable to glove embeddings for training models
for toxic comment detection. Deeper analysis reveals that such automatic models
are particularly biased towards specific identity groups even though the model
has a high AUC score. Finally, in an effort to mitigate bias in toxicity
detection models, a multi-task setup trained with an auxiliary task of toxicity
sub-types proved to be useful, leading to up to a 0.26% (6% relative) gain in
AUC scores.
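The best-performing setup (LSTM with attention over pre-trained word embeddings, plus a multi-task auxiliary head over toxicity sub-types) is only described at a high level in the abstract. Below is a minimal PyTorch sketch of such a multi-task model; the layer sizes, number of sub-types and the 0.5 auxiliary-loss weight are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class MultiTaskToxicLSTM(nn.Module):
    """BiLSTM with additive attention; one head for overall toxicity,
    one auxiliary head for toxicity sub-types (multi-task setup)."""

    def __init__(self, vocab_size, emb_dim=300, hidden=128, n_subtypes=5):
        super().__init__()
        # The paper uses pre-trained fastText/GloVe vectors; a randomly
        # initialised embedding is used here for brevity.
        self.emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)                    # attention scores per token
        self.toxic_head = nn.Linear(2 * hidden, 1)              # main task: toxic / non-toxic
        self.subtype_head = nn.Linear(2 * hidden, n_subtypes)   # auxiliary task: sub-types

    def forward(self, token_ids):
        h, _ = self.lstm(self.emb(token_ids))                   # (B, T, 2H)
        weights = torch.softmax(self.attn(h), dim=1)            # (B, T, 1)
        context = (weights * h).sum(dim=1)                      # attention-pooled sentence vector
        return self.toxic_head(context), self.subtype_head(context)

# Joint loss: main toxicity loss plus a down-weighted auxiliary sub-type loss.
model = MultiTaskToxicLSTM(vocab_size=50_000)
bce = nn.BCEWithLogitsLoss()
tox_logit, sub_logits = model(torch.randint(1, 50_000, (8, 64)))
loss = (bce(tox_logit.squeeze(-1), torch.rand(8).round())
        + 0.5 * bce(sub_logits, torch.rand(8, 5).round()))
```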
Related papers
- Determination of toxic comments and unintended model bias minimization
using Deep learning approach [0.0]
In this research, our aim is to detect toxic comments and reduce the unintended bias concerning identity features such as race, gender, sex and religion by fine-tuning an attention-based model called BERT (Bidirectional Encoder Representations from Transformers).
We apply a weighted loss to address the issue of unbalanced data and compare the performance of a fine-tuned BERT model with a traditional Logistic Regression model in terms of classification and bias minimization.
arXiv Detail & Related papers (2023-11-08T16:10:28Z)
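The summary does not spell out how the weighted loss is attached to BERT; a minimal sketch under one common reading, class-weighted cross-entropy on top of a fine-tuned BERT sequence classifier, is shown below (the 1:10 class weights and the toy batch are assumptions, not values from that paper).

```python
import torch
from torch import nn
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Weighted cross-entropy: up-weight the rarer "toxic" class to counter imbalance.
# The 1:10 ratio is an illustrative assumption.
loss_fn = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 10.0]))

batch = tokenizer(["you are wonderful", "you are an idiot"],
                  padding=True, return_tensors="pt")
labels = torch.tensor([0, 1])            # 0 = non-toxic, 1 = toxic
logits = model(**batch).logits
loss = loss_fn(logits, labels)
loss.backward()                          # fine-tune with any optimizer from here
```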
- Towards Poisoning Fair Representations [26.47681999979761]
This work proposes the first data poisoning framework attacking fair representation learning methods.
We induce the model to output unfair representations that contain as much demographic information as possible by injecting carefully crafted poisoning samples into the training data.
Experiments on benchmark fairness datasets and state-of-the-art fair representation learning models demonstrate the superiority of our attack.
arXiv Detail & Related papers (2023-09-28T14:51:20Z)
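The poisoning mechanism itself is not detailed in this summary; as a hedged illustration of what "representations that contain as much demographic information as possible" means operationally, a linear probe can be trained to recover the protected attribute from the learned representations, with above-chance probe accuracy indicating leakage (the data below is synthetic).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def demographic_leakage(representations, protected_attr, seed=0):
    """Train a linear probe to predict the protected attribute from the
    representations; accuracy well above chance means the representation
    leaks demographic information (the outcome a poisoning attack pushes for)."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        representations, protected_attr, test_size=0.3, random_state=seed)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return probe.score(X_te, y_te)

# Toy data standing in for representations produced by a fair-representation model.
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 32))
a = rng.integers(0, 2, size=500)
print(f"probe accuracy: {demographic_leakage(Z, a):.2f}")   # ~0.5 when nothing leaks
```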
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
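The calibration of the projection matrix is specific to that paper, but the underlying operation of projecting biased directions out of a text embedding can be sketched generically (the bias direction below is a placeholder, not the paper's calibrated matrix).

```python
import numpy as np

def debias_projection(embedding, bias_directions):
    """Remove the span of the bias directions from an embedding:
    e' = (I - B (B^T B)^-1 B^T) e, an orthogonal projection onto the
    complement of the bias subspace."""
    B = np.stack(bias_directions, axis=1)          # (d, k) matrix of bias directions
    P = B @ np.linalg.pinv(B.T @ B) @ B.T          # projector onto the bias subspace
    return embedding - P @ embedding

# Toy example: a 5-d "text embedding" with one assumed bias direction.
e = np.array([0.9, 0.1, 0.4, -0.2, 0.3])
gender_dir = np.array([1.0, 0.0, 0.0, 0.0, 0.0])   # placeholder bias direction
print(debias_projection(e, [gender_dir]))           # first coordinate is zeroed out
```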
- Cyberbullying Classifiers are Sensitive to Model-Agnostic Perturbations [15.152559543181523]
This study is the first to investigate the effect of adversarial behavior and augmentation for cyberbullying detection.
We demonstrate that model-agnostic lexical substitutions significantly hurt performance.
Augmentations proposed in prior work on toxicity prove to be less effective.
arXiv Detail & Related papers (2022-01-17T12:48:27Z)
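The paper's exact substitutions are not listed in this summary; the sketch below illustrates the general kind of model-agnostic lexical perturbation involved, character-level obfuscation of likely trigger words (the word list and substitution rules are illustrative assumptions).

```python
# Simple model-agnostic lexical substitution: obfuscate words a classifier is
# likely to rely on, without any access to the model itself.
SUBSTITUTIONS = {"idiot": "id1ot", "stupid": "stup!d", "hate": "h8"}   # assumed list

def perturb(comment: str) -> str:
    return " ".join(SUBSTITUTIONS.get(tok.lower(), tok) for tok in comment.split())

original = "you are a stupid idiot and I hate you"
print(perturb(original))   # "you are a stup!d id1ot and I h8 you"
# Feeding both versions to a cyberbullying classifier and comparing the
# predictions measures its sensitivity to such perturbations.
```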
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating
Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
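The deception-detection models themselves are outside this summary, but the "feature coefficients of a linear bag-of-words model" that participants saw can be sketched as follows (the toy reviews and labels are assumptions).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny stand-in corpus: 1 = fake review, 0 = genuine review (labels are illustrative).
reviews = ["amazing stay best hotel ever", "room was clean staff friendly",
           "absolutely perfect unbelievable luxury", "decent room a bit noisy at night"]
labels = [1, 0, 1, 0]

vec = CountVectorizer()
X = vec.fit_transform(reviews)
clf = LogisticRegression().fit(X, labels)

# The per-word coefficients are the "explanations" shown to participants.
coefs = sorted(zip(vec.get_feature_names_out(), clf.coef_[0]),
               key=lambda t: t[1], reverse=True)
print(coefs[:5])   # words pushing the model towards the "fake" label
```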
- AES Systems Are Both Overstable And Oversensitive: Explaining Why And
Proposing Defenses [66.49753193098356]
We investigate the reason behind the surprising adversarial brittleness of scoring models.
Our results indicate that autoscoring models, despite getting trained as "end-to-end" models, behave like bag-of-words models.
We propose detection-based protection models that can detect oversensitivity- and overstability-causing samples with high accuracy.
arXiv Detail & Related papers (2021-09-24T03:49:38Z)
- LOGAN: Local Group Bias Detection by Clustering [86.38331353310114]
We argue that evaluating bias at the corpus level is not enough for understanding how biases are embedded in a model.
We propose LOGAN, a new bias detection technique based on clustering.
Experiments on toxicity classification and object classification tasks show that LOGAN identifies bias in a local region.
arXiv Detail & Related papers (2020-10-06T16:42:51Z)
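LOGAN's exact objective is not reproduced in this summary; the basic idea of clustering inputs and inspecting per-cluster performance gaps can be sketched like this (the clustering choice and gap metric below are simplifications).

```python
import numpy as np
from sklearn.cluster import KMeans

def local_bias_by_clustering(embeddings, correct, group, k=10, seed=0):
    """Cluster the input embeddings and report, per cluster, the accuracy gap
    between two demographic groups; large local gaps can stay invisible at
    the corpus level."""
    clusters = KMeans(n_clusters=k, random_state=seed, n_init=10).fit_predict(embeddings)
    gaps = {}
    for c in range(k):
        mask = clusters == c
        g0 = correct[mask & (group == 0)]
        g1 = correct[mask & (group == 1)]
        if len(g0) and len(g1):
            gaps[c] = abs(g0.mean() - g1.mean())
    return gaps

# Toy data: 300 examples, 16-d embeddings, binary correctness and group labels.
rng = np.random.default_rng(0)
gaps = local_bias_by_clustering(rng.normal(size=(300, 16)),
                                rng.integers(0, 2, 300),
                                rng.integers(0, 2, 300))
print(max(gaps.items(), key=lambda kv: kv[1]))   # cluster with the largest local gap
```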
- Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z)
- Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring
Systems [64.4896118325552]
We evaluate the current state-of-the-art AES models using a model adversarial evaluation scheme and associated metrics.
We find that AES models are highly overstable. Even heavy modifications (as much as 25%) with content unrelated to the topic of the questions do not decrease the score produced by the models.
arXiv Detail & Related papers (2020-07-14T03:49:43Z)
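The toolkit itself is not described in detail here; the overstability test it reports can be sketched generically: append off-topic content amounting to a fraction of the essay and check whether the score moves (the scorer below is a placeholder for a real AES model).

```python
import random

def overstability_check(essay: str, score_fn, off_topic: str, fraction: float = 0.25):
    """Append off-topic words worth roughly `fraction` of the essay's length and
    report both scores; a near-zero change indicates overstability.
    `score_fn` is a placeholder for an actual AES model's scoring function."""
    n_extra = max(1, int(fraction * len(essay.split())))
    filler = " ".join(random.choices(off_topic.split(), k=n_extra))
    return score_fn(essay), score_fn(essay + " " + filler)

# Toy scorer (word-count based) standing in for a trained AES model.
def toy_score(text: str) -> float:
    return min(10.0, len(text.split()) / 30)

base, perturbed = overstability_check("The essay argues that ... " * 20,
                                      toy_score, "bananas orbit quietly purple")
print(base, perturbed)
```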
- Debiasing Skin Lesion Datasets and Models? Not So Fast [17.668005682385175]
Models learned from data risk learning biases from that same data.
When models learn spurious correlations not found in real-world situations, their deployment for critical tasks, such as medical decisions, can be catastrophic.
We find out that, despite interesting results that point to promising future research, current debiasing methods are not ready to solve the bias issue for skin-lesion models.
arXiv Detail & Related papers (2020-04-23T21:07:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.