Handling Bias in Toxic Speech Detection: A Survey
- URL: http://arxiv.org/abs/2202.00126v3
- Date: Sun, 15 Jan 2023 14:51:55 GMT
- Title: Handling Bias in Toxic Speech Detection: A Survey
- Authors: Tanmay Garg, Sarah Masud, Tharun Suresh, Tanmoy Chakraborty
- Abstract summary: We look at proposed methods for evaluating and mitigating bias in toxic speech detection.
A case study introduces the concept of bias shift due to knowledge-based bias mitigation.
The survey concludes with an overview of the critical challenges, research gaps, and future directions.
- Score: 26.176340438312376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detecting online toxicity has always been a challenge due to its inherent
subjectivity. Factors such as the context, geography, socio-political climate,
and background of the producers and consumers of the posts play a crucial role
in determining if the content can be flagged as toxic. Adoption of automated
toxicity detection models in production can thus lead to a sidelining of the
various groups they aim to help in the first place. It has piqued researchers'
interest in examining unintended biases and their mitigation. Due to the
nascent and multi-faceted nature of the work, the existing literature is chaotic in
its terminologies, techniques, and findings. In this paper, we put together a
systematic study of the limitations and challenges of existing methods for
mitigating bias in toxicity detection.
We look closely at proposed methods for evaluating and mitigating bias in
toxic speech detection. To examine the limitations of existing methods, we also
conduct a case study to introduce the concept of bias shift due to
knowledge-based bias mitigation. The survey concludes with an overview of the
critical challenges, research gaps, and future directions. While reducing
toxicity on online platforms continues to be an active area of research, a
systematic study of various biases and their mitigation strategies will help
the research community produce robust and fair models.
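As a concrete illustration of what "evaluating bias" typically means in this line of work, the sketch below computes per-identity-group false positive rates for a toxicity classifier over non-toxic posts; large gaps between groups are the kind of unintended bias the surveyed mitigation methods target. The identity terms, threshold, and `predict_toxic` scorer are illustrative assumptions, not the survey's own implementation.

```python
# Minimal sketch of a common bias-evaluation setup for toxicity detection:
# compare false positive rates (FPR) on non-toxic posts that mention different
# identity groups. Identity terms, threshold, and the scoring function are
# hypothetical placeholders used only for illustration.
from collections import defaultdict
from typing import Callable, Iterable, Tuple

IDENTITY_TERMS = ["gay", "muslim", "black", "women"]  # hypothetical subgroup markers

def subgroup_false_positive_rates(
    examples: Iterable[Tuple[str, int]],      # (text, gold label: 1 = toxic, 0 = non-toxic)
    predict_toxic: Callable[[str], float],    # model score in [0, 1]
    threshold: float = 0.5,
) -> dict:
    """Return FPR per identity term, computed over non-toxic examples only."""
    false_pos = defaultdict(int)
    negatives = defaultdict(int)
    for text, label in examples:
        if label != 0:                        # FPR is defined on true negatives
            continue
        flagged = predict_toxic(text) >= threshold
        for term in IDENTITY_TERMS:
            if term in text.lower():
                negatives[term] += 1
                false_pos[term] += int(flagged)
    return {t: false_pos[t] / negatives[t] for t in negatives if negatives[t] > 0}
```

Restricting the computation to non-toxic examples mirrors a recurring concern in this literature: benign posts that merely mention marginalized identities are disproportionately over-flagged as toxic.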
Related papers
- Seeing Unseen: Discover Novel Biomedical Concepts via
Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z) - Decoding Susceptibility: Modeling Misbelief to Misinformation Through a Computational Approach [61.04606493712002]
Susceptibility to misinformation describes the degree of belief in unverifiable claims; as a latent mental state, it is not directly observable.
Existing susceptibility studies heavily rely on self-reported beliefs.
We propose a computational approach to model users' latent susceptibility levels.
arXiv Detail & Related papers (2023-11-16T07:22:56Z) - A Taxonomy of Rater Disagreements: Surveying Challenges & Opportunities
from the Perspective of Annotating Online Toxicity [15.23055494327071]
Toxicity is an increasingly common and severe issue in online spaces.
A rich line of machine learning research has focused on computationally detecting and mitigating online toxicity.
Recent research has pointed out the importance of accounting for the subjective nature of this task.
arXiv Detail & Related papers (2023-11-07T21:00:51Z) - Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z) - On Bias and Fairness in NLP: Investigating the Impact of Bias and Debiasing in Language Models on the Fairness of Toxicity Detection [7.297345802761503]
Representation bias, selection bias, and overamplification bias are investigated.
We show that overamplification bias is the most impactful type of bias on the fairness of the task of toxicity detection.
We introduce a list of guidelines to ensure the fairness of the task of toxicity detection.
arXiv Detail & Related papers (2023-05-22T08:44:00Z) - Toxicity Detection with Generative Prompt-based Inference [3.9741109244650823]
It is a long-known risk that language models (LMs), once trained on corpora containing undesirable content, can manifest biases and toxicity.
In this work, we explore the generative variant of zero-shot prompt-based toxicity detection with comprehensive trials on prompt engineering (an illustrative sketch of this prompting style appears after this list).
arXiv Detail & Related papers (2022-05-24T22:44:43Z) - Anatomizing Bias in Facial Analysis [86.79402670904338]
Existing facial analysis systems have been shown to yield biased results against certain demographic subgroups.
It has become imperative to ensure that these systems do not discriminate based on gender, identity, or skin tone of individuals.
This has led to research in the identification and mitigation of bias in AI systems.
arXiv Detail & Related papers (2021-12-13T09:51:13Z) - Mitigating Biases in Toxic Language Detection through Invariant
Rationalization [70.36701068616367]
Biases toward some attributes, including gender, race, and dialect, exist in most training datasets for toxicity detection.
We propose to use invariant rationalization (InvRat), a game-theoretic framework consisting of a rationale generator and a predictor, to rule out the spurious correlation of certain syntactic patterns.
Our method yields a lower false positive rate on both lexical and dialectal attributes than previous debiasing methods.
arXiv Detail & Related papers (2021-06-14T08:49:52Z) - Cross-geographic Bias Detection in Toxicity Modeling [9.128264779870538]
We introduce a weakly supervised method to robustly detect lexical biases in broader geocultural contexts.
We demonstrate that our method identifies salient groups of errors and, in a follow-up, that these groupings reflect human judgments of offensive and inoffensive language in those geographic contexts.
arXiv Detail & Related papers (2021-04-14T17:32:05Z) - Challenges in Automated Debiasing for Toxic Language Detection [81.04406231100323]
Biased associations have been a challenge in the development of classifiers for detecting toxic language.
We investigate recently introduced debiasing methods for text classification datasets and models, as applied to toxic language detection.
Our focus is on lexical (e.g., swear words, slurs, identity mentions) and dialectal markers (specifically African American English).
arXiv Detail & Related papers (2021-01-29T22:03:17Z)
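For the "Toxicity Detection with Generative Prompt-based Inference" entry above, a minimal sketch of zero-shot prompt-based toxicity detection follows. The prompt wording, the yes/no verbalizer, and the `generate` stand-in are illustrative assumptions; a real setup would plug in an actual language model and, as that paper does, compare several prompt variants.

```python
# Minimal sketch of zero-shot prompt-based toxicity detection: ask a generative
# LM a yes/no question about a post and parse its answer. The prompt template
# and verbalizer are illustrative assumptions, not the referenced paper's exact setup.
from typing import Callable

PROMPT_TEMPLATE = (
    "Post: \"{post}\"\n"
    "Question: Does the post above contain toxic, hateful, or abusive language?\n"
    "Answer (yes or no):"
)

def zero_shot_toxicity(post: str, generate: Callable[[str], str]) -> bool:
    """Classify a post as toxic by asking a generative LM a yes/no question."""
    prompt = PROMPT_TEMPLATE.format(post=post)
    answer = generate(prompt).strip().lower()
    return answer.startswith("yes")

if __name__ == "__main__":
    # Trivial stand-in "model"; a real setup would call an LLM API or local model here.
    fake_lm = lambda prompt: "no"
    print(zero_shot_toxicity("have a nice day", fake_lm))  # -> False
```

Keeping the model behind a plain callable lets the same prompting logic be tried against different LMs or prompt wordings without changing the classification code.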
This list is automatically generated from the titles and abstracts of the papers on this site.