Cross-geographic Bias Detection in Toxicity Modeling
- URL: http://arxiv.org/abs/2104.06999v1
- Date: Wed, 14 Apr 2021 17:32:05 GMT
- Title: Cross-geographic Bias Detection in Toxicity Modeling
- Authors: Sayan Ghosh, Dylan Baker, David Jurgens, Vinodkumar Prabhakaran
- Abstract summary: We introduce a weakly supervised method to robustly detect lexical biases in broader geocultural contexts.
We demonstrate that our method identifies salient groups of errors, and, in a follow-up, that these groupings reflect human judgments of offensive and inoffensive language in those geographic contexts.
- Score: 9.128264779870538
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Online social media platforms increasingly rely on Natural Language
Processing (NLP) techniques to detect abusive content at scale in order to
mitigate the harms it causes to their users. However, these techniques suffer
from various sampling and association biases present in training data, often
resulting in sub-par performance on content relevant to marginalized groups,
potentially furthering disproportionate harms towards them. Studies on such
biases so far have focused on only a handful of axes of disparities and
subgroups that have annotations/lexicons available. Consequently, biases
concerning non-Western contexts are largely ignored in the literature. In this
paper, we introduce a weakly supervised method to robustly detect lexical
biases in broader geocultural contexts. Through a case study on
cross-geographic toxicity detection, we demonstrate that our method identifies
salient groups of errors, and, in a follow-up, demonstrate that these groupings
reflect human judgments of offensive and inoffensive language in those
geographic contexts.
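The abstract does not spell out the detection mechanics, but the core idea of surfacing lexical biases from model errors can be illustrated with a small hypothetical sketch: rank tokens by how strongly their presence correlates with classifier mistakes on a geo-tagged corpus. The function name, data format, and lift-based scoring below are illustrative assumptions, not the authors' actual method:

```python
from collections import Counter

def error_associated_tokens(examples, min_count=5, top_k=10):
    """Rank tokens by how strongly they co-occur with model errors.

    `examples` is a list of (tokens, model_was_wrong) pairs, e.g. from
    running a toxicity classifier over posts from one geographic region.
    Tokens that appear disproportionately in misclassified posts are
    candidate lexical biases worth inspecting.
    """
    token_total = Counter()   # posts containing each token
    token_error = Counter()   # misclassified posts containing each token
    n_total = n_error = 0
    for tokens, wrong in examples:
        n_total += 1
        n_error += wrong
        for tok in set(tokens):
            token_total[tok] += 1
            token_error[tok] += wrong
    base_rate = n_error / n_total
    scores = {}
    for tok, cnt in token_total.items():
        if cnt < min_count:
            continue  # skip rare tokens with unreliable estimates
        # lift: error rate among posts containing the token vs. overall
        scores[tok] = (token_error[tok] / cnt) / base_rate
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

Tokens with lift well above 1 mark content the model systematically mishandles; grouping those tokens (e.g., region-specific terms flagged as toxic) yields the kind of salient error groups the paper describes.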
Related papers
- Seeds of Stereotypes: A Large-Scale Textual Analysis of Race and Gender Associations with Diseases in Online Sources [1.8259644946867188]
The study analyzed the context in which various diseases are discussed alongside markers of race and gender.
We found that demographic terms are disproportionately associated with specific disease concepts in online texts.
We find widespread disparities in the associations of specific racial and gender terms with the 18 diseases analyzed.
arXiv Detail & Related papers (2024-05-08T13:38:56Z)
- Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information [50.29934517930506]
DAFair is a novel approach to address social bias in language models.
We leverage prototypical demographic texts and incorporate a regularization term during the fine-tuning process to mitigate bias.
arXiv Detail & Related papers (2024-03-14T15:58:36Z)
- Social Bias Probing: Fairness Benchmarking for Language Models [38.180696489079985]
This paper proposes a novel framework for probing language models for social biases by assessing disparate treatment.
We curate SoFa, a large-scale benchmark designed to address the limitations of existing fairness collections.
We show that biases within language models are more nuanced than previously acknowledged, indicating a broader scope of encoded biases than prior work recognized.
arXiv Detail & Related papers (2023-11-15T16:35:59Z)
- On Bias and Fairness in NLP: Investigating the Impact of Bias and Debiasing in Language Models on the Fairness of Toxicity Detection [7.297345802761503]
Representation bias, selection bias, and overamplification bias are investigated.
We show that overamplification bias is the most impactful type of bias on the fairness of the task of toxicity detection.
We introduce a list of guidelines to ensure the fairness of the task of toxicity detection.
arXiv Detail & Related papers (2023-05-22T08:44:00Z)
- Counter-GAP: Counterfactual Bias Evaluation through Gendered Ambiguous Pronouns [53.62845317039185]
Bias-measuring datasets play a critical role in detecting biased behavior of language models.
We propose a novel method to collect diverse, natural, and minimally distant text pairs via counterfactual generation.
We show that four pre-trained language models are significantly more inconsistent across different gender groups than within each group.
arXiv Detail & Related papers (2023-02-11T12:11:03Z)
- Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
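The summary gives the idea but not the mechanics. One common form of instance reweighting, offered here as a generic sketch rather than the paper's exact scheme, weights each training example by p(d)p(y)/p(d,y), so that the author demographic d and the label y look statistically independent under the reweighted distribution:

```python
from collections import Counter

def balancing_weights(demographics, labels):
    """Per-instance weights that break the demographic-label correlation.

    Each instance gets weight p(d) * p(y) / p(d, y); overrepresented
    (demographic, label) combinations are down-weighted and rare ones
    up-weighted, so a model trained on the reweighted data cannot
    exploit the correlation between author demographics and labels.
    """
    n = len(labels)
    p_d = Counter(demographics)            # marginal counts of demographics
    p_y = Counter(labels)                  # marginal counts of labels
    p_dy = Counter(zip(demographics, labels))  # joint counts
    return [
        (p_d[d] / n) * (p_y[y] / n) / (p_dy[(d, y)] / n)
        for d, y in zip(demographics, labels)
    ]
```

The returned weights plug into any loss that accepts per-example weights (e.g., a weighted cross-entropy), leaving the model architecture untouched.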
arXiv Detail & Related papers (2021-09-16T23:40:28Z)
- Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)
- Mitigating Biases in Toxic Language Detection through Invariant Rationalization [70.36701068616367]
Biases toward some attributes, including gender, race, and dialect, exist in most training datasets for toxicity detection.
We propose to use invariant rationalization (InvRat), a game-theoretic framework consisting of a rationale generator and a predictor, to rule out the spurious correlation of certain syntactic patterns.
Our method yields lower false positive rate in both lexical and dialectal attributes than previous debiasing methods.
arXiv Detail & Related papers (2021-06-14T08:49:52Z)
- Challenges in Automated Debiasing for Toxic Language Detection [81.04406231100323]
Biased associations have been a challenge in the development of classifiers for detecting toxic language.
We investigate recently introduced debiasing methods for text classification datasets and models, as applied to toxic language detection.
Our focus is on lexical markers (e.g., swear words, slurs, identity mentions) and dialectal markers (specifically African American English).
arXiv Detail & Related papers (2021-01-29T22:03:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.