Power of Explanations: Towards automatic debiasing in hate speech
detection
- URL: http://arxiv.org/abs/2209.09975v1
- Date: Wed, 7 Sep 2022 14:14:03 GMT
- Title: Power of Explanations: Towards automatic debiasing in hate speech
detection
- Authors: Yi Cai, Arthur Zimek, Gerhard Wunder, Eirini Ntoutsi
- Abstract summary: Hate speech detection is a common downstream application of natural language processing (NLP) in the real world.
We propose an automatic misuse detector (MiD) relying on an explanation method for detecting potential bias.
- Score: 19.26084350822197
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hate speech detection is a common downstream application of natural language
processing (NLP) in the real world. Despite increasing accuracy, current
data-driven approaches can easily learn biases from imbalanced data
distributions originating from humans. Deploying biased models can further
reinforce existing social biases. Yet unlike with tabular data, defining and
mitigating biases in text classifiers, which deal with unstructured data, is
more challenging. A popular solution for improving machine learning fairness
in NLP is to conduct the debiasing process with a list of potentially
discriminatory terms given by human annotators. Besides the risk of
overlooking biased terms, exhaustively identifying bias with human annotators
is unsustainable, since discrimination varies across datasets and may evolve
over time. To this end, we propose an automatic misuse detector (MiD) that
relies on an explanation method to detect potential bias. Built upon that, we
design an end-to-end debiasing framework with a staged correction for text
classifiers that requires no external resources.
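The abstract does not spell out how MiD turns explanations into bias candidates, so the sketch below only illustrates the general idea of explanation-driven bias detection rather than the authors' method: a toy bag-of-words classifier is trained with scikit-learn, its per-token weights stand in for a token attribution method, and tokens are ranked by how strongly they push predictions towards the hateful class. The corpus, the linear-weights-as-explanation shortcut, and the ranking heuristic are all assumptions made for illustration.

```python
# Minimal, illustrative sketch of explanation-driven bias detection;
# NOT the paper's MiD. A linear classifier's per-token weights stand in
# for a proper explanation method: tokens that push predictions strongly
# towards the hateful class become candidate misused (biased) terms.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy corpus (1 = hateful, 0 = benign). The identity term
# "muslim" only co-occurs with hateful examples, mimicking an imbalanced
# data distribution.
texts = [
    "those muslim people are terrible",        # 1
    "muslim neighbours are ruining the city",  # 1
    "you are all worthless idiots",            # 1
    "my neighbours invited me to dinner",      # 0
    "what a lovely day at the park",           # 0
    "the new city library is beautiful",       # 0
]
labels = [1, 1, 1, 0, 0, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
clf = LogisticRegression().fit(X, labels)

# "Explanation": weight of each token towards the hateful class.
tokens = vectorizer.get_feature_names_out()
attributions = clf.coef_[0]

# Rank tokens by attribution; identity terms dragged towards the hateful
# class by the imbalance tend to surface near the top and can be flagged
# for a later correction step.
for i in attributions.argsort()[::-1][:5]:
    print(f"{tokens[i]:<12} {attributions[i]:+.3f}")
```

In the paper's setting the attributions would come from a post-hoc explanation method applied to the actual text classifier, and the flagged terms would feed the staged correction; the linear weights above only stand in for that step.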
Related papers
- Editable Fairness: Fine-Grained Bias Mitigation in Language Models [52.66450426729818]
We propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases.
FAST surpasses state-of-the-art baselines with superior debiasing performance.
This highlights the potential of fine-grained debiasing strategies to achieve fairness in large language models.
arXiv Detail & Related papers (2024-08-07T17:14:58Z) - HateDebias: On the Diversity and Variability of Hate Speech Debiasing [14.225997610785354]
We propose a benchmark, named HateDebias, to analyze the ability of hate speech detection models under continuous, changing environments.
Specifically, to meet the diversity of biases, we collect existing hate speech detection datasets with different types of biases.
We evaluate models trained on datasets with a single type of bias against HateDebias, where a significant performance drop is observed.
arXiv Detail & Related papers (2024-06-07T12:18:02Z) - Language-guided Detection and Mitigation of Unknown Dataset Bias [23.299264313976213]
We propose a framework that identifies potential biases as keywords, without prior knowledge, based on their partial occurrence in the captions.
Our framework not only outperforms existing methods without prior knowledge, but is even comparable with a method that assumes prior knowledge.
arXiv Detail & Related papers (2024-06-05T03:11:33Z) - NBIAS: A Natural Language Processing Framework for Bias Identification
in Text [9.486702261615166]
Bias in textual data can lead to skewed interpretations and outcomes when the data is used.
An algorithm trained on biased data may end up making decisions that disproportionately impact a certain group of people.
We develop a comprehensive framework, NBIAS, that consists of four main layers: data, corpus construction, model development, and evaluation.
arXiv Detail & Related papers (2023-08-03T10:48:30Z) - Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models (see the projection sketch after this list).
arXiv Detail & Related papers (2023-01-31T20:09:33Z) - D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling
Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z) - Automatically Identifying Semantic Bias in Crowdsourced Natural Language
Inference Datasets [78.6856732729301]
We introduce a model-driven, unsupervised technique to find "bias clusters" in a learned embedding space of hypotheses in NLI datasets.
Interventions and additional rounds of labeling can then be performed to ameliorate the semantic bias of the hypothesis distribution of a dataset.
arXiv Detail & Related papers (2021-12-16T22:49:01Z) - Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting (see the reweighting sketch after this list).
arXiv Detail & Related papers (2021-09-16T23:40:28Z) - Improving Robustness by Augmenting Training Sentences with
Predicate-Argument Structures [62.562760228942054]
Existing approaches to improve robustness against dataset biases mostly focus on changing the training objective.
We propose to augment the input sentences in the training data with their corresponding predicate-argument structures.
We show that without targeting a specific bias, our sentence augmentation improves the robustness of transformer models against multiple biases.
arXiv Detail & Related papers (2020-10-23T16:22:05Z)
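For the "Debiasing Vision-Language Models via Biased Prompts" entry above, the abstract describes projecting out biased directions from the text embedding. A minimal numpy sketch of such an orthogonal projection follows; the embeddings and bias directions are random placeholders, and the calibration of the projection matrix discussed in that paper is omitted.

```python
# Minimal sketch: remove biased directions from text embeddings by
# projecting onto the orthogonal complement of an assumed bias subspace.
# Embeddings and bias directions here are random placeholders.
import numpy as np

rng = np.random.default_rng(0)
dim = 512                                    # embedding dimension (assumed)
text_embeddings = rng.normal(size=(8, dim))  # e.g. prompt/caption embeddings

# Bias directions, e.g. differences between embeddings of biased prompt
# pairs ("a photo of a man ..." vs "a photo of a woman ...").
bias_directions = rng.normal(size=(2, dim))

# Orthonormalise the bias directions and build P = I - B B^T, which
# zeroes out any component lying inside the bias subspace.
B, _ = np.linalg.qr(bias_directions.T)       # dim x k orthonormal basis
P = np.eye(dim) - B @ B.T

debiased = text_embeddings @ P
print(np.abs(debiased @ B).max())            # ~0: bias components removed
```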
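For the "Balancing out Bias" entry, instance reweighting can be sketched generically as weighting each training example by the inverse frequency of its (label, author demographic) combination, so that no combination dominates training; the paper's exact weighting scheme may differ, and the data below is hypothetical.

```python
# Generic inverse-frequency instance reweighting (illustrative; the cited
# paper's scheme may differ). Each example is weighted by
# 1 / count(label, demographic), normalised so the mean weight is 1.
from collections import Counter

# Hypothetical training data: (text, label, author_demographic).
data = [
    ("example text a", 1, "group_x"),
    ("example text b", 1, "group_x"),
    ("example text c", 1, "group_y"),
    ("example text d", 0, "group_x"),
    ("example text e", 0, "group_y"),
    ("example text f", 0, "group_y"),
]

counts = Counter((label, demo) for _, label, demo in data)
n_total, n_combos = len(data), len(counts)

# These weights can be passed as per-sample weights to a loss, e.g.
# sample_weight in scikit-learn or a weighted cross-entropy elsewhere.
weights = [n_total / (n_combos * counts[(label, demo)])
           for _, label, demo in data]
print(weights)
```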
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.