Counterfactually Augmented Data and Unintended Bias: The Case of Sexism
and Hate Speech Detection
- URL: http://arxiv.org/abs/2205.04238v1
- Date: Mon, 9 May 2022 12:39:26 GMT
- Title: Counterfactually Augmented Data and Unintended Bias: The Case of Sexism
and Hate Speech Detection
- Authors: Indira Sen, Mattia Samory, Claudia Wagner, and Isabelle Augenstein
- Abstract summary: Over-relying on core features may lead to unintended model bias.
We test models for sexism and hate speech detection on challenging data.
Using a diverse set of CAD -- construct-driven and construct-agnostic -- reduces such unintended bias.
- Score: 35.29235215101502
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Counterfactually Augmented Data (CAD) aims to improve out-of-domain
generalizability, an indicator of model robustness. The improvement is
attributed to CAD promoting core features of the construct over spurious
artifacts that happen to correlate with it. Yet, over-relying on core features
may lead to unintended model bias. In particular, construct-driven CAD --
perturbations of core features -- may induce models to ignore the context in
which core features
are used. Here, we test models for sexism and hate speech detection on
challenging data: non-hateful and non-sexist usage of identity and gendered
terms. In these hard cases, models trained on CAD, especially construct-driven
CAD, show higher false-positive rates than models trained on the original,
unperturbed data. Using a diverse set of CAD -- construct-driven and
construct-agnostic -- reduces such unintended bias.
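To make the evaluation concrete, here is a minimal sketch of the hard-case test described above: measuring a classifier's false-positive rate on benign posts that contain identity or gendered terms, alongside toy examples of the two perturbation types. The `model.predict` interface and the example posts are illustrative assumptions, not the paper's data or code.
```python
# Hard-case evaluation sketch: false-positive rate of a hate/sexism
# classifier on posts whose gold label is NON-hateful/NON-sexist but
# which contain identity or gendered terms. `model` is any classifier
# exposing predict(text) -> 0/1 (1 = hateful/sexist); hypothetical here.

def false_positive_rate(model, benign_posts):
    """Share of benign posts the model wrongly flags as positive."""
    flagged = sum(model.predict(post) == 1 for post in benign_posts)
    return flagged / len(benign_posts)

# Illustrative hard cases: identity/gendered terms in non-hateful use.
hard_cases = [
    "The conference highlighted research led by women in NLP.",
    "Muslim and Jewish neighbors organized the street festival together.",
]

# The two CAD flavors contrasted in the paper, on a toy sexist post:
original = "Women are terrible at science."              # label: sexist
construct_driven = "Undergrads are terrible at science."  # core (gendered) term perturbed
construct_agnostic = "Women are excellent at science."    # surrounding context perturbed
```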
Related papers
- Unsupervised Model Diagnosis [49.36194740479798]
This paper proposes Unsupervised Model Diagnosis (UMO) to produce semantic counterfactual explanations without any user guidance.
Our approach identifies and visualizes changes in semantics, and then matches these changes to attributes from wide-ranging text sources.
arXiv Detail & Related papers (2024-10-08T17:59:03Z)
- PairCFR: Enhancing Model Training on Paired Counterfactually Augmented Data through Contrastive Learning [49.60634126342945]
Counterfactually Augmented Data (CAD) involves creating new data samples by applying minimal yet sufficient modifications to flip the label of existing data samples to other classes.
Recent research reveals that training with CAD may lead models to overly focus on modified features while ignoring other important contextual information.
We employ contrastive learning to promote global feature alignment in addition to learning counterfactual clues.
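As a rough illustration of this idea (not PairCFR's actual code), the sketch below combines cross-entropy with a supervised contrastive term over batches of (original, counterfactual) pairs; the temperature and weighting values are assumptions.
```python
import torch
import torch.nn.functional as F

def paired_cad_loss(z_orig, z_cf, logits_orig, logits_cf, y_orig, y_cf,
                    temperature=0.1, alpha=0.5):
    """Cross-entropy plus a supervised contrastive term on paired CAD."""
    # Standard classification loss on both halves of every pair.
    ce = F.cross_entropy(logits_orig, y_orig) + F.cross_entropy(logits_cf, y_cf)

    # Supervised contrastive term over the pooled embeddings: same-label
    # examples are pulled together; counterfactual partners, whose labels
    # differ by construction, are pushed apart.
    z = F.normalize(torch.cat([z_orig, z_cf]), dim=-1)
    y = torch.cat([y_orig, y_cf])
    sim = z @ z.T / temperature
    sim.fill_diagonal_(-1e9)   # exclude self-similarity (finite, avoids NaNs)
    pos = (y.unsqueeze(0) == y.unsqueeze(1)).float()
    pos.fill_diagonal_(0.0)
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    contrastive = -(log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)
    return ce + alpha * contrastive.mean()
```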
arXiv Detail & Related papers (2024-06-09T07:29:55Z)
- People Make Better Edits: Measuring the Efficacy of LLM-Generated Counterfactually Augmented Data for Harmful Language Detection [35.89913036572029]
It is imperative that NLP models be robust to spurious features.
Past work has attempted to tackle such spurious features using training data augmentation.
We assess if this task can be automated using generative NLP models.
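A hedged sketch of what such automation might look like: prompt an instruction-tuned model to make a minimal, label-flipping edit. The `generate` callable stands in for any text-generation backend and is an assumed interface, not the setup the paper evaluates.
```python
# Prompt template for minimal label-flipping edits; the wording is an
# illustrative assumption, not the paper's prompt.
PROMPT = (
    "Rewrite the following post with as few edits as possible so that it "
    "is no longer {label}. Change nothing else.\n\n"
    "Post: {post}\nRewritten post:"
)

def llm_counterfactual(generate, post, label="sexist"):
    """Return a minimally edited, label-flipped version of `post`.

    `generate` is any callable mapping a prompt string to generated text
    (an LLM API call, a local HF pipeline, etc.).
    """
    return generate(PROMPT.format(label=label, post=post)).strip()
```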
arXiv Detail & Related papers (2023-11-02T14:31:25Z)
- Unlock the Potential of Counterfactually-Augmented Data in Out-Of-Distribution Generalization [25.36416774024584]
Counterfactually-Augmented Data (CAD) has the potential to improve the Out-Of-Distribution (OOD) generalization capability of language models.
In this study, we attribute the inefficiency to the myopia phenomenon caused by CAD.
We introduce two additional constraints based on CAD's structural properties to help language models extract more complete causal features in CAD.
arXiv Detail & Related papers (2023-10-10T14:41:38Z)
- Improving the Out-Of-Distribution Generalization Capability of Language Models: Counterfactually-Augmented Data is not Enough [19.38778317110205]
Counterfactually-Augmented Data (CAD) has the potential to improve language models' Out-Of-Distribution (OOD) generalization capability.
In this paper, we attribute the inefficiency to the myopia phenomenon caused by CAD.
We design two additional constraints to help language models extract more complete causal features contained in CAD.
arXiv Detail & Related papers (2023-02-18T14:39:03Z)
- AutoCAD: Automatically Generating Counterfactuals for Mitigating Shortcut Learning [70.70393006697383]
We present AutoCAD, a fully automatic and task-agnostic CAD generation framework.
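The summary gives no implementation detail; as one plausible reading of "fully automatic", the sketch below masks the tokens a classifier leans on and lets a masked language model propose replacements. `importance` and `infill` are hypothetical helpers (e.g., gradient attribution and a fill-mask pipeline), not AutoCAD's actual components.
```python
def auto_counterfactual(tokens, importance, infill, k=2):
    """Mask the k most label-relevant tokens, then infill replacements.

    tokens:     list of token strings for one example
    importance: hypothetical scorer, tokens -> one relevance score each
    infill:     hypothetical masked-LM helper, masked tokens -> new text
    """
    scores = importance(tokens)
    top_k = sorted(range(len(tokens)), key=lambda i: -scores[i])[:k]
    masked = ["[MASK]" if i in top_k else t for i, t in enumerate(tokens)]
    return infill(masked)  # candidate counterfactual, to be filtered downstream
```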
arXiv Detail & Related papers (2022-11-29T13:39:53Z)
- General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model, analogous to gradient descent in functional space.
GGD can learn a more robust base model under the settings of both task-specific biased models with prior knowledge and self-ensemble biased model without prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z)
- How Does Counterfactually Augmented Data Impact Models for Social Computing Constructs? [35.29235215101502]
We investigate the benefits of counterfactually augmented data (CAD) for social NLP models by focusing on three social computing constructs -- sentiment, sexism, and hate speech.
We find that while models trained on CAD show lower in-domain performance, they generalize better out-of-domain.
arXiv Detail & Related papers (2021-09-14T23:46:39Z)
- An Investigation of the (In)effectiveness of Counterfactually Augmented Data [10.316235366821111]
We show that while counterfactually-augmented data (CAD) is effective at identifying robust features, it may prevent the model from learning unperturbed robust features.
Our results show that the lack of perturbation diversity in current CAD datasets limits its effectiveness on OOD generalization.
arXiv Detail & Related papers (2021-07-01T21:46:43Z)
- Learning from others' mistakes: Avoiding dataset biases without modeling them [111.17078939377313]
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended task.
Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available.
We show a method for training models that learn to ignore these problematic correlations.
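One common realization of this idea (a simplification, not necessarily the paper's exact training setup) is a product of experts: a weak, limited-capacity model absorbs the easy surface correlations, and the main model is trained through the combined distribution, so its gradient concentrates on examples the weak model gets wrong.
```python
import torch.nn.functional as F

def product_of_experts_loss(main_logits, weak_logits, labels):
    """Cross-entropy through the (renormalized) product of experts.

    The weak model's logits are detached: it is trained (or frozen)
    separately and only steers the main model's gradients here.
    """
    combined = F.log_softmax(main_logits, dim=-1) \
             + F.log_softmax(weak_logits.detach(), dim=-1)
    # cross_entropy re-applies log_softmax, which renormalizes the product.
    return F.cross_entropy(combined, labels)
```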
arXiv Detail & Related papers (2020-12-02T16:10:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and accepts no responsibility for any consequences of its use.