Improving Counterfactual Generation for Fair Hate Speech Detection
- URL: http://arxiv.org/abs/2108.01721v1
- Date: Tue, 3 Aug 2021 19:47:27 GMT
- Title: Improving Counterfactual Generation for Fair Hate Speech Detection
- Authors: Aida Mostafazadeh Davani, Ali Omrani, Brendan Kennedy, Mohammad Atari,
Xiang Ren, Morteza Dehghani
- Abstract summary: Bias mitigation approaches reduce models' dependence on sensitive features of data, such as social group tokens (SGTs).
In hate speech detection, however, equalizing model predictions may ignore important differences among targeted social groups.
Here, we rely on counterfactual fairness and equalize predictions among counterfactuals, generated by changing the SGTs.
- Score: 26.79268141793483
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bias mitigation approaches reduce models' dependence on sensitive features of
data, such as social group tokens (SGTs), resulting in equal predictions across
the sensitive features. In hate speech detection, however, equalizing model
predictions may ignore important differences among targeted social groups, as
hate speech can contain stereotypical language specific to each SGT. Here, to
take the specific language about each SGT into account, we rely on
counterfactual fairness and equalize predictions among counterfactuals,
generated by changing the SGTs. Our method evaluates the similarity in sentence
likelihoods (via pre-trained language models) among counterfactuals, to treat
SGTs equally only within interchangeable contexts. By applying logit pairing to
equalize outcomes on the restricted set of counterfactuals for each instance,
we improve fairness metrics while preserving model performance on hate speech
detection.
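The abstract outlines a three-step recipe: generate counterfactuals by swapping the mentioned SGT, keep only counterfactuals whose pretrained-LM likelihood is close to the original's (interchangeable contexts), and add a logit-pairing term to the training loss. A minimal sketch follows; the SGT list, the `score_sentence` interface, and the threshold `tau` are illustrative assumptions, not the authors' released code.

```python
# Sketch of likelihood-filtered counterfactual logit pairing (illustrative).
import torch
import torch.nn.functional as F

SGTS = ["women", "men", "muslims", "jews", "immigrants"]  # assumed SGT list

def generate_counterfactuals(sentence: str, sgt: str) -> list[str]:
    """Swap the mentioned social group token for every other SGT."""
    return [sentence.replace(sgt, other) for other in SGTS if other != sgt]

def filter_by_likelihood(original: str, counterfactuals: list[str],
                         score_sentence, tau: float = 2.0) -> list[str]:
    """Keep counterfactuals whose LM log-likelihood (via any pretrained LM
    scorer) stays close to the original's, i.e., contexts in which the
    SGTs are plausibly interchangeable."""
    base = score_sentence(original)
    return [c for c in counterfactuals if abs(score_sentence(c) - base) < tau]

def logit_pairing_loss(logits_orig: torch.Tensor,
                       logits_cf: torch.Tensor) -> torch.Tensor:
    """Auxiliary loss: penalize divergence between classifier logits on the
    original sentence and on each retained counterfactual."""
    return F.mse_loss(logits_cf, logits_orig.expand_as(logits_cf))
```

In training, this pairing term would be added to the usual classification loss, so predictions are equalized only across the restricted counterfactual set rather than across all SGTs unconditionally.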
Related papers
- Unlikelihood Tuning on Negative Samples Amazingly Improves Zero-Shot Translation [79.96416609433724]
Zero-shot translation (ZST) aims to translate between language pairs unseen in the training data.
The common practice to guide the zero-shot language mapping during inference is to deliberately insert the source and target language IDs.
Recent studies have shown that language IDs sometimes fail to navigate the ZST task, leading to the off-target problem.
arXiv Detail & Related papers (2023-09-28T17:02:36Z)
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
With a reasonable prompt, LLMs can use their generative capability to correct even tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
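As a rough illustration of the correction setup in the entry above, the sketch below prompts a generic LLM with a ranked N-best list; the `llm` callable and the prompt wording are our assumptions, not part of the HyPoradise benchmark.

```python
# Hedged sketch of LLM-based ASR error correction from an N-best list.
def correct_from_nbest(nbest: list[str], llm) -> str:
    """Ask an LLM (any text-generation callable) to infer the true
    transcript from ranked ASR hypotheses."""
    numbered = "\n".join(f"{i + 1}. {hyp}" for i, hyp in enumerate(nbest))
    prompt = (
        "Below are the top ASR hypotheses for one utterance, best first.\n"
        f"{numbered}\n"
        "Report the most plausible true transcription, fixing errors even "
        "if the correct words appear in none of the hypotheses."
    )
    return llm(prompt).strip()
```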
- ChatGPT as a Text Simplification Tool to Remove Bias [0.0]
The presence of specific linguistic signals particular to a certain sub-group can be picked up by language models during training.
We explore text simplification as a potential technique for bias mitigation.
arXiv Detail & Related papers (2023-05-09T13:10:23Z)
- Does Manipulating Tokenization Aid Cross-Lingual Transfer? A Study on POS Tagging for Non-Standardized Languages [18.210880703295253]
We finetune pretrained language models (PLMs) on seven languages from three different families.
We analyze their zero-shot performance on closely related, non-standardized varieties.
Overall, we find that the strongest predictor of model performance on target data is how closely the percentage of words split into subwords in the source data matches that in the target data, as sketched below.
arXiv Detail & Related papers (2023-04-20T08:32:34Z)
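A small sketch of the predictor named in the entry above: the share of words a tokenizer splits into multiple subwords, compared between source and target data. The similarity measure (one minus the absolute difference) and the default checkpoint are our illustrative choices, not the paper's exact protocol.

```python
# Sketch: subword-split-rate similarity between source and target data.
from transformers import AutoTokenizer

def split_word_ratio(words: list[str], tokenizer) -> float:
    """Fraction of words that the tokenizer splits into >1 subword."""
    split = sum(len(tokenizer.tokenize(w)) > 1 for w in words)
    return split / len(words)

def split_similarity(source_words: list[str], target_words: list[str],
                     model: str = "bert-base-multilingual-cased") -> float:
    """1.0 means source and target are fragmented at identical rates."""
    tok = AutoTokenizer.from_pretrained(model)
    return 1.0 - abs(split_word_ratio(source_words, tok)
                     - split_word_ratio(target_words, tok))
```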
- Necessity and Sufficiency for Explaining Text Classifiers: A Case Study in Hate Speech Detection [7.022948483613112]
We present a novel feature attribution method for explaining text classifiers, and analyze it in the context of hate speech detection.
We provide two complementary and theoretically-grounded scores -- necessity and sufficiency -- resulting in more informative explanations.
We employ our method to explain the predictions of different hate speech detection models on the same set of curated examples from a test suite. Different values of necessity and sufficiency for identity terms correspond to different kinds of false positive errors.
arXiv Detail & Related papers (2022-05-06T15:34:48Z)
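The sketch below gives mask-based proxies for the two scores described above; it illustrates the general idea, not the paper's exact formulation. `predict` is an assumed classifier returning the positive-class probability, and "[MASK]" is an assumed placeholder token.

```python
# Illustrative mask-based proxies for necessity and sufficiency of token i.
def necessity(tokens: list[str], i: int, predict) -> float:
    """How much the prediction drops when token i is removed (masked)."""
    masked = tokens[:i] + ["[MASK]"] + tokens[i + 1:]
    return predict(tokens) - predict(masked)

def sufficiency(tokens: list[str], i: int, predict) -> float:
    """How much of the prediction survives when everything BUT token i
    is masked."""
    only_i = ["[MASK]"] * i + [tokens[i]] + ["[MASK]"] * (len(tokens) - i - 1)
    return predict(only_i)
```

For an identity term, high sufficiency with low necessity would indicate the term alone drives a positive prediction, one of the false-positive patterns the paper distinguishes.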
- Measuring Fairness of Text Classifiers via Prediction Sensitivity [63.56554964580627]
ACCUMULATED PREDICTION SENSITIVITY measures fairness in machine learning models based on the model's prediction sensitivity to perturbations in input features.
We show that the metric can be theoretically linked with a specific notion of group fairness (statistical parity) and individual fairness.
arXiv Detail & Related papers (2022-03-16T15:00:33Z)
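In the spirit of the metric above, a minimal probe might accumulate how strongly a model's prediction reacts to perturbations of protected input features. The gradient-based form and the weight vector `w` below are our assumptions, not the paper's exact definition.

```python
# Sketch of a gradient-based prediction-sensitivity fairness probe.
import torch

def prediction_sensitivity(model, x: torch.Tensor, w: torch.Tensor) -> float:
    """Weighted gradient magnitude of the prediction w.r.t. input features;
    w marks which features are considered protected."""
    x = x.clone().detach().requires_grad_(True)
    score = model(x).sum()  # reduce to a scalar prediction score
    score.backward()
    return float((w * x.grad.abs()).sum())
```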
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper, we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
- Characterizing the adversarial vulnerability of speech self-supervised learning [95.03389072594243]
We make the first attempt to investigate the adversarial vulnerability of such a paradigm under attacks from both zero-knowledge and limited-knowledge adversaries.
The experimental results illustrate that the paradigm proposed by SUPERB is seriously vulnerable to limited-knowledge adversaries.
arXiv Detail & Related papers (2021-11-08T08:44:04Z)
- Direct speech-to-speech translation with discrete units [64.19830539866072]
We present a direct speech-to-speech translation (S2ST) model that translates speech from one language to speech in another language without relying on intermediate text generation.
Instead, we propose to predict self-supervised discrete representations learned from an unlabeled speech corpus.
When target text transcripts are available, we design a multitask learning framework with joint speech and text training that enables the model to generate dual mode output (speech and text) simultaneously in the same inference pass.
arXiv Detail & Related papers (2021-07-12T17:40:43Z)
- Fair Hate Speech Detection through Evaluation of Social Group Counterfactuals [21.375422346539004]
Approaches for mitigating bias in supervised models are designed to reduce models' dependence on specific sensitive features of the input data.
In the case of hate speech detection, it is not always desirable to equalize the effects of social groups.
Counterfactual token fairness for a mentioned social group evaluates whether the model's predictions are the same for (a) the actual sentence and (b) a counterfactual instance in which the social group is changed.
Our approach ensures robust model predictions for counterfactuals that imply a meaning similar to the actual sentence, as sketched below.
arXiv Detail & Related papers (2020-10-24T04:51:47Z)
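A toy sketch of the evaluation described above: compare a classifier's score on a sentence against its scores on SGT-swapped counterfactuals. All names are illustrative; the paper further restricts the comparison to counterfactuals with similar meaning, which this sketch omits.

```python
# Sketch of a counterfactual token fairness (CTF) gap for one instance.
def ctf_gap(sentence: str, sgt: str, sgts: list[str], predict) -> float:
    """Mean absolute prediction gap between a sentence and its SGT
    counterfactuals; 0 means counterfactually fair on this instance."""
    base = predict(sentence)
    gaps = [abs(predict(sentence.replace(sgt, other)) - base)
            for other in sgts if other != sgt]
    return sum(gaps) / len(gaps)
```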
- Learning not to Discriminate: Task Agnostic Learning for Improving Monolingual and Code-switched Speech Recognition [12.354292498112347]
We present further improvements over our previous work by using domain adversarial learning to train task models.
Our proposed technique leads to reductions in word error rate (WER) on monolingual and code-switched test sets across three language pairs.
arXiv Detail & Related papers (2020-06-09T13:45:30Z)