Reducing Target Group Bias in Hate Speech Detectors
- URL: http://arxiv.org/abs/2112.03858v1
- Date: Tue, 7 Dec 2021 17:49:34 GMT
- Title: Reducing Target Group Bias in Hate Speech Detectors
- Authors: Darsh J Shah, Sinong Wang, Han Fang, Hao Ma and Luke Zettlemoyer
- Abstract summary: We show that text classification models trained on large publicly available datasets may significantly under-perform on several protected groups.
We propose to perform token-level hate sense disambiguation, and utilize tokens' hate sense representations for detection.
- Score: 56.94616390740415
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ubiquity of offensive and hateful content on online fora
necessitates automatic solutions that detect such content competently across
target groups. In this paper we show that text classification models trained
on large publicly available datasets, despite high overall performance, may
significantly under-perform on several protected groups. On the dataset of
Vidgen et al. (2020), we find accuracy to be 37% lower on the under-annotated
Black Women target group and 12% lower on Immigrants, where hate speech
involves a distinct style. To address this, we propose to perform token-level
hate sense disambiguation and to utilize tokens' hate sense representations
for detection, modeling more general signals. On two publicly available
datasets, we observe that the variance in model accuracy across target groups
drops by at least 30%, improving average target group performance by 4% and
worst-case performance by 13%.
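The abstract describes the proposed method only at a high level. As a concrete illustration, here is a minimal sketch of what a token-level hate sense disambiguation classifier could look like: each token is projected into several candidate "sense" vectors, a per-token gate disambiguates among them, and the resulting token representations are pooled for detection. The architecture, dimensions, number of senses, and pooling scheme are all assumptions for illustration, not the paper's published model.

```python
import torch
import torch.nn as nn

class HateSenseClassifier(nn.Module):
    """Toy classifier: disambiguate a hate "sense" per token, then pool."""

    def __init__(self, vocab_size=30522, d_model=256, n_senses=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # One candidate representation per assumed "hate sense" of a token.
        self.sense_proj = nn.Linear(d_model, d_model * n_senses)
        # Disambiguation head: a distribution over senses for each token.
        self.sense_gate = nn.Linear(d_model, n_senses)
        self.classifier = nn.Linear(d_model, 2)  # hateful vs. not hateful
        self.n_senses, self.d_model = n_senses, d_model

    def forward(self, token_ids, attention_mask):
        h = self.embed(token_ids)                              # (B, T, D)
        B, T, _ = h.shape
        senses = self.sense_proj(h).view(
            B, T, self.n_senses, self.d_model)                 # (B, T, K, D)
        gate = self.sense_gate(h).softmax(dim=-1)              # (B, T, K)
        # Each token becomes a gate-weighted mixture of its sense vectors.
        tok = (gate.unsqueeze(-1) * senses).sum(dim=2)         # (B, T, D)
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (tok * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-6)
        return self.classifier(pooled)                         # (B, 2)

# Usage on random inputs:
model = HateSenseClassifier()
logits = model(torch.randint(0, 30522, (2, 16)), torch.ones(2, 16))
```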
Related papers
- THOS: A Benchmark Dataset for Targeted Hate and Offensive Speech [2.7061497863588126]
THOS is a dataset of 8.3k tweets manually labeled with fine-grained annotations about the target of the message.
We demonstrate that this dataset makes it feasible to train classifiers based on Large Language Models to perform classification at this level of granularity.
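As a rough illustration of the granular classification THOS is said to enable, here is a hedged sketch of setting up a pretrained transformer to predict the target of a message. The label set and model choice are invented for the example and are not taken from the dataset or the paper.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Hypothetical target labels; THOS's actual label set may differ.
TARGETS = ["none", "gender", "race", "religion", "other"]

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(TARGETS))

batch = tok(["example tweet text"], return_tensors="pt",
            truncation=True, padding=True)
with torch.no_grad():
    probs = model(**batch).logits.softmax(-1)
print(TARGETS[probs.argmax(-1).item()])  # untrained head: output is arbitrary
```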
arXiv Detail & Related papers (2023-11-11T00:30:31Z)
- On the Challenges of Building Datasets for Hate Speech Detection [0.0]
We first analyze the issues surrounding hate speech detection through a data-centric lens.
We then outline a holistic framework to encapsulate the data creation pipeline across seven broad dimensions.
arXiv Detail & Related papers (2023-09-06T11:15:47Z)
- When the Majority is Wrong: Modeling Annotator Disagreement for Subjective Tasks [45.14664901245331]
A crucial problem in hate speech detection is determining whether a statement is offensive to a demographic group.
We construct a model that predicts individual annotator ratings on potentially offensive text.
We find that annotator ratings can be predicted using their demographic information and opinions on online content.
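A minimal sketch of this setup, assuming a simple tabular formulation: predict an individual annotator's rating from text features concatenated with that annotator's demographic attributes. The feature layout and toy data below are invented for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge

texts = ["you people ruin everything", "have a nice day"]  # toy examples
demo = np.array([[1, 0, 25],                 # invented annotator features:
                 [0, 1, 40]], dtype=float)   # group one-hots plus age
ratings = np.array([4.0, 0.0])               # per-annotator offensiveness

X_text = TfidfVectorizer().fit_transform(texts)
X = hstack([X_text, csr_matrix(demo)])  # text features + annotator features
model = Ridge().fit(X, ratings)
```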
arXiv Detail & Related papers (2023-05-11T07:55:20Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
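One standard remedy for the label imbalance mentioned above is to weight the loss inversely to class frequency. The snippet below is a generic sketch of that idea with assumed class counts, not the paper's specific recipe.

```python
import torch
import torch.nn as nn

n_hate, n_non_hate = 1_000, 9_000        # assumed per-class example counts
# Class index 0 = non-hate, 1 = hate (an assumption for this sketch).
weights = torch.tensor([1.0 / n_non_hate, 1.0 / n_hate])
weights = weights / weights.sum()        # normalize for readability
loss_fn = nn.CrossEntropyLoss(weight=weights)

loss = loss_fn(torch.randn(8, 2), torch.randint(0, 2, (8,)))
```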
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
- Towards Group Robustness in the presence of Partial Group Labels [61.33713547766866]
Spurious correlations between input samples and the target labels can wrongly direct neural network predictions.
We propose an algorithm that optimizes for the worst-off group assignments from a constraint set.
We show improvements in the minority group's performance while preserving overall aggregate accuracy across groups.
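The kind of group-wise evaluation such methods report (and that the abstract above cites) is easy to make concrete: per-group accuracy, its variance across groups, and the worst-group accuracy. The helper below is a generic sketch, not code from either paper.

```python
import numpy as np

def group_metrics(y_true, y_pred, groups):
    """Per-group accuracy, its variance across groups, worst-group accuracy."""
    accs = {g: float(np.mean(y_pred[groups == g] == y_true[groups == g]))
            for g in np.unique(groups)}
    return accs, float(np.var(list(accs.values()))), min(accs.values())
```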
arXiv Detail & Related papers (2022-01-10T22:04:48Z)
- Statistical Analysis of Perspective Scores on Hate Speech Detection [7.447951461558536]
State-of-the-art hate speech classifiers are effective only when tested on data with the same feature distribution as the training data.
In such diverse data distributions, relying on low-level features is the main cause of deficiency, due to natural bias in the data.
We show that different hate speech datasets are very similar when it comes to extracting their Perspective Scores.
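For context, Perspective Scores come from Google's Perspective API. A minimal sketch of fetching a toxicity score for one comment follows; the endpoint and payload mirror the public API documentation, but verify them against the current docs before relying on them.

```python
import requests

API_KEY = "YOUR_KEY"  # placeholder
url = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       f"comments:analyze?key={API_KEY}")
body = {"comment": {"text": "some comment"},
        "requestedAttributes": {"TOXICITY": {}}}
resp = requests.post(url, json=body).json()
score = resp["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
```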
arXiv Detail & Related papers (2021-06-22T17:17:35Z)
- Balancing Biases and Preserving Privacy on Balanced Faces in the Wild [50.915684171879036]
There are demographic biases present in current facial recognition (FR) models.
We introduce our Balanced Faces in the Wild dataset to measure these biases across different ethnic and gender subgroups.
We find that relying on a single score threshold to differentiate between genuine and impostor sample pairs leads to suboptimal results.
We propose a novel domain adaptation learning scheme that uses facial features extracted from state-of-the-art neural networks.
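The single-threshold problem noted above is often addressed by calibrating one verification threshold per subgroup at a fixed false-match rate. The sketch below illustrates that general idea with invented variable names; it is not the paper's method.

```python
import numpy as np

def per_group_thresholds(scores, is_genuine, groups, fmr=1e-3):
    """One verification threshold per group at a fixed false-match rate.

    scores: similarity scores; is_genuine: boolean array; groups: labels.
    """
    thresholds = {}
    for g in np.unique(groups):
        impostor = scores[(groups == g) & ~is_genuine]
        # Threshold = the (1 - fmr) quantile of impostor scores.
        thresholds[g] = float(np.quantile(impostor, 1.0 - fmr))
    return thresholds
```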
arXiv Detail & Related papers (2021-03-16T15:05:49Z)
- Selective Classification Can Magnify Disparities Across Groups [89.14499988774985]
We find that while selective classification can improve average accuracies, it can simultaneously magnify existing accuracy disparities.
Increasing abstentions can even decrease accuracies on some groups.
We train distributionally-robust models that achieve similar full-coverage accuracies across groups and show that selective classification uniformly improves each group.
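To make the measurement concrete: a selective classifier abstains below a confidence threshold, and the disparity described above shows up in per-group accuracy over the retained examples. The helper below is a generic sketch of that metric, not the paper's code.

```python
import numpy as np

def selective_group_accuracy(conf, y_true, y_pred, groups, tau=0.9):
    """Accuracy per group among examples kept (confidence >= tau).

    Groups with no retained examples yield nan.
    """
    keep = conf >= tau
    return {g: float(np.mean(y_pred[keep & (groups == g)]
                             == y_true[keep & (groups == g)]))
            for g in np.unique(groups)}
```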
arXiv Detail & Related papers (2020-10-27T08:51:30Z)
- Mitigating Face Recognition Bias via Group Adaptive Classifier [53.15616844833305]
This work aims to learn a fair face representation, where faces of every group could be more equally represented.
Our work is able to mitigate face recognition bias across demographic groups while maintaining competitive accuracy.
arXiv Detail & Related papers (2020-06-13T06:43:37Z)