Exploring Hate Speech Detection with HateXplain and BERT
- URL: http://arxiv.org/abs/2208.04489v1
- Date: Tue, 9 Aug 2022 01:32:44 GMT
- Title: Exploring Hate Speech Detection with HateXplain and BERT
- Authors: Arvind Subramaniam, Aryan Mehra and Sayani Kundu
- Abstract summary: Hate speech takes many forms to target communities with derogatory comments, and sets societal progress back. HateXplain is a recently published dataset, the first to use annotated spans in the form of rationales, along with speech classification categories and targeted communities.
We fine-tune BERT to perform this task in the form of rationale and class prediction, and compare performance on metrics spanning accuracy, explainability and bias.
- Score: 2.673732496490253
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hate speech takes many forms to target communities with derogatory comments,
and sets humanity a step back in societal progress. HateXplain is a recently
published dataset, the first to use annotated spans in the form of rationales,
along with speech classification categories and targeted communities, making
the classification more human-like, explainable, accurate and less biased. We
fine-tune BERT to perform this task in the form of rationale and class prediction,
and compare performance on metrics spanning accuracy, explainability and
bias. Our novelty is threefold. First, we experiment with an amalgamated
rationale-class loss with different importance values. Second, we experiment
extensively with the ground-truth attention values for the rationales:
introducing conservative and lenient attentions, we compare model performance
on HateXplain and test our hypothesis. Third, to reduce the unintended bias in
our models, we mask the target-community words and note the improvement in bias
and explainability metrics. Overall, we achieve model explainability, bias
removal and several incremental improvements on the original BERT
implementation.
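The three contributions above can be illustrated with a small sketch: an amalgamated loss that weights classification cross-entropy against an attention cross-entropy toward rationale-derived ground-truth attentions (conservative puts all mass on rationale tokens, lenient smooths some mass onto the rest), plus masking of target-community words. The function names, the smoothing fraction `eps`, and the mask token are illustrative assumptions, not the paper's exact formulation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def rationale_attention(rationale_mask, mode="lenient", eps=0.1):
    """Build a ground-truth attention distribution from a binary rationale mask.

    'conservative' places all attention mass on rationale tokens;
    'lenient' smooths a fraction eps of the mass uniformly over all tokens.
    (Names follow the abstract; eps is an illustrative value.)"""
    total = sum(rationale_mask)
    n = len(rationale_mask)
    if mode == "conservative":
        return [m / total for m in rationale_mask]
    return [(1 - eps) * m / total + eps / n for m in rationale_mask]

def amalgamated_loss(class_logits, true_class, pred_attn, gt_attn, lam=0.5):
    """lam * classification cross-entropy + (1 - lam) * attention cross-entropy.

    lam plays the role of the 'importance value' trading off the two terms."""
    probs = softmax(class_logits)
    l_class = -math.log(probs[true_class] + 1e-12)
    l_attn = -sum(g * math.log(p + 1e-12) for g, p in zip(gt_attn, pred_attn))
    return lam * l_class + (1 - lam) * l_attn

def mask_target_words(tokens, target_words, mask_token="[MASK]"):
    """Replace target-community words with a mask token (bias mitigation)."""
    targets = {w.lower() for w in target_words}
    return [mask_token if t.lower() in targets else t for t in tokens]
```

For example, with rationale mask `[0, 1, 1, 0]`, the conservative attention is `[0, 0.5, 0.5, 0]`, while the lenient one keeps a small probability on the non-rationale tokens; a model whose predicted attention matches the ground truth incurs a lower amalgamated loss than one attending uniformly.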
Related papers
- The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models [78.69526166193236]
Pre-trained Language models (PLMs) have been acknowledged to contain harmful information, such as social biases.
We propose Social Bias Neurons to accurately pinpoint units (i.e., neurons) in a language model that can be attributed to undesirable behavior, such as social bias.
As measured by prior metrics from StereoSet, our model achieves a higher degree of fairness while maintaining language modeling ability with low cost.
arXiv Detail & Related papers (2024-06-14T15:41:06Z) - HateDebias: On the Diversity and Variability of Hate Speech Debiasing [14.225997610785354]
We propose a benchmark, named HateDebias, to analyze the model ability of hate speech detection under continuous, changing environments.
Specifically, to meet the diversity of biases, we collect existing hate speech detection datasets with different types of biases.
We evaluate the detection accuracy of models trained on the datasets with a single type of bias with the performance on the HateDebias, where a significant performance drop is observed.
arXiv Detail & Related papers (2024-06-07T12:18:02Z) - HateRephrase: Zero- and Few-Shot Reduction of Hate Intensity in Online
Posts using Large Language Models [4.9711707739781215]
This paper investigates an approach of suggesting a rephrasing of potential hate speech content even before the post is made.
We develop four different prompts based on task description, hate definition, few-shot demonstrations and chain-of-thought.
We find that GPT-3.5 outperforms the baseline and open-source models for all the different kinds of prompts.
arXiv Detail & Related papers (2023-10-21T12:18:29Z) - When are ensembles really effective? [49.37269057899679]
We study the question of when ensembling yields significant performance improvements in classification tasks.
We show that ensembling improves performance significantly whenever the disagreement rate is large relative to the average error rate.
We identify practical scenarios where ensembling does and does not result in large performance improvements.
arXiv Detail & Related papers (2023-05-21T01:36:25Z) - ToKen: Task Decomposition and Knowledge Infusion for Few-Shot Hate
Speech Detection [85.68684067031909]
We frame this problem as a few-shot learning task, and show significant gains with decomposing the task into its "constituent" parts.
In addition, we see that infusing knowledge from reasoning datasets (e.g. Atomic 2020) improves the performance even further.
arXiv Detail & Related papers (2022-05-25T05:10:08Z) - The SAME score: Improved cosine based bias score for word embeddings [63.24247894974291]
We provide a bias definition based on the ideas from the literature and derive novel requirements for bias scores.
We propose a new bias score, SAME, to address the shortcomings of existing bias scores and show empirically that SAME is better suited to quantify biases in word embeddings.
arXiv Detail & Related papers (2022-03-28T09:28:13Z) - AngryBERT: Joint Learning Target and Emotion for Hate Speech Detection [5.649040805759824]
This paper proposes a novel multitask learning-based model, AngryBERT, which jointly learns hate speech detection with sentiment classification and target identification as secondary relevant tasks.
Experiment results show that AngryBERT outperforms state-of-the-art single-task-learning and multitask learning baselines.
arXiv Detail & Related papers (2021-03-14T16:17:26Z) - HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection [27.05719607624675]
We introduce HateXplain, the first benchmark hate speech dataset covering multiple aspects of the issue.
Each post in our dataset is annotated from three different perspectives.
We observe that models, which utilize the human rationales for training, perform better in reducing unintended bias towards target communities.
arXiv Detail & Related papers (2020-12-18T15:12:14Z) - Improving Robustness by Augmenting Training Sentences with
Predicate-Argument Structures [62.562760228942054]
Existing approaches to improve robustness against dataset biases mostly focus on changing the training objective.
We propose to augment the input sentences in the training data with their corresponding predicate-argument structures.
We show that without targeting a specific bias, our sentence augmentation improves the robustness of transformer models against multiple biases.
arXiv Detail & Related papers (2020-10-23T16:22:05Z) - Constructing interval variables via faceted Rasch measurement and
multitask deep learning: a hate speech application [63.10266319378212]
We propose a method for measuring complex variables on a continuous, interval spectrum by combining supervised deep learning with the Constructing Measures approach to faceted Rasch item response theory (IRT).
We demonstrate this new method on a dataset of 50,000 social media comments sourced from YouTube, Twitter, and Reddit and labeled by 11,000 U.S.-based Amazon Mechanical Turk workers.
arXiv Detail & Related papers (2020-09-22T02:15:05Z) - Stereotypical Bias Removal for Hate Speech Detection Task using
Knowledge-based Generalizations [16.304516254043865]
We study bias mitigation from unstructured text data for hate speech detection.
We propose novel methods leveraging knowledge-based generalizations for bias-free learning.
Our experiments with two real-world datasets, a Wikipedia Talk Pages dataset and a Twitter dataset, show that the use of knowledge-based generalizations results in better performance.
arXiv Detail & Related papers (2020-01-15T18:17:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.