Exploring Hate Speech Detection with HateXplain and BERT
- URL: http://arxiv.org/abs/2208.04489v1
- Date: Tue, 9 Aug 2022 01:32:44 GMT
- Title: Exploring Hate Speech Detection with HateXplain and BERT
- Authors: Arvind Subramaniam, Aryan Mehra and Sayani Kundu
- Abstract summary: Hate Speech takes many forms to target communities with derogatory comments, and takes humanity a step back in societal progress.
HateXplain is a recently published and first dataset to use annotated spans in the form of rationales, along with speech classification categories and targeted communities.
We tune BERT to perform this task in the form of rationales and class prediction, and compare our performance on different metrics spanning across accuracy, explainability and bias.
- Score: 2.673732496490253
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hate Speech takes many forms to target communities with derogatory comments,
and takes humanity a step back in societal progress. HateXplain is a recently
published and first dataset to use annotated spans in the form of rationales,
along with speech classification categories and targeted communities to make
the classification more humanlike, explainable, accurate and less biased. We
tune BERT to perform this task in the form of rationales and class prediction,
and compare our performance on different metrics spanning across accuracy,
explainability and bias. Our novelty is threefold. Firstly, we experiment with
the amalgamated rationale class loss with different importance values.
Secondly, we experiment extensively with the ground truth attention values for
the rationales. With the introduction of conservative and lenient attentions,
we compare performance of the model on HateXplain and test our hypothesis.
Thirdly, in order to improve the unintended bias in our models, we use masking
of the target community words and note the improvement in bias and
explainability metrics. Overall, we are successful in achieving model
explainability, bias removal and several incremental improvements on the
original BERT implementation.
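The abstract mentions a combined rationale and class loss weighted by an importance value, and masking of target community words. The sketch below is one plausible reading of those two ideas, not the authors' implementation. It assumes the combined loss is the classification cross-entropy plus `lam` times a cross-entropy between the model's attention and a normalized rationale mask. The names `combined_loss`, `mask_targets`, `lam`, and the `[MASK]` token are illustrative assumptions, as is the uniform fallback for posts with no rationale.

```python
import math


def normalize(scores):
    """Turn a 0/1 rationale mask into a distribution over tokens.

    Assumption: a post with no rationale tokens gets uniform attention.
    """
    total = sum(scores)
    if total == 0:
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between two discrete distributions of equal length."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


def combined_loss(class_probs, true_class, model_attention,
                  rationale_mask, lam=1.0, eps=1e-12):
    """Class loss plus lam times attention (rationale) loss.

    lam plays the role of the "importance value" (hypothetical name).
    """
    class_loss = -math.log(class_probs[true_class] + eps)
    target_attention = normalize(rationale_mask)
    attention_loss = cross_entropy(target_attention, model_attention, eps)
    return class_loss + lam * attention_loss


def mask_targets(tokens, target_words, mask_token="[MASK]"):
    """Replace target community words with a mask token (case-insensitive)."""
    lowered = {w.lower() for w in target_words}
    return [mask_token if t.lower() in lowered else t for t in tokens]


# Usage with made-up numbers: three classes, four tokens.
loss = combined_loss(
    class_probs=[0.7, 0.2, 0.1],
    true_class=0,
    model_attention=[0.1, 0.4, 0.4, 0.1],
    rationale_mask=[0, 1, 1, 0],
    lam=1.0,
)
masked = mask_targets(["those", "Martians", "again"], ["martians"])
```

Setting `lam=0` recovers plain classification training, while larger values push the model's attention toward the annotated rationale tokens. In a real BERT fine-tuning setup these quantities would be tensors computed per batch.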
Related papers
- Human and LLM Biases in Hate Speech Annotations: A Socio-Demographic Analysis of Annotators and Targets [0.6918368994425961]
We leverage an extensive dataset with rich socio-demographic information of both annotators and targets.
Our analysis surfaces the presence of widespread biases, which we quantitatively describe and characterize based on their intensity and prevalence.
Our work offers new and nuanced results on human biases in hate speech annotations, as well as fresh insights into the design of AI-driven hate speech detection systems.
arXiv Detail & Related papers (2024-10-10T14:48:57Z) - Causal Micro-Narratives [62.47217054314046]
We present a novel approach to classify causal micro-narratives from text.
These narratives are sentence-level explanations of the cause(s) and/or effect(s) of a target subject.
arXiv Detail & Related papers (2024-10-07T17:55:10Z) - The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models [78.69526166193236]
Pre-trained Language models (PLMs) have been acknowledged to contain harmful information, such as social biases.
We propose Social Bias Neurons to accurately pinpoint units (i.e., neurons) in a language model that can be attributed to undesirable behavior, such as social bias.
As measured by prior metrics from StereoSet, our model achieves a higher degree of fairness while maintaining language modeling ability with low cost.
arXiv Detail & Related papers (2024-06-14T15:41:06Z) - HateDebias: On the Diversity and Variability of Hate Speech Debiasing [14.225997610785354]
We propose a benchmark, named HateDebias, to analyze the model ability of hate speech detection under continuous, changing environments.
Specifically, to meet the diversity of biases, we collect existing hate speech detection datasets with different types of biases.
We compare the detection accuracy of models trained on datasets with a single type of bias against their performance on HateDebias, where a significant performance drop is observed.
arXiv Detail & Related papers (2024-06-07T12:18:02Z) - Verifying the Robustness of Automatic Credibility Assessment [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models.
We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
arXiv Detail & Related papers (2023-03-14T16:11:47Z) - ToKen: Task Decomposition and Knowledge Infusion for Few-Shot Hate Speech Detection [85.68684067031909]
We frame this problem as a few-shot learning task, and show significant gains with decomposing the task into its "constituent" parts.
In addition, we see that infusing knowledge from reasoning datasets (e.g. Atomic 2020) improves the performance even further.
arXiv Detail & Related papers (2022-05-25T05:10:08Z) - The SAME score: Improved cosine based bias score for word embeddings [49.75878234192369]
We introduce SAME, a novel bias score for semantic bias in embeddings.
We show that SAME is capable of measuring semantic bias and identify potential causes for social bias in downstream tasks.
arXiv Detail & Related papers (2022-03-28T09:28:13Z) - AngryBERT: Joint Learning Target and Emotion for Hate Speech Detection [5.649040805759824]
This paper proposes a novel multitask learning-based model, AngryBERT, which jointly learns hate speech detection with sentiment classification and target identification as secondary relevant tasks.
Experiment results show that AngryBERT outperforms state-of-the-art single-task-learning and multitask learning baselines.
arXiv Detail & Related papers (2021-03-14T16:17:26Z) - HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection [27.05719607624675]
We introduce HateXplain, the first benchmark hate speech dataset covering multiple aspects of the issue.
Each post in our dataset is annotated from three different perspectives.
We observe that models, which utilize the human rationales for training, perform better in reducing unintended bias towards target communities.
arXiv Detail & Related papers (2020-12-18T15:12:14Z) - Improving Robustness by Augmenting Training Sentences with Predicate-Argument Structures [62.562760228942054]
Existing approaches to improve robustness against dataset biases mostly focus on changing the training objective.
We propose to augment the input sentences in the training data with their corresponding predicate-argument structures.
We show that without targeting a specific bias, our sentence augmentation improves the robustness of transformer models against multiple biases.
arXiv Detail & Related papers (2020-10-23T16:22:05Z) - Stereotypical Bias Removal for Hate Speech Detection Task using Knowledge-based Generalizations [16.304516254043865]
We study bias mitigation from unstructured text data for hate speech detection.
We propose novel methods leveraging knowledge-based generalizations for bias-free learning.
Our experiments with two real-world datasets, a Wikipedia Talk Pages dataset and a Twitter dataset, show that the use of knowledge-based generalizations results in better performance.
arXiv Detail & Related papers (2020-01-15T18:17:36Z)