To BAN or not to BAN: Bayesian Attention Networks for Reliable Hate
Speech Detection
- URL: http://arxiv.org/abs/2007.05304v7
- Date: Thu, 17 Dec 2020 09:43:00 GMT
- Authors: Kristian Miok, Blaž Škrlj, Daniela Zaharie and Marko Robnik-Šikonja
- Abstract summary: Hate speech is an important problem in the management of user-generated content. To remove offensive content or ban misbehaving users, content moderators need reliable hate speech detectors.
Deep neural networks based on the transformer architecture, such as the (multilingual) BERT model, achieve superior performance in many natural language classification tasks, including hate speech detection.
We propose a Bayesian method using Monte Carlo dropout within the attention layers of the transformer models to provide well-calibrated reliability estimates.
- Score: 3.7768834126209234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hate speech is an important problem in the management of user-generated
content. To remove offensive content or ban misbehaving users, content
moderators need reliable hate speech detectors. Recently, deep neural networks
based on the transformer architecture, such as the (multilingual) BERT model,
achieve superior performance in many natural language classification tasks,
including hate speech detection. So far, these methods have not been able to
quantify their output in terms of reliability. We propose a Bayesian method
using Monte Carlo dropout within the attention layers of the transformer models
to provide well-calibrated reliability estimates. We evaluate and visualize the
results of the proposed approach on hate speech detection problems in several
languages. Additionally, we test if affective dimensions can enhance the
information extracted by the BERT model in hate speech classification. Our
experiments show that Monte Carlo dropout provides a viable mechanism for
reliability estimation in transformer networks. Used within the BERT model, it
offers state-of-the-art classification performance and can detect less trusted
predictions. Also, it was observed that affective dimensions extracted using
sentic computing methods can provide insights toward interpretation of emotions
involved in hate speech. Our approach not only improves the classification
performance of the state-of-the-art multilingual BERT model but the computed
reliability scores also significantly reduce the workload in an inspection of
offending cases and reannotation campaigns. The provided visualization helps to
understand the borderline outcomes.
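The core idea, keeping dropout active at prediction time and averaging class probabilities over several stochastic forward passes, can be illustrated with a minimal NumPy sketch. This is a toy two-class network, not the authors' BERT-based model: the weights, layer shapes, and dropout placement here are illustrative assumptions, and predictive entropy stands in for their calibrated reliability score.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_dropout_predict(x, W_h, W_out, p=0.1, n_samples=50):
    """Monte Carlo dropout: keep dropout active at inference and average
    class probabilities over n_samples stochastic passes. The spread of
    the sampled predictions serves as a reliability estimate."""
    probs = []
    for _ in range(n_samples):
        h = np.maximum(x @ W_h, 0.0)                # hidden layer (ReLU)
        mask = rng.random(h.shape) >= p             # dropout mask, kept at test time
        h = h * mask / (1.0 - p)                    # inverted-dropout scaling
        logits = h @ W_out
        e = np.exp(logits - logits.max())           # stable softmax
        probs.append(e / e.sum())
    probs = np.stack(probs)
    mean = probs.mean(axis=0)                       # predictive distribution
    entropy = -np.sum(mean * np.log(mean + 1e-12))  # uncertainty score
    return mean, entropy

# Toy input and random weights (hypothetical, for illustration only).
x = np.array([1.0, -0.5, 2.0])
W_h = rng.normal(size=(3, 8))
W_out = rng.normal(size=(8, 2))
mean, ent = mc_dropout_predict(x, W_h, W_out)
```

A high entropy flags a "less trusted" prediction that a moderator could inspect manually; in the paper this sampling happens inside the attention layers of the transformer rather than in a plain feed-forward layer.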
Related papers
- Offensive Language Detection on Social Media Using XLNet [0.0]
We propose an automatic offensive language detection model based on XLNet, a generalized autoregressive pretraining method, and compare its performance with BERT (Bidirectional Encoder Representations from Transformers). Our experimental results show that XLNet outperforms BERT in detecting offensive content and in categorizing the types of offenses, while BERT performs slightly better in identifying the targets of the offenses. These findings highlight the potential of transfer learning and XLNet-based architectures to create robust systems for detecting offensive language on social media platforms.
arXiv Detail & Related papers (2025-06-26T22:37:35Z)
- OMNIGUARD: An Efficient Approach for AI Safety Moderation Across Modalities [54.152681077418805]
Current detection approaches are fallible, and are particularly susceptible to attacks that exploit mismatched generalizations of model capabilities. We propose OMNIGUARD, an approach for detecting harmful prompts across languages and modalities. Our approach improves harmful prompt classification accuracy by 11.57% over the strongest baseline in a multilingual setting.
arXiv Detail & Related papers (2025-05-29T05:25:27Z)
- $C^2$AV-TSE: Context and Confidence-aware Audio Visual Target Speaker Extraction [80.57232374640911]
We propose a model-agnostic strategy called Mask-And-Recover (MAR).
MAR integrates both inter- and intra-modality contextual correlations to enable global inference within extraction modules.
To better target challenging parts within each sample, we introduce a Fine-grained Confidence Score (FCS) model.
arXiv Detail & Related papers (2025-04-01T13:01:30Z)
- Code-Mixed Telugu-English Hate Speech Detection [0.0]
This study investigates transformer-based models, including TeluguHateBERT, HateBERT, DeBERTa, Muril, IndicBERT, Roberta, and Hindi-Abusive-MuRIL, for classifying hate speech in Telugu.
We fine-tune these models using Low-Rank Adaptation (LoRA) to optimize efficiency and performance.
We translate Telugu text into English using Google Translate to assess its impact on classification accuracy.
arXiv Detail & Related papers (2025-02-15T02:03:13Z)
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses.
LLMs with a reasonable prompt and their generative capability can even correct tokens that are missing from the N-best list.
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
- Hate Speech and Offensive Language Detection using an Emotion-aware Shared Encoder [1.8734449181723825]
Existing works on hate speech and offensive language detection produce promising results based on pre-trained transformer models.
This paper addresses a multi-task joint learning approach which combines external emotional features extracted from other corpora.
Our findings demonstrate that emotional knowledge helps to more reliably identify hate speech and offensive language across datasets.
arXiv Detail & Related papers (2023-02-17T09:31:06Z)
- Shapley Head Pruning: Identifying and Removing Interference in Multilingual Transformers [54.4919139401528]
We show that it is possible to reduce interference by identifying and pruning language-specific parameters.
We show that removing identified attention heads from a fixed model improves performance for a target language on both sentence classification and structural prediction.
arXiv Detail & Related papers (2022-10-11T18:11:37Z)
- Combating high variance in Data-Scarce Implicit Hate Speech Classification [0.0]
In this paper, we explore various optimization and regularization techniques and develop a novel RoBERTa-based model that achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-08-29T13:45:21Z)
- HateCheckHIn: Evaluating Hindi Hate Speech Detection Models [6.52974752091861]
Multilingual hate speech is a major emerging challenge for automated detection.
We introduce a set of functionalities for the purpose of evaluation.
Considering Hindi as a base language, we craft test cases for each functionality.
arXiv Detail & Related papers (2022-04-30T19:09:09Z)
- APEACH: Attacking Pejorative Expressions with Analysis on Crowd-Generated Hate Speech Evaluation Datasets [4.034948808542701]
APEACH is a method that allows the collection of hate speech generated by unspecified users.
By controlling the crowd-generation of hate speech and adding only a minimum post-labeling, we create a corpus that enables the generalizable and fair evaluation of hate speech detection.
arXiv Detail & Related papers (2022-02-25T02:04:38Z)
- Deep Learning for Hate Speech Detection: A Comparative Study [54.42226495344908]
We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods.
Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art.
In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions.
arXiv Detail & Related papers (2022-02-19T03:48:20Z)
- Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages.
We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply them to the target language.
We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
- Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information [55.75018546938499]
We propose the speaker embedding-aware neural diarization (SEND) method, which predicts the power set encoded labels.
Our method achieves lower diarization error rate than the target-speaker voice activity detection.
arXiv Detail & Related papers (2021-11-28T12:51:04Z)
- Constructing interval variables via faceted Rasch measurement and multitask deep learning: a hate speech application [63.10266319378212]
We propose a method for measuring complex variables on a continuous, interval spectrum by combining supervised deep learning with the Constructing Measures approach to faceted Rasch item response theory (IRT).
We demonstrate this new method on a dataset of 50,000 social media comments sourced from YouTube, Twitter, and Reddit and labeled by 11,000 U.S.-based Amazon Mechanical Turk workers.
arXiv Detail & Related papers (2020-09-22T02:15:05Z)
- An Effective Contextual Language Modeling Framework for Speech Summarization with Augmented Features [13.97006782398121]
The Bidirectional Encoder Representations from Transformers (BERT) model was proposed and has achieved record-breaking success on many natural language processing tasks.
We explore the incorporation of confidence scores into sentence representations to see if such an attempt could help alleviate the negative effects caused by imperfect automatic speech recognition.
We validate the effectiveness of our proposed method on a benchmark dataset.
arXiv Detail & Related papers (2020-06-01T18:27:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.