Related papers: Evaluation of Hate Speech Detection Using Large Language Models and Geographical Contextualization

URL: http://arxiv.org/abs/2502.19612v1
Date: Wed, 26 Feb 2025 22:59:36 GMT
Title: Evaluation of Hate Speech Detection Using Large Language Models and Geographical Contextualization
Authors: Anwar Hossain Zahid, Monoshi Kumar Roy, Swarna Das
Abstract summary: This study systematically investigates the performance of LLMs on detecting hate speech across multilingual and diverse geographic contexts. We evaluate three state-of-the-art LLMs: Llama2 (13b), Codellama (7b), and DeepSeekCoder (6.7b). Codellama had the best binary classification recall with 70.6% and an F1-score of 52.18%, whereas DeepSeekCoder had the best performance in geographic sensitivity, correctly detecting 63 out of 265 locations.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The proliferation of hate speech on social media is a serious problem with far-reaching effects on society, including escalating violence, discrimination, and social fragmentation. Detecting hate speech is intrinsically multifaceted because of cultural, linguistic, and contextual complexities, as well as adversarial manipulations. In this study, we systematically investigate the performance of LLMs on detecting hate speech across multilingual datasets and diverse geographic contexts. Our work presents a new evaluation framework with three dimensions: binary classification of hate speech, geography-aware contextual detection, and robustness to adversarially generated text. Using a dataset of 1,000 comments from five diverse regions, we evaluate three state-of-the-art LLMs: Llama2 (13b), Codellama (7b), and DeepSeekCoder (6.7b). Codellama had the best binary classification recall with 70.6% and an F1-score of 52.18%, whereas DeepSeekCoder had the best performance in geographic sensitivity, correctly detecting 63 out of 265 locations. The tests for adversarial robustness also showed significant weaknesses; Llama2 misclassified 62.5% of manipulated samples. These results highlight the trade-offs between accuracy, contextual understanding, and robustness in current LLMs. By underlining key strengths and limitations, this work sets the stage for developing contextually aware, multilingual hate speech detection systems and offers actionable insights for future research and real-world applications.

Related papers

Fine-Grained Chinese Hate Speech Understanding: Span-Level Resources, Coded Term Lexicon, and Enhanced Detection Frameworks [13.187315629074428]
We introduce the Span-level Target-Aware Toxicity Extraction dataset (STATE ToxiCN), the first span-level Chinese hate speech dataset. We conduct the first comprehensive study of Chinese coded hate terms and of LLMs' ability to interpret hate semantics. We propose a method to integrate an annotated lexicon into models, significantly enhancing hate speech detection performance.
arXiv Detail & Related papers (2025-07-15T13:19:18Z)
Can Prompting LLMs Unlock Hate Speech Detection across Languages? A Zero-shot and Few-shot Study [59.30098850050971]
This work evaluates LLM prompting-based detection across eight non-English languages. We show that while zero-shot and few-shot prompting lag behind fine-tuned encoder models on most of the real-world evaluation sets, they achieve better generalization on functional tests for hate speech detection.
arXiv Detail & Related papers (2025-05-09T16:00:01Z)
Dual-Class Prompt Generation: Enhancing Indonesian Gender-Based Hate Speech Detection through Data Augmentation [0.0]
Detecting gender-based hate speech in Indonesian social media remains challenging due to limited labeled datasets. We evaluate backtranslation, single-class prompt generation, and our proposed dual-class prompt generation. Our findings suggest that incorporating examples from both classes helps language models generate more diverse yet representative samples.
arXiv Detail & Related papers (2025-03-06T10:07:51Z)
Hate Personified: Investigating the role of LLMs in content moderation [64.26243779985393]
For subjective tasks such as hate detection, where people perceive hate differently, the Large Language Model's (LLM) ability to represent diverse groups is unclear. By including additional context in prompts, we analyze LLM's sensitivity to geographical priming, persona attributes, and numerical information to assess how well the needs of various groups are reflected.
arXiv Detail & Related papers (2024-10-03T16:43:17Z)
Harnessing Artificial Intelligence to Combat Online Hate: Exploring the Challenges and Opportunities of Large Language Models in Hate Speech Detection [4.653571633477755]
Large language models (LLMs) excel in many diverse applications beyond language generation, e.g., translation, summarization, and sentiment analysis. This becomes pertinent in the realm of identifying hateful or toxic speech -- a domain fraught with challenges and ethical dilemmas.
arXiv Detail & Related papers (2024-03-12T19:12:28Z)
An Investigation of Large Language Models for Real-World Hate Speech Detection [46.15140831710683]
A major limitation of existing methods is that hate speech detection is a highly contextual problem. Recently, large language models (LLMs) have demonstrated state-of-the-art performance in several natural language tasks. Our study reveals that a meticulously crafted reasoning prompt can effectively capture the context of hate speech.
arXiv Detail & Related papers (2024-01-07T00:39:33Z)
Model-Agnostic Meta-Learning for Multilingual Hate Speech Detection [23.97444551607624]
Hate speech in social media is a growing phenomenon, and detecting such toxic content has gained significant traction. HateMAML is a model-agnostic meta-learning-based framework that effectively performs hate speech detection in low-resource languages. Extensive experiments are conducted on five datasets across eight different low-resource languages.
arXiv Detail & Related papers (2023-03-04T22:28:29Z)
A New Generation of Perspective API: Efficient Multilingual Character-level Transformers [66.9176610388952]
We present the fundamentals behind the next version of the Perspective API from Google Jigsaw. At the heart of the approach is a single multilingual token-free Charformer model. We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings.
arXiv Detail & Related papers (2022-02-22T20:55:31Z)
Deep Learning for Hate Speech Detection: A Comparative Study [54.42226495344908]
We present here a large-scale empirical comparison of deep and shallow hate-speech detection methods. Our goal is to illuminate progress in the area, and identify strengths and weaknesses in the current state-of-the-art. In doing so we aim to provide guidance as to the use of hate-speech detection in practice, quantify the state-of-the-art, and identify future research directions.
arXiv Detail & Related papers (2022-02-19T03:48:20Z)
Addressing the Challenges of Cross-Lingual Hate Speech Detection [115.1352779982269]
In this paper we focus on cross-lingual transfer learning to support hate speech detection in low-resource languages. We leverage cross-lingual word embeddings to train our neural network systems on the source language and apply it to the target language. We investigate the issue of label imbalance of hate speech datasets, since the high ratio of non-hate examples compared to hate examples often leads to low model performance.
arXiv Detail & Related papers (2022-01-15T20:48:14Z)
AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context. It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts. Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
Leveraging Multilingual Transformers for Hate Speech Detection [11.306581296760864]
We leverage state of the art Transformer language models to identify hate speech in a multilingual setting. With a pre-trained multilingual Transformer-based text encoder at the base, we are able to successfully identify and classify hate speech from multiple languages.
arXiv Detail & Related papers (2021-01-08T20:23:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.