LLM-Based Multi-Task Bangla Hate Speech Detection: Type, Severity, and Target
- URL: http://arxiv.org/abs/2510.01995v1
- Date: Thu, 02 Oct 2025 13:17:11 GMT
- Title: LLM-Based Multi-Task Bangla Hate Speech Detection: Type, Severity, and Target
- Authors: Md Arid Hasan, Firoj Alam, Md Fahad Hossain, Usman Naseem, Syed Ishtiaque Ahmed
- Abstract summary: We introduce the first multi-task Bangla hate-speech dataset, BanglaMultiHate, one of the largest manually annotated corpora to date. We compare classical baselines, monolingual pretrained models, and LLMs under zero-shot prompting and LoRA fine-tuning. Our experiments assess LLM adaptability in a low-resource setting and reveal a consistent trend: although LoRA-tuned LLMs are competitive with BanglaBERT, culturally and linguistically grounded pretraining remains critical for robust performance.
- Score: 27.786707138241493
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Online social media platforms are central to everyday communication and information seeking. While these platforms serve positive purposes, they also provide fertile ground for the spread of hate speech, offensive language, and bullying content targeting individuals, organizations, and communities. Such content undermines safety, participation, and equity online. Reliable detection systems are therefore needed, especially for low-resource languages where moderation tools are limited. In Bangla, prior work has contributed resources and models, but most are single-task (e.g., binary hate/offense) with limited coverage of multi-faceted signals (type, severity, target). We address these gaps by introducing the first multi-task Bangla hate-speech dataset, BanglaMultiHate, one of the largest manually annotated corpora to date. Building on this resource, we conduct a comprehensive, controlled comparison spanning classical baselines, monolingual pretrained models, and LLMs under zero-shot prompting and LoRA fine-tuning. Our experiments assess LLM adaptability in a low-resource setting and reveal a consistent trend: although LoRA-tuned LLMs are competitive with BanglaBERT, culturally and linguistically grounded pretraining remains critical for robust performance. Together, our dataset and findings establish a stronger benchmark for developing culturally aligned moderation tools in low-resource contexts. For reproducibility, we will release the dataset and all related scripts.
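As a rough sketch of the multi-task formulation described above (jointly predicting type, severity, and target), the minimal encoder-side baseline below attaches one classification head per subtask to a shared BanglaBERT encoder. The label counts and the example checkpoint name (csebuetnlp/banglabert) are illustrative assumptions; the BanglaMultiHate schema and training scripts have not yet been released.

```python
# Hypothetical multi-task baseline: shared encoder, one head per subtask.
# Class counts below are placeholders, not the released BanglaMultiHate schema.
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class MultiTaskHateClassifier(nn.Module):
    """Shared BanglaBERT encoder with separate type/severity/target heads."""

    def __init__(self, encoder_name="csebuetnlp/banglabert",
                 n_types=4, n_severities=3, n_targets=5):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # One linear head per task over the shared [CLS] representation.
        self.heads = nn.ModuleDict({
            "type": nn.Linear(hidden, n_types),
            "severity": nn.Linear(hidden, n_severities),
            "target": nn.Linear(hidden, n_targets),
        })

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] token embedding
        return {name: head(cls) for name, head in self.heads.items()}

tokenizer = AutoTokenizer.from_pretrained("csebuetnlp/banglabert")
model = MultiTaskHateClassifier()
batch = tokenizer(["<Bangla post here>"], return_tensors="pt",
                  truncation=True, padding=True)
logits = model(batch["input_ids"], batch["attention_mask"])
# Training would sum one cross-entropy term per task, e.g.
#   loss = sum(F.cross_entropy(logits[t], labels[t]) for t in logits)
```

The LLM arm of the comparison would instead serialize the three labels into a text target and fine-tune a decoder model under LoRA (e.g., via the peft library), with zero-shot prompting as the untuned reference point.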
Related papers
- BIDWESH: A Bangla Regional Based Hate Speech Detection Dataset [0.0]
This study introduces BIDWESH, the first multi-dialectal Bangla hate speech dataset.
It was constructed by translating and annotating 9,183 instances from the BD-SHS corpus into three major regional dialects.
The resulting dataset provides a linguistically rich, balanced, and inclusive resource for advancing hate speech detection in Bangla.
arXiv Detail & Related papers (2025-07-22T02:53:48Z)
- Rethinking Hate Speech Detection on Social Media: Can LLMs Replace Traditional Models? [3.611706857555358]
Hate speech detection across contemporary social media presents unique challenges due to linguistic diversity and the informal nature of online discourse.
These challenges are further amplified in settings involving code-mixing, transliteration, and culturally nuanced expressions.
We argue that recent large language models (LLMs) not only surpass traditional models but also redefine the landscape of hate speech detection more broadly.
arXiv Detail & Related papers (2025-06-15T06:48:47Z)
- Latent Factor Models Meets Instructions: Goal-conditioned Latent Factor Discovery without Task Supervision [50.45597801390757]
Instruct-LF is a goal-oriented latent factor discovery system.
It integrates instruction-following ability with statistical models to handle noisy datasets.
arXiv Detail & Related papers (2025-02-21T02:03:08Z)
- LIBRA: Measuring Bias of Large Language Model from a Local Context [9.612845616659776]
Large Language Models (LLMs) have significantly advanced natural language processing applications.
Yet their widespread use raises concerns about inherent biases that may reduce utility for, or cause harm to, particular social groups.
This research addresses these limitations with a Local Integrated Bias Recognition and Assessment Framework (LIBRA) for measuring bias.
arXiv Detail & Related papers (2025-02-02T04:24:57Z)
- A Federated Approach to Few-Shot Hate Speech Detection for Marginalized Communities [43.37824420609252]
Hate speech online remains an understudied issue for marginalized communities.
In this paper, we aim to provide marginalized communities with a privacy-preserving tool to protect themselves from online hate speech.
arXiv Detail & Related papers (2024-12-06T11:00:05Z)
- NewsInterview: a Dataset and a Playground to Evaluate LLMs' Ground Gap via Informational Interviews [65.35458530702442]
We focus on journalistic interviews, a domain rich in grounding communication and abundant in data.
We curate a dataset of 40,000 two-person informational interviews from NPR and CNN.
LLMs are significantly less likely than human interviewers to use acknowledgements and to pivot to higher-level questions.
arXiv Detail & Related papers (2024-11-21T01:37:38Z)
- Think Carefully and Check Again! Meta-Generation Unlocking LLMs for Low-Resource Cross-Lingual Summarization [108.6908427615402]
Cross-lingual summarization (CLS) aims to generate a summary for the source text in a different target language.
Currently, instruction-tuned large language models (LLMs) excel at various English tasks.
However, recent studies have shown that LLMs' performance on CLS tasks remains unsatisfactory even in few-shot settings.
arXiv Detail & Related papers (2024-10-26T00:39:44Z)
- Assessing Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks [68.33068005789116]
We introduce ReDial, a benchmark containing 1.2K+ parallel query pairs in Standardized English and AAVE.
We evaluate widely used models, including the GPT, Claude, Llama, Mistral, and Phi model families.
Our work establishes a systematic and objective framework for analyzing LLM bias in dialectal queries.
arXiv Detail & Related papers (2024-10-14T18:44:23Z)
- BanStereoSet: A Dataset to Measure Stereotypical Social Biases in LLMs for Bangla [0.0]
This study presents BanStereoSet, a dataset designed to evaluate stereotypical social biases in multilingual LLMs for the Bangla language.
Our dataset consists of 1,194 sentences spanning 9 categories of bias: race, profession, gender, ageism, beauty, beauty in profession, region, caste, and religion.
arXiv Detail & Related papers (2024-09-18T02:02:30Z)
- LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models [56.25156596019168]
This paper introduces the LMRL-Gym benchmark for evaluating multi-turn RL for large language models (LLMs).
Our benchmark consists of 8 different language tasks, which require multiple rounds of language interaction and cover a range of tasks in open-ended dialogue and text games.
arXiv Detail & Related papers (2023-11-30T03:59:31Z)
- CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages [86.90220551111096]
Training datasets for large language models (LLMs) are often not fully disclosed.
We present CulturaX, a substantial multilingual dataset with 6.3 trillion tokens in 167 languages.
arXiv Detail & Related papers (2023-09-17T23:49:10Z)
- Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z)