Bridging the Multilingual Safety Divide: Efficient, Culturally-Aware Alignment for Global South Languages
- URL: http://arxiv.org/abs/2602.13867v1
- Date: Sat, 14 Feb 2026 19:56:40 GMT
- Title: Bridging the Multilingual Safety Divide: Efficient, Culturally-Aware Alignment for Global South Languages
- Authors: Somnath Banerjee, Rima Hazra, Animesh Mukherjee
- Abstract summary: Large language models (LLMs) are being deployed across the Global South. Everyday use involves low-resource languages, code-mixing, and culturally specific norms. Our aim is to make multilingual safety a core requirement, not an add-on, for equitable AI in underrepresented regions.
- Score: 8.667909336164465
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) are being deployed across the Global South, where everyday use involves low-resource languages, code-mixing, and culturally specific norms. Yet safety pipelines, benchmarks, and alignment still largely target English and a handful of high-resource languages, implicitly assuming safety and factuality "transfer" across languages. Evidence increasingly shows they do not. We synthesize recent findings indicating that (i) safety guardrails weaken sharply on low-resource and code-mixed inputs, (ii) culturally harmful behavior can persist even when standard toxicity scores look acceptable, and (iii) English-only knowledge edits and safety patches often fail to carry over to low-resource languages. In response, we outline a practical agenda for researchers and students in the Global South: parameter-efficient safety steering, culturally grounded evaluation and preference data, and participatory workflows that empower local communities to define and mitigate harm. Our aim is to make multilingual safety a core requirement, not an add-on, for equitable AI in underrepresented regions.
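To make the first agenda item concrete, here is a minimal, hypothetical sketch of parameter-efficient safety steering: rather than fine-tuning all model weights, a single learned "safety direction" is added to a hidden state at inference time. The vector, layer choice, and scale below are illustrative assumptions, not details from the paper.

```python
import numpy as np

def steer_hidden_state(h: np.ndarray, safety_dir: np.ndarray,
                       strength: float = 4.0) -> np.ndarray:
    """Shift one hidden state toward a refusal/safety direction.

    The base model's weights stay frozen; only one vector per steered
    layer needs to be stored, which is what makes this parameter-efficient.
    """
    unit = safety_dir / np.linalg.norm(safety_dir)
    return h + strength * unit

rng = np.random.default_rng(0)
hidden = rng.normal(size=16)     # stand-in for one token's activation
direction = rng.normal(size=16)  # stand-in for a learned safety vector

steered = steer_hidden_state(hidden, direction)
# Shift magnitude along the safety direction equals `strength` (≈ 4.0).
print(float(np.dot(steered - hidden, direction / np.linalg.norm(direction))))
```

In a real system the steering vector would be learned (for example, from contrastive pairs of safe and unsafe responses) and applied inside selected transformer layers; the sketch only shows the inference-time arithmetic.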
Related papers
- Layer-wise Swapping for Generalizable Multilingual Safety [8.658596218544773]
Existing safety datasets are predominantly English-centric, limiting progress in multilingual safety alignment. We propose a safety-aware layer swapping method that transfers safety alignment from an English safety expert to low-resource language experts without additional training.
arXiv Detail & Related papers (2026-01-30T06:22:02Z) - SEA-SafeguardBench: Evaluating AI Safety in SEA Languages and Cultures [36.95168918567729]
Existing multilingual safety benchmarks often rely on machine-translated English data. We introduce SEA-SafeguardBench, the first human-verified safety benchmark for SEA languages. It covers eight languages and 21,640 samples across three subsets: general, in-the-wild, and content generation.
arXiv Detail & Related papers (2025-12-05T07:57:57Z) - Toxicity Red-Teaming: Benchmarking LLM Safety in Singapore's Low-Resource Languages [57.059267233093465]
Large Language Models (LLMs) have transformed natural language processing, but their safety mechanisms remain under-explored in low-resource, multilingual settings. We introduce SGToxicGuard, a novel dataset and evaluation framework for benchmarking LLM safety in Singapore's diverse linguistic context. We conduct extensive experiments with state-of-the-art multilingual LLMs, and the results uncover critical gaps in their safety guardrails.
arXiv Detail & Related papers (2025-09-18T08:14:34Z) - LinguaSafe: A Comprehensive Multilingual Safety Benchmark for Large Language Models [22.273388934888278]
Our dataset comprises 45k entries in 12 languages, ranging from Hungarian to Malay. Our benchmark provides a comprehensive suite of metrics for in-depth safety evaluation.
arXiv Detail & Related papers (2025-08-18T08:59:01Z) - Beyond Weaponization: NLP Security for Medium and Lower-Resourced Languages in Their Own Right [0.0]
This work examines the security of LMs for lower- and medium-resourced languages. We extend existing adversarial attacks for up to 70 languages to evaluate the security of monolingual and multilingual LMs for these languages.
arXiv Detail & Related papers (2025-07-04T10:54:04Z) - IndoSafety: Culturally Grounded Safety for LLMs in Indonesian Languages [6.4212082894269535]
We present IndoSafety, the first high-quality, human-verified safety evaluation dataset tailored for the Indonesian context. IndoSafety is constructed by extending prior safety frameworks to develop a taxonomy that captures Indonesia's sociocultural context.
arXiv Detail & Related papers (2025-06-03T07:53:55Z) - MPO: Multilingual Safety Alignment via Reward Gap Optimization [88.76638442683391]
Large language models (LLMs) have become increasingly central to AI applications worldwide. Existing preference learning methods for safety alignment, such as RLHF and DPO, are primarily monolingual and struggle with noisy multilingual data. We introduce Multilingual reward gaP Optimization (MPO), a novel approach that leverages the well-aligned safety capabilities of the dominant language (English) to improve safety alignment across multiple languages.
arXiv Detail & Related papers (2025-05-22T16:24:51Z) - MrGuard: A Multilingual Reasoning Guardrail for Universal LLM Safety [56.77103365251923]
Large Language Models (LLMs) are susceptible to adversarial attacks such as jailbreaking. This vulnerability is exacerbated in multilingual settings, where multilingual safety-aligned data is often limited. We introduce a multilingual guardrail with reasoning for prompt classification.
arXiv Detail & Related papers (2025-04-21T17:15:06Z) - LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Inconsistencies [63.10843814055688]
M-ALERT is a benchmark that evaluates the safety of Large Language Models in five languages. M-ALERT includes 15k high-quality prompts per language, totaling 75k, with category-wise annotations. Our experiments on 39 state-of-the-art LLMs highlight the importance of language-specific safety analysis.
arXiv Detail & Related papers (2024-12-19T16:46:54Z) - CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion [117.178835165855]
This paper introduces CodeAttack, a framework that transforms natural language inputs into code inputs.
Our studies reveal a new and universal safety vulnerability of these models against code input.
We find that a larger distribution gap between CodeAttack and natural language leads to weaker safety generalization.
arXiv Detail & Related papers (2024-03-12T17:55:38Z) - All Languages Matter: On the Multilingual Safety of Large Language Models [96.47607891042523]
We build the first multilingual safety benchmark for large language models (LLMs).
XSafety covers 14 kinds of commonly used safety issues across 10 languages that span several language families.
We propose several simple and effective prompting methods to improve the multilingual safety of ChatGPT.
arXiv Detail & Related papers (2023-10-02T05:23:34Z)
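One recurring idea in the list above, transferring safety alignment by copying selected layers from an English safety expert into a low-resource language expert without additional training, can be sketched as a simple state-dict merge. The layer names and the choice of which layers carry safety behavior below are illustrative assumptions, not details from any of the papers.

```python
# Toy sketch of safety-aware layer swapping: replace a chosen subset of
# a low-resource language model's layers with the corresponding layers
# from an English-aligned "safety expert". No gradient updates involved.

def swap_safety_layers(lang_expert: dict, safety_expert: dict,
                       safety_layers: set) -> dict:
    """Return lang_expert's weights with the selected layers replaced."""
    merged = dict(lang_expert)
    for name in safety_layers:
        merged[name] = safety_expert[name]
    return merged

# Stand-in "state dicts"; real entries would be weight tensors.
english_safety = {"layer.0": "en0", "layer.1": "en1", "layer.2": "en2"}
swahili_expert = {"layer.0": "sw0", "layer.1": "sw1", "layer.2": "sw2"}

merged = swap_safety_layers(swahili_expert, english_safety, {"layer.1"})
print(merged["layer.1"])  # en1: safety layer taken from the English expert
print(merged["layer.0"])  # sw0: language-specific layers stay local
```

The open question such methods face, which the papers above probe empirically, is identifying which layers actually encode safety behavior so the swap preserves both the target language's fluency and the expert's guardrails.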
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.