SEA-SafeguardBench: Evaluating AI Safety in SEA Languages and Cultures
- URL: http://arxiv.org/abs/2512.05501v1
- Date: Fri, 05 Dec 2025 07:57:57 GMT
- Title: SEA-SafeguardBench: Evaluating AI Safety in SEA Languages and Cultures
- Authors: Panuthep Tasawong, Jian Gang Ngui, Alham Fikri Aji, Trevor Cohn, Peerat Limkonchotiwat,
- Abstract summary: Existing multilingual safety benchmarks often rely on machine-translated English data. We introduce SEA-SafeguardBench, the first human-verified safety benchmark for SEA. It covers eight languages and 21,640 samples across three subsets: general, in-the-wild, and content generation.
- Score: 36.95168918567729
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Safeguard models help large language models (LLMs) detect and block harmful content, but most evaluations remain English-centric and overlook linguistic and cultural diversity. Existing multilingual safety benchmarks often rely on machine-translated English data, which fails to capture nuances in low-resource languages. Southeast Asian (SEA) languages are underrepresented despite the region's linguistic diversity and unique safety concerns, from culturally sensitive political speech to region-specific misinformation. Addressing these gaps requires benchmarks that are natively authored to reflect local norms and harm scenarios. We introduce SEA-SafeguardBench, the first human-verified safety benchmark for SEA, covering eight languages and 21,640 samples across three subsets: general, in-the-wild, and content generation. The experimental results from our benchmark demonstrate that even state-of-the-art LLMs and guardrails are challenged by SEA cultural and harm scenarios and underperform compared to English texts.
Related papers
- Bridging the Multilingual Safety Divide: Efficient, Culturally-Aware Alignment for Global South Languages [8.667909336164465]
Large language models (LLMs) are being deployed across the Global South. Everyday use involves low-resource languages, code-mixing, and culturally specific norms. Our aim is to make multilingual safety a core requirement, not an add-on, for equitable AI in underrepresented regions.
arXiv Detail & Related papers (2026-02-14T19:56:40Z) - SEA-Guard: Culturally Grounded Multilingual Safeguard for Southeast Asia [36.95168918567729]
Building large-scale, culturally grounded datasets is challenging due to limited resources and a scarcity of native annotators. We present a novel agentic data-generation framework to scalably create authentic, region-specific safety datasets for Southeast Asia. We introduce the SEA-Guard family, the first multilingual safeguard models grounded in SEA cultural contexts.
arXiv Detail & Related papers (2026-02-02T04:20:35Z) - UbuntuGuard: A Culturally-Grounded Policy Benchmark for Equitable AI Safety in African Languages [18.40701733030824]
Current guardian models are predominantly Western-centric and optimized for high-resource languages. We introduce UbuntuGuard, the first African policy-based safety benchmark built from adversarial queries authored by 155 domain experts.
arXiv Detail & Related papers (2026-01-19T03:37:56Z) - Toxicity Red-Teaming: Benchmarking LLM Safety in Singapore's Low-Resource Languages [57.059267233093465]
Large Language Models (LLMs) have transformed natural language processing, but their safety mechanisms remain under-explored in low-resource, multilingual settings. We introduce SGToxicGuard, a novel dataset and evaluation framework for benchmarking LLM safety in Singapore's diverse linguistic context. We conduct extensive experiments with state-of-the-art multilingual LLMs, and the results uncover critical gaps in their safety guardrails.
arXiv Detail & Related papers (2025-09-18T08:14:34Z) - LinguaSafe: A Comprehensive Multilingual Safety Benchmark for Large Language Models [22.273388934888278]
Our dataset comprises 45k entries in 12 languages, ranging from Hungarian to Malay. Our benchmark provides a comprehensive suite of metrics for in-depth safety evaluation.
arXiv Detail & Related papers (2025-08-18T08:59:01Z) - RabakBench: Scaling Human Annotations to Construct Localized Multilingual Safety Benchmarks for Low-Resource Languages [3.7678366606419345]
RabakBench is a new multilingual safety benchmark localized to Singapore's unique linguistic context. The benchmark dataset, including the human-verified translations, and evaluation code are publicly available.
arXiv Detail & Related papers (2025-07-08T13:37:25Z) - MrGuard: A Multilingual Reasoning Guardrail for Universal LLM Safety [56.77103365251923]
Large Language Models (LLMs) are susceptible to adversarial attacks such as jailbreaking. This vulnerability is exacerbated in multilingual settings, where multilingual safety-aligned data is often limited. We introduce a multilingual guardrail with reasoning for prompt classification.
arXiv Detail & Related papers (2025-04-21T17:15:06Z) - Qorgau: Evaluating LLM Safety in Kazakh-Russian Bilingual Contexts [40.0358736497799]
Large language models (LLMs) can generate harmful content. This paper introduces Qorgau, a novel dataset specifically designed for safety evaluation in Kazakh and Russian.
arXiv Detail & Related papers (2025-02-19T11:33:22Z) - LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Inconsistencies [63.10843814055688]
M-ALERT is a benchmark that evaluates the safety of Large Language Models in five languages. M-ALERT includes 15k high-quality prompts per language, totaling 75k, with category-wise annotations. Our experiments on 39 state-of-the-art LLMs highlight the importance of language-specific safety analysis.
arXiv Detail & Related papers (2024-12-19T16:46:54Z) - Arabic Dataset for LLM Safeguard Evaluation [62.96160492994489]
This study explores the safety of large language models (LLMs) in Arabic, with its linguistic and cultural complexities. We present an Arab-region-specific safety evaluation dataset consisting of 5,799 questions, including direct attacks, indirect attacks, and harmless requests with sensitive words.
arXiv Detail & Related papers (2024-10-22T14:12:43Z) - All Languages Matter: On the Multilingual Safety of Large Language Models [96.47607891042523]
We build the first multilingual safety benchmark for large language models (LLMs).
XSafety covers 14 kinds of commonly used safety issues across 10 languages that span several language families.
We propose several simple and effective prompting methods to improve the multilingual safety of ChatGPT.
arXiv Detail & Related papers (2023-10-02T05:23:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.