SEA-Guard: Culturally Grounded Multilingual Safeguard for Southeast Asia
- URL: http://arxiv.org/abs/2602.01618v1
- Date: Mon, 02 Feb 2026 04:20:35 GMT
- Title: SEA-Guard: Culturally Grounded Multilingual Safeguard for Southeast Asia
- Authors: Panuthep Tasawong, Jian Gang Ngui, Alham Fikri Aji, Trevor Cohn, Peerat Limkonchotiwat,
- Abstract summary: Building large-scale, culturally grounded datasets is challenging due to limited resources and a scarcity of native annotators. We present a novel agentic data-generation framework to scalably create authentic, region-specific safety datasets for Southeast Asia. We introduce the SEA-Guard family, the first multilingual safeguard models grounded in SEA cultural contexts.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Culturally aware safeguards are crucial for AI alignment in real-world settings, where safety extends beyond common sense and encompasses diverse local values, norms, and region-specific regulations. However, building large-scale, culturally grounded datasets is challenging due to limited resources and a scarcity of native annotators. Consequently, many safeguard models rely on machine translation of English datasets, often missing regional and cultural nuances. We present a novel agentic data-generation framework to scalably create authentic, region-specific safety datasets for Southeast Asia (SEA). On this foundation, we introduce the SEA-Guard family, the first multilingual safeguard models grounded in SEA cultural contexts. Evaluated across multiple benchmarks and cultural variants, SEA-Guard consistently outperforms existing safeguards at detecting regionally sensitive or harmful content while maintaining strong general safety performance.
Related papers
- Bridging the Multilingual Safety Divide: Efficient, Culturally-Aware Alignment for Global South Languages
Large language models (LLMs) are being deployed across the Global South. Everyday use involves low-resource languages, code-mixing, and culturally specific norms. Our aim is to make multilingual safety a core requirement, not an add-on, for equitable AI in underrepresented regions.
arXiv Detail & Related papers (2026-02-14T19:56:40Z)
- UbuntuGuard: A Culturally-Grounded Policy Benchmark for Equitable AI Safety in African Languages
Current guardian models are predominantly Western-centric and optimized for high-resource languages. We introduce UbuntuGuard, the first African policy-based safety benchmark built from adversarial queries authored by 155 domain experts.
arXiv Detail & Related papers (2026-01-19T03:37:56Z)
- SEA-SafeguardBench: Evaluating AI Safety in SEA Languages and Cultures
Existing multilingual safety benchmarks often rely on machine-translated English data. We introduce SEA-SafeguardBench, the first human-verified safety benchmark for SEA. It covers eight languages and 21,640 samples across three subsets: general, in-the-wild, and content generation.
arXiv Detail & Related papers (2025-12-05T07:57:57Z)
- AssurAI: Experience with Constructing Korean Socio-cultural Datasets to Discover Potential Risks of Generative AI
We introduce AssurAI, a new quality-controlled Korean multimodal dataset for evaluating the safety of generative AI. We define a taxonomy of 35 distinct AI risk factors, adapted from established frameworks to cover both universal harms and the Korean socio-cultural context. AssurAI is a large-scale Korean multimodal dataset comprising 11,480 instances across text, image, video, and audio.
arXiv Detail & Related papers (2025-11-20T13:59:42Z)
- CultureGuard: Towards Culturally-Aware Dataset and Guard Model for Multilingual Safety Applications
We present CultureGuard, a novel solution for curating culturally aligned, high-quality safety datasets across multiple languages. Our approach introduces a four-stage synthetic data generation and filtering pipeline: cultural data segregation, cultural data adaptation, machine translation, and quality filtering. The resulting dataset, Nemotron-Safety-Guard-Dataset-v3, comprises 386,661 samples in 9 languages and facilitates the training of Llama-3.1-Nemotron-Safety-Guard-8B-v3 via LoRA-based fine-tuning.
arXiv Detail & Related papers (2025-08-03T10:35:05Z)
- IndoSafety: Culturally Grounded Safety for LLMs in Indonesian Languages
We present IndoSafety, the first high-quality, human-verified safety evaluation dataset tailored for the Indonesian context. IndoSafety is constructed by extending prior safety frameworks to develop a taxonomy that captures Indonesia's sociocultural context.
arXiv Detail & Related papers (2025-06-03T07:53:55Z)
- Multimodal Cultural Safety: Evaluation Frameworks and Alignment Strategies
Large vision-language models (LVLMs) are increasingly deployed in globally distributed applications, such as tourism assistants. CROSS is a benchmark designed to assess the cultural safety reasoning capabilities of LVLMs. We evaluate 21 leading LVLMs, including mixture-of-experts models and reasoning models.
arXiv Detail & Related papers (2025-05-20T23:20:38Z)
- MrGuard: A Multilingual Reasoning Guardrail for Universal LLM Safety
Large Language Models (LLMs) are susceptible to adversarial attacks such as jailbreaking. This vulnerability is exacerbated in multilingual settings, where multilingual safety-aligned data is often limited. We introduce a multilingual guardrail with reasoning for prompt classification.
arXiv Detail & Related papers (2025-04-21T17:15:06Z)
- SafeWorld: Geo-Diverse Safety Alignment
We introduce SafeWorld, a novel benchmark specifically designed to evaluate Large Language Models (LLMs). SafeWorld encompasses 2,342 test user queries, each grounded in high-quality, human-verified cultural norms and legal policies from 50 countries and 493 regions/races. Our trained SafeWorldLM outperforms all competing models, including GPT-4o, on all three evaluation dimensions by a large margin.
arXiv Detail & Related papers (2024-12-09T13:31:46Z)
- Arabic Dataset for LLM Safeguard Evaluation
This study explores the safety of large language models (LLMs) in Arabic, with its linguistic and cultural complexities. We present an Arab-region-specific safety evaluation dataset consisting of 5,799 questions, including direct attacks, indirect attacks, and harmless requests with sensitive words.
arXiv Detail & Related papers (2024-10-22T14:12:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.