SEA-Guard: Culturally Grounded Multilingual Safeguard for Southeast Asia
- URL: http://arxiv.org/abs/2602.01618v1
- Date: Mon, 02 Feb 2026 04:20:35 GMT
- Title: SEA-Guard: Culturally Grounded Multilingual Safeguard for Southeast Asia
- Authors: Panuthep Tasawong, Jian Gang Ngui, Alham Fikri Aji, Trevor Cohn, Peerat Limkonchotiwat,
- Abstract summary: Building large-scale, culturally grounded datasets is challenging due to limited resources and a scarcity of native annotators. We present a novel agentic data-generation framework to scalably create authentic, region-specific safety datasets for Southeast Asia. We introduce the SEA-Guard family, the first multilingual safeguard models grounded in SEA cultural contexts.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Culturally aware safeguards are crucial for AI alignment in real-world settings, where safety extends beyond common sense and encompasses diverse local values, norms, and region-specific regulations. However, building large-scale, culturally grounded datasets is challenging due to limited resources and a scarcity of native annotators. Consequently, many safeguard models rely on machine translation of English datasets, often missing regional and cultural nuances. We present a novel agentic data-generation framework to scalably create authentic, region-specific safety datasets for Southeast Asia (SEA). On this foundation, we introduce the SEA-Guard family, the first multilingual safeguard models grounded in SEA cultural contexts. Evaluated across multiple benchmarks and cultural variants, SEA-Guard consistently outperforms existing safeguards at detecting regionally sensitive or harmful content while maintaining strong general safety performance.
Related papers
- Bridging the Multilingual Safety Divide: Efficient, Culturally-Aware Alignment for Global South Languages
Large language models (LLMs) are being deployed across the Global South. Everyday use involves low-resource languages, code-mixing, and culturally specific norms. Our aim is to make multilingual safety a core requirement, not an add-on, for equitable AI in underrepresented regions.
arXiv Detail & Related papers (2026-02-14T19:56:40Z)
- UbuntuGuard: A Culturally-Grounded Policy Benchmark for Equitable AI Safety in African Languages
Current guardian models are predominantly Western-centric and optimized for high-resource languages. We introduce UbuntuGuard, the first African policy-based safety benchmark built from adversarial queries authored by 155 domain experts.
arXiv Detail & Related papers (2026-01-19T03:37:56Z)
- SEA-SafeguardBench: Evaluating AI Safety in SEA Languages and Cultures
Existing multilingual safety benchmarks often rely on machine-translated English data. We introduce SEA-SafeguardBench, the first human-verified safety benchmark for SEA. It covers eight languages and 21,640 samples across three subsets: general, in-the-wild, and content generation.
arXiv Detail & Related papers (2025-12-05T07:57:57Z)
- AssurAI: Experience with Constructing Korean Socio-cultural Datasets to Discover Potential Risks of Generative AI
We introduce AssurAI, a new quality-controlled Korean multimodal dataset for evaluating the safety of generative AI. We define a taxonomy of 35 distinct AI risk factors, adapted from established frameworks to cover both universal harms and the Korean socio-cultural context. AssurAI is a large-scale Korean multimodal dataset comprising 11,480 instances across text, image, video, and audio.
arXiv Detail & Related papers (2025-11-20T13:59:42Z)
- CultureGuard: Towards Culturally-Aware Dataset and Guard Model for Multilingual Safety Applications
We present CultureGuard, a novel solution for curating culturally aligned, high-quality safety datasets across multiple languages. Our approach introduces a four-stage synthetic data generation and filtering pipeline: cultural data segregation, cultural data adaptation, machine translation, and quality filtering. The resulting dataset, Nemotron-Safety-Guard-Dataset-v3, comprises 386,661 samples in 9 languages and facilitates the training of Llama-3.1-Nemotron-Safety-Guard-8B-v3 via LoRA-based fine-tuning.
arXiv Detail & Related papers (2025-08-03T10:35:05Z)
- IndoSafety: Culturally Grounded Safety for LLMs in Indonesian Languages
We present IndoSafety, the first high-quality, human-verified safety evaluation dataset tailored for the Indonesian context. IndoSafety is constructed by extending prior safety frameworks to develop a taxonomy that captures Indonesia's sociocultural context.
arXiv Detail & Related papers (2025-06-03T07:53:55Z)
- Multimodal Cultural Safety: Evaluation Frameworks and Alignment Strategies
Large vision-language models (LVLMs) are increasingly deployed in globally distributed applications, such as tourism assistants. CROSS is a benchmark designed to assess the cultural safety reasoning capabilities of LVLMs. We evaluate 21 leading LVLMs, including mixture-of-experts models and reasoning models.
arXiv Detail & Related papers (2025-05-20T23:20:38Z)
- MrGuard: A Multilingual Reasoning Guardrail for Universal LLM Safety
Large Language Models (LLMs) are susceptible to adversarial attacks such as jailbreaking. This vulnerability is exacerbated in multilingual settings, where multilingual safety-aligned data is often limited. We introduce a multilingual guardrail with reasoning for prompt classification.
arXiv Detail & Related papers (2025-04-21T17:15:06Z)
- SafeWorld: Geo-Diverse Safety Alignment
We introduce SafeWorld, a novel benchmark specifically designed to evaluate Large Language Models (LLMs). SafeWorld encompasses 2,342 test user queries, each grounded in high-quality, human-verified cultural norms and legal policies from 50 countries and 493 regions/races. Our trained SafeWorldLM outperforms all competing models, including GPT-4o, on all three evaluation dimensions by a large margin.
arXiv Detail & Related papers (2024-12-09T13:31:46Z)
- Arabic Dataset for LLM Safeguard Evaluation
This study explores the safety of large language models (LLMs) in Arabic, with its linguistic and cultural complexities. We present an Arab-region-specific safety evaluation dataset consisting of 5,799 questions, including direct attacks, indirect attacks, and harmless requests with sensitive words.
arXiv Detail & Related papers (2024-10-22T14:12:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.