IndoSafety: Culturally Grounded Safety for LLMs in Indonesian Languages
- URL: http://arxiv.org/abs/2506.02573v1
- Date: Tue, 03 Jun 2025 07:53:55 GMT
- Title: IndoSafety: Culturally Grounded Safety for LLMs in Indonesian Languages
- Authors: Muhammad Falensi Azmi, Muhammad Dehan Al Kautsar, Alfan Farizki Wicaksono, Fajri Koto
- Abstract summary: We present IndoSafety, the first high-quality, human-verified safety evaluation dataset tailored for the Indonesian context. IndoSafety is constructed by extending prior safety frameworks to develop a taxonomy that captures Indonesia's sociocultural context.
- Score: 6.4212082894269535
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although region-specific large language models (LLMs) are increasingly developed, their safety remains underexplored, particularly in culturally diverse settings like Indonesia, where sensitivity to local norms is essential and highly valued by the community. In this work, we present IndoSafety, the first high-quality, human-verified safety evaluation dataset tailored for the Indonesian context, covering five language varieties: formal and colloquial Indonesian, along with three major local languages: Javanese, Sundanese, and Minangkabau. IndoSafety is constructed by extending prior safety frameworks to develop a taxonomy that captures Indonesia's sociocultural context. We find that existing Indonesian-centric LLMs often generate unsafe outputs, particularly in colloquial and local language settings, while fine-tuning on IndoSafety significantly improves safety while preserving task performance. Our work highlights the critical need for culturally grounded safety evaluation and provides a concrete step toward responsible LLM deployment in multilingual settings. Warning: This paper contains example data that may be offensive, harmful, or biased.
Related papers
- Bridging the Multilingual Safety Divide: Efficient, Culturally-Aware Alignment for Global South Languages [8.667909336164465]
Large language models (LLMs) are being deployed across the Global South. Everyday use involves low-resource languages, code-mixing, and culturally specific norms. Our aim is to make multilingual safety a core requirement, not an add-on, for equitable AI in underrepresented regions.
arXiv Detail & Related papers (2026-02-14T19:56:40Z)
- SEA-Guard: Culturally Grounded Multilingual Safeguard for Southeast Asia [36.95168918567729]
Building large-scale, culturally grounded datasets is challenging due to limited resources and a scarcity of native annotators. We present a novel agentic data-generation framework to scalably create authentic, region-specific safety datasets for Southeast Asia. We introduce the SEA-Guard family, the first multilingual safeguard models grounded in SEA cultural contexts.
arXiv Detail & Related papers (2026-02-02T04:20:35Z)
- UbuntuGuard: A Culturally-Grounded Policy Benchmark for Equitable AI Safety in African Languages [18.40701733030824]
Current guardian models are predominantly Western-centric and optimized for high-resource languages. We introduce UbuntuGuard, the first African policy-based safety benchmark built from adversarial queries authored by 155 domain experts.
arXiv Detail & Related papers (2026-01-19T03:37:56Z)
- SEA-SafeguardBench: Evaluating AI Safety in SEA Languages and Cultures [36.95168918567729]
Existing multilingual safety benchmarks often rely on machine-translated English data. We introduce SEA-SafeguardBench, the first human-verified safety benchmark for SEA. It covers eight languages and 21,640 samples across three subsets: general, in-the-wild, and content generation.
arXiv Detail & Related papers (2025-12-05T07:57:57Z)
- Toxicity Red-Teaming: Benchmarking LLM Safety in Singapore's Low-Resource Languages [57.059267233093465]
Large Language Models (LLMs) have transformed natural language processing, but their safety mechanisms remain under-explored in low-resource, multilingual settings. We introduce SGToxicGuard, a novel dataset and evaluation framework for benchmarking LLM safety in Singapore's diverse linguistic context. We conduct extensive experiments with state-of-the-art multilingual LLMs, and the results uncover critical gaps in their safety guardrails.
arXiv Detail & Related papers (2025-09-18T08:14:34Z)
- CultureGuard: Towards Culturally-Aware Dataset and Guard Model for Multilingual Safety Applications [1.235687336222824]
We present CultureGuard, a novel solution for curating culturally aligned, high-quality safety datasets across multiple languages. Our approach introduces a four-stage synthetic data generation and filtering pipeline: cultural data segregation, cultural data adaptation, machine translation, and quality filtering. The resulting dataset, Nemotron-Content-Safety-Dataset-Multilingual-v1, comprises 386,661 samples in 9 languages and facilitates the training of Llama-3.1-Nemotron-Safety-Guard-Multilingual-8B-v1 via LoRA-based fine-tuning.
arXiv Detail & Related papers (2025-08-03T10:35:05Z)
- MrGuard: A Multilingual Reasoning Guardrail for Universal LLM Safety [56.79292318645454]
Large Language Models (LLMs) are susceptible to adversarial attacks such as jailbreaking. This vulnerability is exacerbated in multilingual settings, where multilingual safety-aligned data is often limited. We introduce a multilingual guardrail with reasoning for prompt classification.
arXiv Detail & Related papers (2025-04-21T17:15:06Z)
- Qorgau: Evaluating LLM Safety in Kazakh-Russian Bilingual Contexts [40.0358736497799]
Large language models (LLMs) are known to have the potential to generate harmful content. This paper introduces Qorgau, a novel dataset specifically designed for safety evaluation in Kazakh and Russian.
arXiv Detail & Related papers (2025-02-19T11:33:22Z)
- LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Inconsistencies [63.10843814055688]
M-ALERT is a benchmark that evaluates the safety of Large Language Models in five languages. M-ALERT includes 15k high-quality prompts per language, totaling 75k, with category-wise annotations. Our experiments on 39 state-of-the-art LLMs highlight the importance of language-specific safety analysis.
arXiv Detail & Related papers (2024-12-19T16:46:54Z)
- SafeWorld: Geo-Diverse Safety Alignment [107.84182558480859]
We introduce SafeWorld, a novel benchmark specifically designed to evaluate Large Language Models (LLMs). SafeWorld encompasses 2,342 test user queries, each grounded in high-quality, human-verified cultural norms and legal policies from 50 countries and 493 regions/races. Our trained SafeWorldLM outperforms all competing models, including GPT-4o, on all three evaluation dimensions by a large margin.
arXiv Detail & Related papers (2024-12-09T13:31:46Z)
- Arabic Dataset for LLM Safeguard Evaluation [62.96160492994489]
This study explores the safety of large language models (LLMs) in Arabic, with its linguistic and cultural complexities. We present an Arab-region-specific safety evaluation dataset consisting of 5,799 questions, including direct attacks, indirect attacks, and harmless requests with sensitive words.
arXiv Detail & Related papers (2024-10-22T14:12:43Z)
- Cendol: Open Instruction-tuned Generative Large Language Models for Indonesian Languages [55.963648108438555]
Large language models (LLMs) show remarkable human-like capability in various domains and languages.
We introduce Cendol, a collection of Indonesian LLMs encompassing both decoder-only and encoder-decoder architectures.
We highlight Cendol's effectiveness across a diverse array of tasks, attaining a 20% improvement, and demonstrate its capability to generalize.
arXiv Detail & Related papers (2024-04-09T09:04:30Z)
- All Languages Matter: On the Multilingual Safety of Large Language Models [96.47607891042523]
We build the first multilingual safety benchmark for large language models (LLMs).
XSafety covers 14 kinds of commonly used safety issues across 10 languages that span several language families.
We propose several simple and effective prompting methods to improve the multilingual safety of ChatGPT.
arXiv Detail & Related papers (2023-10-02T05:23:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.