Related papers: UbuntuGuard: A Culturally-Grounded Policy Benchmark for Equitable AI Safety in African Languages

UbuntuGuard: A Culturally-Grounded Policy Benchmark for Equitable AI Safety in African Languages

URL: http://arxiv.org/abs/2601.12696v1
Date: Mon, 19 Jan 2026 03:37:56 GMT
Title: UbuntuGuard: A Culturally-Grounded Policy Benchmark for Equitable AI Safety in African Languages
Authors: Tassallah Abdullahi, Macton Mgonzo, Mardiyyah Oduwole, Paul Okewunmi, Abraham Owodunni, Ritambhara Singh, Carsten Eickhoff,
Abstract summary: Current guardian models are predominantly Western-centric and optimized for high-resource languages.<n>We introduce UbuntuGuard, the first African policy-based safety benchmark built from adversarial queries authored by 155 domain experts.
Score: 18.40701733030824
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Current guardian models are predominantly Western-centric and optimized for high-resource languages, leaving low-resource African languages vulnerable to evolving harms, cross-lingual safety failures, and cultural misalignment. Moreover, most guardian models rely on rigid, predefined safety categories that fail to generalize across diverse linguistic and sociocultural contexts. Robust safety, therefore, requires flexible, runtime-enforceable policies and benchmarks that reflect local norms, harm scenarios, and cultural expectations. We introduce UbuntuGuard, the first African policy-based safety benchmark built from adversarial queries authored by 155 domain experts across sensitive fields, including healthcare. From these expert-crafted queries, we derive context-specific safety policies and reference responses that capture culturally grounded risk signals, enabling policy-aligned evaluation of guardian models. We evaluate 13 models, comprising six general-purpose LLMs and seven guardian models across three distinct variants: static, dynamic, and multilingual. Our findings reveal that existing English-centric benchmarks overestimate real-world multilingual safety, cross-lingual transfer provides partial but insufficient coverage, and dynamic models, while better equipped to leverage policies at inference time, still struggle to fully localize African-language contexts. These findings highlight the urgent need for multilingual, culturally grounded safety benchmarks to enable the development of reliable and equitable guardian models for low-resource languages. Our code can be found online.\footnote{Code repository available at https://github.com/hemhemoh/UbuntuGuard.

Related papers

Bridging the Multilingual Safety Divide: Efficient, Culturally-Aware Alignment for Global South Languages [8.667909336164465]
Large language models (LLMs) are being deployed across the Global South.<n> Everyday use involves low-resource languages, code-mixing, and culturally specific norms.<n>Our aim is to make multilingual safety a core requirement-not an add-on-for equitable AI in underrepresented regions.
arXiv Detail & Related papers (2026-02-14T19:56:40Z)
Improving Methodologies for LLM Evaluations Across Global Languages [19.63570354411416]
The exercise shows how safety behaviours can vary across languages.<n>It also generated insights for improving multilingual safety evaluations.<n>This work represents an initial step toward a shared framework for multilingual safety testing of advanced AI systems.
arXiv Detail & Related papers (2026-01-22T07:18:08Z)
SEA-SafeguardBench: Evaluating AI Safety in SEA Languages and Cultures [36.95168918567729]
Existing multilingual safety benchmarks often rely on machine-translated English data.<n>We introduce SEA-SafeguardBench, the first human-verified safety benchmark for SEA.<n>It covers eight languages, 21,640 samples, across three subsets: general, in-the-wild, and content generation.
arXiv Detail & Related papers (2025-12-05T07:57:57Z)
Toxicity Red-Teaming: Benchmarking LLM Safety in Singapore's Low-Resource Languages [57.059267233093465]
Large Language Models (LLMs) have transformed natural language processing, but their safety mechanisms remain under-explored in low-resource, multilingual settings.<n>We introduce textsfSGToxicGuard, a novel dataset and evaluation framework for benchmarking LLM safety in Singapore's diverse linguistic context.<n>We conduct extensive experiments with state-of-the-art multilingual LLMs, and the results uncover critical gaps in their safety guardrails.
arXiv Detail & Related papers (2025-09-18T08:14:34Z)
LinguaSafe: A Comprehensive Multilingual Safety Benchmark for Large Language Models [22.273388934888278]
Our dataset comprises 45k entries in 12 languages, ranging from Hungarian to Malay.<n>Our benchmark provides a comprehensive suite of metrics for in-depth safety evaluation.
arXiv Detail & Related papers (2025-08-18T08:59:01Z)
MrGuard: A Multilingual Reasoning Guardrail for Universal LLM Safety [56.77103365251923]
Large Language Models (LLMs) are susceptible to adversarial attacks such as jailbreaking.<n>This vulnerability is exacerbated in multilingual settings, where multilingual safety-aligned data is often limited.<n>We introduce a multilingual guardrail with reasoning for prompt classification.
arXiv Detail & Related papers (2025-04-21T17:15:06Z)
LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Inconsistencies [63.10843814055688]
M-ALERT is a benchmark that evaluates the safety of Large Language Models in five languages.<n>M-ALERT includes 15k high-quality prompts per language, totaling 75k, with category-wise annotations.<n>Our experiments on 39 state-of-the-art LLMs highlight the importance of language-specific safety analysis.
arXiv Detail & Related papers (2024-12-19T16:46:54Z)
Arabic Dataset for LLM Safeguard Evaluation [62.96160492994489]
This study explores the safety of large language models (LLMs) in Arabic with its linguistic and cultural complexities.<n>We present an Arab-region-specific safety evaluation dataset consisting of 5,799 questions, including direct attacks, indirect attacks, and harmless requests with sensitive words.
arXiv Detail & Related papers (2024-10-22T14:12:43Z)
ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming [64.86326523181553]
ALERT is a large-scale benchmark to assess safety based on a novel fine-grained risk taxonomy. It aims to identify vulnerabilities, inform improvements, and enhance the overall safety of the language models.
arXiv Detail & Related papers (2024-04-06T15:01:47Z)
All Languages Matter: On the Multilingual Safety of Large Language Models [96.47607891042523]
We build the first multilingual safety benchmark for large language models (LLMs) XSafety covers 14 kinds of commonly used safety issues across 10 languages that span several language families. We propose several simple and effective prompting methods to improve the multilingual safety of ChatGPT.
arXiv Detail & Related papers (2023-10-02T05:23:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.