Related papers: SAFARI: A Community-Engaged Approach and Dataset of Stereotype Resources in the Sub-Saharan African Context

URL: http://arxiv.org/abs/2602.22404v1
Date: Wed, 25 Feb 2026 20:56:27 GMT
Title: SAFARI: A Community-Engaged Approach and Dataset of Stereotype Resources in the Sub-Saharan African Context
Authors: Aishwarya Verma, Laud Ammah, Olivia Nercy Ndlovu Lucas, Andrew Zaldivar, Vinodkumar Prabhakaran, Sunipa Dev
Abstract summary: Stereotype repositories are critical to assess generative AI model safety, but currently lack adequate global coverage. This work introduces a multilingual stereotype resource covering four sub-Saharan African countries that are severely underrepresented in NLP resources: Ghana, Kenya, Nigeria, and South Africa.
Score: 10.43559852429736
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Stereotype repositories are critical to assess generative AI model safety, but currently lack adequate global coverage. It is imperative to prioritize targeted expansion, strategically addressing existing deficits, over merely increasing data volume. This work introduces a multilingual stereotype resource covering four sub-Saharan African countries that are severely underrepresented in NLP resources: Ghana, Kenya, Nigeria, and South Africa. By utilizing socioculturally-situated, community-engaged methods, including telephonic surveys moderated in native languages, we establish a reproducible methodology that is sensitive to the region's complex linguistic diversity and traditional orality. By deliberately balancing the sample across diverse ethnic and demographic backgrounds, we ensure broad coverage, resulting in a dataset of 3,534 stereotypes in English and 3,206 stereotypes across 15 native languages.

Related papers

Surfacing Subtle Stereotypes: A Multilingual, Debate-Oriented Evaluation of Modern LLMs [32.12545369011503]
We introduce DebateBias-8K, a new multilingual, debate-style benchmark to reveal how narrative bias appears in realistic generative settings. Our dataset includes 8,400 structured debate prompts spanning four sensitive domains: women's rights, socioeconomic development, terrorism, and religion, across seven languages. Results show that all models reproduce entrenched stereotypes despite safety alignment.
arXiv Detail & Related papers (2025-11-03T03:25:40Z)
Adaptive Data Collection for Latin-American Community-sourced Evaluation of Stereotypes (LACES) [12.636379779655558]
The evaluation of societal biases in NLP models is critically hindered by a glaring geo-cultural gap. Existing benchmarks are overwhelmingly English-centric and focused on U.S. demographics. We introduce a new, large-scale dataset of stereotypes developed through targeted community partnerships within Latin America.
arXiv Detail & Related papers (2025-10-28T20:42:14Z)
MMA-ASIA: A Multilingual and Multimodal Alignment Framework for Culturally-Grounded Evaluation [91.22008265721952]
MMA-ASIA centers on a human-curated, multilingual, and multimodally aligned benchmark covering 8 Asian countries and 10 languages. This is the first dataset aligned at the input level across three modalities: text, image (visual question answering), and speech. We propose a five-dimensional evaluation protocol that measures: (i) cultural-awareness disparities across countries, (ii) cross-lingual consistency, (iii) cross-modal consistency, (iv) cultural knowledge generalization, and (v) grounding validity.
arXiv Detail & Related papers (2025-10-07T14:12:12Z)
Where Are We? Evaluating LLM Performance on African Languages [16.206469767073155]
Africa's rich linguistic heritage remains underrepresented in NLP. This paper integrates theoretical insights on Africa's language landscape with an empirical evaluation using Sahara.
arXiv Detail & Related papers (2025-02-26T21:49:54Z)
SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages [77.75535024869224]
We present SeaLLMs 3, the latest iteration of the SeaLLMs model family, tailored for Southeast Asian languages. SeaLLMs 3 aims to bridge this gap by covering a comprehensive range of languages spoken in this region, including English, Chinese, Indonesian, Vietnamese, Thai, Tagalog, Malay, Burmese, Khmer, Lao, Tamil, and Javanese. Our model excels in tasks such as world knowledge, mathematical reasoning, translation, and instruction following, achieving state-of-the-art performance among similarly sized models.
arXiv Detail & Related papers (2024-07-29T03:26:22Z)
SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages [64.10040374077994]
We introduce SEACrowd, a collaborative initiative that consolidates standardized corpora in nearly 1,000 languages across three modalities. We assess the quality of AI models on 36 indigenous languages across 13 tasks, offering valuable insights into the current AI landscape in SEA.
arXiv Detail & Related papers (2024-06-14T15:23:39Z)
SeeGULL Multilingual: a Dataset of Geo-Culturally Situated Stereotypes [18.991295993710224]
SeeGULL is a global-scale multilingual dataset of social stereotypes, spanning 20 languages, with human annotations across 23 regions. The authors demonstrate its utility in identifying gaps in model evaluations.
arXiv Detail & Related papers (2024-03-08T22:09:58Z)
Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking [48.21982147529661]
This paper introduces a novel approach for massively multicultural knowledge acquisition. Our method strategically navigates from densely informative Wikipedia documents on cultural topics to an extensive network of linked pages. Our work marks an important step towards deeper understanding and bridging the gaps of cultural disparities in AI.
arXiv Detail & Related papers (2024-02-14T18:16:54Z)
NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages [54.808217147579036]
We conduct a case study on Indonesian local languages. We compare the effectiveness of online scraping, human translation, and paragraph writing by native speakers in constructing datasets. Our findings demonstrate that datasets generated through paragraph writing by native speakers exhibit superior quality in terms of lexical diversity and cultural content.
arXiv Detail & Related papers (2023-09-19T14:42:33Z)
MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition [55.95128479289923]
African languages are spoken by over a billion people, but are underrepresented in NLP research and development. We create the largest human-annotated NER dataset for 20 African languages. We show that choosing the best transfer language improves zero-shot F1 scores by an average of 14 points.
arXiv Detail & Related papers (2022-10-22T08:53:14Z)
Evaluating the Diversity, Equity and Inclusion of NLP Technology: A Case Study for Indian Languages [35.86100962711644]
For NLP technology to be widely applicable, fair, and useful, it needs to serve a diverse set of speakers across the world's languages. We propose an evaluation paradigm that assesses NLP technologies across all three dimensions: diversity, equity, and inclusion.
arXiv Detail & Related papers (2022-05-25T11:38:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.