Related papers: BharatBBQ: A Multilingual Bias Benchmark for Question Answering in the Indian Context

BharatBBQ: A Multilingual Bias Benchmark for Question Answering in the Indian Context

URL: http://arxiv.org/abs/2508.07090v1
Date: Sat, 09 Aug 2025 20:24:24 GMT
Title: BharatBBQ: A Multilingual Bias Benchmark for Question Answering in the Indian Context
Authors: Aditya Tomar, Nihar Ranjan Sahoo, Pushpak Bhattacharyya,
Abstract summary: Existing benchmarks, such as the Bias Benchmark for Question Answering (BBQ), primarily focus on Western contexts.<n>We introduce BharatBBQ, a culturally adapted benchmark designed to assess biases in Hindi, English, Marathi, Bengali, Tamil, Telugu, Odia, and Assamese.<n>Our dataset contains 49,108 examples in one language that are expanded using translation and verification to 392,864 examples in eight different languages.
Score: 36.56689822791777
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Evaluating social biases in language models (LMs) is crucial for ensuring fairness and minimizing the reinforcement of harmful stereotypes in AI systems. Existing benchmarks, such as the Bias Benchmark for Question Answering (BBQ), primarily focus on Western contexts, limiting their applicability to the Indian context. To address this gap, we introduce BharatBBQ, a culturally adapted benchmark designed to assess biases in Hindi, English, Marathi, Bengali, Tamil, Telugu, Odia, and Assamese. BharatBBQ covers 13 social categories, including 3 intersectional groups, reflecting prevalent biases in the Indian sociocultural landscape. Our dataset contains 49,108 examples in one language that are expanded using translation and verification to 392,864 examples in eight different languages. We evaluate five multilingual LM families across zero and few-shot settings, analyzing their bias and stereotypical bias scores. Our findings highlight persistent biases across languages and social categories and often amplified biases in Indian languages compared to English, demonstrating the necessity of linguistically and culturally grounded benchmarks for bias evaluation.

Related papers

PakBBQ: A Culturally Adapted Bias Benchmark for QA [3.4455728937232597]
We introduce PakBBQ, a culturally and regionally adapted extension of the original Bias Benchmark for Question Answering dataset.<n> PakBBQ comprises over 214 templates, 17180 QA pairs across 8 categories in both English and Urdu, covering eight bias dimensions including age, disability, appearance, gender, socio-economic status, religious, regional affiliation, and language formality that are relevant in Pakistan.
arXiv Detail & Related papers (2025-08-13T20:42:44Z)
Beyond Early-Token Bias: Model-Specific and Language-Specific Position Effects in Multilingual LLMs [50.07451351559251]
We present a study across five typologically distinct languages (English, Russian, German, Hindi, and Vietnamese)<n>We examine how position bias interacts with prompt strategies and affects output entropy.
arXiv Detail & Related papers (2025-05-22T02:23:00Z)
See It from My Perspective: How Language Affects Cultural Bias in Image Understanding [60.70852566256668]
Vision-language models (VLMs) can respond to queries about images in many languages.<n>We characterize the Western bias of VLMs in image understanding and investigate the role that language plays in this disparity.
arXiv Detail & Related papers (2024-06-17T15:49:51Z)
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark [68.21939124278065]
Culturally-diverse multilingual Visual Question Answering benchmark designed to cover a rich set of languages and cultures. CVQA includes culturally-driven images and questions from across 30 countries on four continents, covering 31 languages with 13 scripts, providing a total of 10k questions. We benchmark several Multimodal Large Language Models (MLLMs) on CVQA, and show that the dataset is challenging for the current state-of-the-art models.
arXiv Detail & Related papers (2024-06-10T01:59:00Z)
IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context [32.48196952339581]
We introduce IndiBias, a benchmark dataset for evaluating social biases in the Indian context. The included bias dimensions encompass gender, religion, caste, age, region, physical appearance, and occupation. Our dataset contains 800 sentence pairs and 300s for bias measurement across different demographics.
arXiv Detail & Related papers (2024-03-29T12:32:06Z)
Social Bias Probing: Fairness Benchmarking for Language Models [38.180696489079985]
This paper proposes a novel framework for probing language models for social biases by assessing disparate treatment. We curate SoFa, a large-scale benchmark designed to address the limitations of existing fairness collections. We show that biases within language models are more nuanced than acknowledged, indicating a broader scope of encoded biases than previously recognized.
arXiv Detail & Related papers (2023-11-15T16:35:59Z)
Global Voices, Local Biases: Socio-Cultural Prejudices across Languages [22.92083941222383]
Human biases are ubiquitous but not uniform; disparities exist across linguistic, cultural, and societal borders. In this work, we scale the Word Embedding Association Test (WEAT) to 24 languages, enabling broader studies. To encompass more widely prevalent societal biases, we examine new bias dimensions across toxicity, ableism, and more.
arXiv Detail & Related papers (2023-10-26T17:07:50Z)
KoBBQ: Korean Bias Benchmark for Question Answering [28.091808407408823]
The Bias Benchmark for Question Answering (BBQ) is designed to evaluate social biases of language models (LMs) We present KoBBQ, a Korean bias benchmark dataset. We propose a general framework that addresses considerations for cultural adaptation of a dataset.
arXiv Detail & Related papers (2023-07-31T15:44:15Z)
Comparing Biases and the Impact of Multilingual Training across Multiple Languages [70.84047257764405]
We present a bias analysis across Italian, Chinese, English, Hebrew, and Spanish on the downstream sentiment analysis task. We adapt existing sentiment bias templates in English to Italian, Chinese, Hebrew, and Spanish for four attributes: race, religion, nationality, and gender. Our results reveal similarities in bias expression such as favoritism of groups that are dominant in each language's culture.
arXiv Detail & Related papers (2023-05-18T18:15:07Z)
Socially Aware Bias Measurements for Hindi Language Representations [38.40818373580979]
We show how biases are unique to specific language representations based on the history and culture of the region they are widely spoken in. We emphasize on the necessity of social-awareness along with linguistic and grammatical artefacts when modeling language representations.
arXiv Detail & Related papers (2021-10-15T05:49:15Z)
Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications. We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.