IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context
- URL: http://arxiv.org/abs/2403.20147v2
- Date: Wed, 3 Apr 2024 11:59:19 GMT
- Title: IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context
- Authors: Nihar Ranjan Sahoo, Pranamya Prashant Kulkarni, Narjis Asad, Arif Ahmad, Tanu Goyal, Aparna Garimella, Pushpak Bhattacharyya,
- Abstract summary: We introduce IndiBias, a benchmark dataset for evaluating social biases in the Indian context.
The included bias dimensions encompass gender, religion, caste, age, region, physical appearance, and occupation.
Our dataset contains 800 sentence pairs and 300s for bias measurement across different demographics.
- Score: 32.48196952339581
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The pervasive influence of social biases in language data has sparked the need for benchmark datasets that capture and evaluate these biases in Large Language Models (LLMs). Existing efforts predominantly focus on English language and the Western context, leaving a void for a reliable dataset that encapsulates India's unique socio-cultural nuances. To bridge this gap, we introduce IndiBias, a comprehensive benchmarking dataset designed specifically for evaluating social biases in the Indian context. We filter and translate the existing CrowS-Pairs dataset to create a benchmark dataset suited to the Indian context in Hindi language. Additionally, we leverage LLMs including ChatGPT and InstructGPT to augment our dataset with diverse societal biases and stereotypes prevalent in India. The included bias dimensions encompass gender, religion, caste, age, region, physical appearance, and occupation. We also build a resource to address intersectional biases along three intersectional dimensions. Our dataset contains 800 sentence pairs and 300 tuples for bias measurement across different demographics. The dataset is available in English and Hindi, providing a size comparable to existing benchmark datasets. Furthermore, using IndiBias we compare ten different language models on multiple bias measurement metrics. We observed that the language models exhibit more bias across a majority of the intersectional groups.
Related papers
- BanStereoSet: A Dataset to Measure Stereotypical Social Biases in LLMs for Bangla [0.0]
This study presents BanStereoSet, a dataset designed to evaluate stereotypical social biases in multilingual LLMs for the Bangla language.
Our dataset consists of 1,194 sentences spanning 9 categories of bias: race, profession, gender, ageism, beauty in profession, region, caste, and religion.
arXiv Detail & Related papers (2024-09-18T02:02:30Z) - Navigating Text-to-Image Generative Bias across Indic Languages [53.92640848303192]
This research investigates biases in text-to-image (TTI) models for the Indic languages widely spoken across India.
It evaluates and compares the generative performance and cultural relevance of leading TTI models in these languages against their performance in English.
arXiv Detail & Related papers (2024-08-01T04:56:13Z) - Multilingual Diversity Improves Vision-Language Representations [66.41030381363244]
Pre-training on this dataset outperforms using English-only or English-dominated datasets on ImageNet.
On a geographically diverse task like GeoDE, we also observe improvements across all regions, with the biggest gain coming from Africa.
arXiv Detail & Related papers (2024-05-27T08:08:51Z) - RuBia: A Russian Language Bias Detection Dataset [3.8501658629243076]
We present a bias detection dataset specifically designed for the Russian language, dubbed as RuBia.
The RuBia dataset is divided into 4 domains: gender, nationality, socio-economic status, and diverse.
There are nearly 2,000 unique sentence pairs spread over 19 in RuBia.
arXiv Detail & Related papers (2024-03-26T10:01:01Z) - Towards Better Inclusivity: A Diverse Tweet Corpus of English Varieties [0.0]
We aim to address the issue of bias at its root - the data itself.
We curate a dataset of tweets from countries with high proportions of underserved English variety speakers.
Following best annotation practices, our growing corpus features 170,800 tweets taken from 7 countries.
arXiv Detail & Related papers (2024-01-21T13:18:20Z) - Global Voices, Local Biases: Socio-Cultural Prejudices across Languages [22.92083941222383]
Human biases are ubiquitous but not uniform; disparities exist across linguistic, cultural, and societal borders.
In this work, we scale the Word Embedding Association Test (WEAT) to 24 languages, enabling broader studies.
To encompass more widely prevalent societal biases, we examine new bias dimensions across toxicity, ableism, and more.
arXiv Detail & Related papers (2023-10-26T17:07:50Z) - CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI
Collaboration for Large Language Models [52.25049362267279]
We present a Chinese Bias Benchmark dataset that consists of over 100K questions jointly constructed by human experts and generative language models.
The testing instances in the dataset are automatically derived from 3K+ high-quality templates manually authored with stringent quality control.
Extensive experiments demonstrate the effectiveness of the dataset in detecting model bias, with all 10 publicly available Chinese large language models exhibiting strong bias in certain categories.
arXiv Detail & Related papers (2023-06-28T14:14:44Z) - XRICL: Cross-lingual Retrieval-Augmented In-Context Learning for
Cross-lingual Text-to-SQL Semantic Parsing [70.40401197026925]
In-context learning using large language models has recently shown surprising results for semantic parsing tasks.
This work introduces the XRICL framework, which learns to retrieve relevant English exemplars for a given query.
We also include global translation exemplars for a target language to facilitate the translation process for large language models.
arXiv Detail & Related papers (2022-10-25T01:33:49Z) - A Data Bootstrapping Recipe for Low Resource Multilingual Relation
Classification [38.83366564843953]
IndoRE is a dataset with 21K entity and relation tagged gold sentences in three Indian languages, plus English.
We start with a multilingual BERT (mBERT) based system that captures entity span positions and type information.
We study the accuracy efficiency tradeoff between expensive gold instances vs. translated and aligned'silver' instances.
arXiv Detail & Related papers (2021-10-18T18:40:46Z) - REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets [64.76453161039973]
REVISE (REvealing VIsual biaSEs) is a tool that assists in the investigation of a visual dataset.
It surfacing potential biases along three dimensions: (1) object-based, (2) person-based, and (3) geography-based.
arXiv Detail & Related papers (2020-04-16T23:54:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.