Related papers: BanStereoSet: A Dataset to Measure Stereotypical Social Biases in LLMs for Bangla

BanStereoSet: A Dataset to Measure Stereotypical Social Biases in LLMs for Bangla

URL: http://arxiv.org/abs/2409.11638v1
Date: Wed, 18 Sep 2024 02:02:30 GMT
Title: BanStereoSet: A Dataset to Measure Stereotypical Social Biases in LLMs for Bangla
Authors: Mahammed Kamruzzaman, Abdullah Al Monsur, Shrabon Das, Enamul Hassan, Gene Louis Kim,
Abstract summary: This study presents BanStereoSet, a dataset designed to evaluate stereotypical social biases in multilingual LLMs for the Bangla language. Our dataset consists of 1,194 sentences spanning 9 categories of bias: race, profession, gender, ageism, beauty in profession, region, caste, and religion.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This study presents BanStereoSet, a dataset designed to evaluate stereotypical social biases in multilingual LLMs for the Bangla language. In an effort to extend the focus of bias research beyond English-centric datasets, we have localized the content from the StereoSet, IndiBias, and Kamruzzaman et. al.'s datasets, producing a resource tailored to capture biases prevalent within the Bangla-speaking community. Our BanStereoSet dataset consists of 1,194 sentences spanning 9 categories of bias: race, profession, gender, ageism, beauty, beauty in profession, region, caste, and religion. This dataset not only serves as a crucial tool for measuring bias in multilingual LLMs but also facilitates the exploration of stereotypical bias across different social categories, potentially guiding the development of more equitable language technologies in Bangladeshi contexts. Our analysis of several language models using this dataset indicates significant biases, reinforcing the necessity for culturally and linguistically adapted datasets to develop more equitable language technologies.

Related papers

How do datasets, developers, and models affect biases in a low-resourced language? [10.29144011101804]
We conducted an algorithmic audit of sentiment analysis models built on mBERT and BanglaBERT.<n>Our analyses showed that BSA models exhibit biases across different identity categories despite having similar semantic content and structure.<n>We also examined the inconsistencies and uncertainties arising from combining pre-trained models and datasets created by individuals from diverse demographic backgrounds.
arXiv Detail & Related papers (2025-06-07T14:46:35Z)
One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks [68.33068005789116]
We present the first study aimed at objectively assessing the fairness and robustness of Large Language Models (LLMs) in handling dialects in canonical reasoning tasks. We hire AAVE speakers, including experts with computer science backgrounds, to rewrite seven popular benchmarks, such as HumanEval and GSM8K. Our findings reveal that textbfalmost all of these widely used models show significant brittleness and unfairness to queries in AAVE.
arXiv Detail & Related papers (2024-10-14T18:44:23Z)
Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs) By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases. The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z)
MBBQ: A Dataset for Cross-Lingual Comparison of Stereotypes in Generative LLMs [6.781972039785424]
Generative large language models (LLMs) have been shown to exhibit harmful biases and stereotypes. We present MBBQ, a dataset that measures stereotypes commonly held across Dutch, Spanish, and Turkish languages. Our results confirm that some non-English languages suffer from bias more than English, even when controlling for cultural shifts.
arXiv Detail & Related papers (2024-06-11T13:23:14Z)
IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context [32.48196952339581]
We introduce IndiBias, a benchmark dataset for evaluating social biases in the Indian context. The included bias dimensions encompass gender, religion, caste, age, region, physical appearance, and occupation. Our dataset contains 800 sentence pairs and 300s for bias measurement across different demographics.
arXiv Detail & Related papers (2024-03-29T12:32:06Z)
RuBia: A Russian Language Bias Detection Dataset [3.8501658629243076]
We present a bias detection dataset specifically designed for the Russian language, dubbed as RuBia. The RuBia dataset is divided into 4 domains: gender, nationality, socio-economic status, and diverse. There are nearly 2,000 unique sentence pairs spread over 19 in RuBia.
arXiv Detail & Related papers (2024-03-26T10:01:01Z)
Natural Language Processing for Dialects of a Language: A Survey [56.93337350526933]
State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets. This survey delves into an important attribute of these datasets: the dialect of a language. Motivated by the performance degradation of NLP models for dialectic datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets, and approaches.
arXiv Detail & Related papers (2024-01-11T03:04:38Z)
Global Voices, Local Biases: Socio-Cultural Prejudices across Languages [22.92083941222383]
Human biases are ubiquitous but not uniform; disparities exist across linguistic, cultural, and societal borders. In this work, we scale the Word Embedding Association Test (WEAT) to 24 languages, enabling broader studies. To encompass more widely prevalent societal biases, we examine new bias dimensions across toxicity, ableism, and more.
arXiv Detail & Related papers (2023-10-26T17:07:50Z)
Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages [54.808217147579036]
We conduct a case study on Indonesian local languages. We compare the effectiveness of online scraping, human translation, and paragraph writing by native speakers in constructing datasets. Our findings demonstrate that datasets generated through paragraph writing by native speakers exhibit superior quality in terms of lexical diversity and cultural content.
arXiv Detail & Related papers (2023-09-19T14:42:33Z)
Cross-Lingual NER for Financial Transaction Data in Low-Resource Languages [70.25418443146435]
We propose an efficient modeling framework for cross-lingual named entity recognition in semi-structured text data. We employ two independent datasets of SMSs in English and Arabic, each carrying semi-structured banking transaction information. With access to only 30 labeled samples, our model can generalize the recognition of merchants, amounts, and other fields from English to Arabic.
arXiv Detail & Related papers (2023-07-16T00:45:42Z)
Multi-lingual and Multi-cultural Figurative Language Understanding [69.47641938200817]
Figurative language permeates human communication, but is relatively understudied in NLP. We create a dataset for seven diverse languages associated with a variety of cultures: Hindi, Indonesian, Javanese, Kannada, Sundanese, Swahili and Yoruba. Our dataset reveals that each language relies on cultural and regional concepts for figurative expressions, with the highest overlap between languages originating from the same region. All languages exhibit a significant deficiency compared to English, with variations in performance reflecting the availability of pre-training and fine-tuning data.
arXiv Detail & Related papers (2023-05-25T15:30:31Z)
An Analysis of Social Biases Present in BERT Variants Across Multiple Languages [0.0]
We investigate the bias present in monolingual BERT models across a diverse set of languages. We propose a template-based method to measure any kind of bias, based on sentence pseudo-likelihood. We conclude that current methods of probing for bias are highly language-dependent.
arXiv Detail & Related papers (2022-11-25T23:38:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.