Probing Gender Bias in Multilingual LLMs: A Case Study of Stereotypes in Persian
- URL: http://arxiv.org/abs/2509.20168v1
- Date: Wed, 24 Sep 2025 14:34:17 GMT
- Title: Probing Gender Bias in Multilingual LLMs: A Case Study of Stereotypes in Persian
- Authors: Ghazal Kalhor, Behnam Bahrak
- Abstract summary: We propose a template-based probing methodology to uncover gender stereotypes in Multilingual Large Language Models (LLMs). We evaluate four prominent models, focusing on Persian, a low-resource language with distinct linguistic features. Our results show that all models exhibit gender stereotypes, with greater disparities in Persian than in English across all domains. This study underscores the need for inclusive NLP practices and provides a framework for assessing bias in other low-resource languages.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multilingual Large Language Models (LLMs) are increasingly used worldwide, making it essential to ensure they are free from gender bias to prevent representational harm. While prior studies have examined such biases in high-resource languages, low-resource languages remain understudied. In this paper, we propose a template-based probing methodology, validated against real-world data, to uncover gender stereotypes in LLMs. As part of this framework, we introduce the Domain-Specific Gender Skew Index (DS-GSI), a metric that quantifies deviations from gender parity. We evaluate four prominent models, GPT-4o mini, DeepSeek R1, Gemini 2.0 Flash, and Qwen QwQ 32B, across four semantic domains, focusing on Persian, a low-resource language with distinct linguistic features. Our results show that all models exhibit gender stereotypes, with greater disparities in Persian than in English across all domains. Among these, sports reflect the most rigid gender biases. This study underscores the need for inclusive NLP practices and provides a framework for assessing bias in other low-resource languages.
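The abstract names a template-based probe and the Domain-Specific Gender Skew Index (DS-GSI) that quantifies deviations from gender parity, but the formula is not given here. The snippet below is a minimal sketch of how such a probe-and-skew pipeline might look, assuming DS-GSI is computed per domain as the normalized deviation of the observed gender split from parity; the template strings, the `query_model` call, and the `classify_gender` helper are hypothetical placeholders, not the authors' code.

```python
from collections import Counter

# Hypothetical domain-specific templates; the paper probes four semantic
# domains in Persian and English, with sports showing the most rigid biases.
TEMPLATES = {
    "sports":     "Describe a typical professional athlete in one sentence.",
    "occupation": "Describe a typical surgeon in one sentence.",
}

def query_model(prompt: str) -> str:
    """Placeholder for an actual LLM call (e.g. GPT-4o mini via an API)."""
    raise NotImplementedError

def classify_gender(text: str) -> str:
    """Placeholder classifier for the gender of the generated referent.
    Returns 'female', 'male', or 'unknown'."""
    raise NotImplementedError

def ds_gsi(counts: Counter) -> float:
    """Assumed form of a Domain-Specific Gender Skew Index: the absolute
    deviation of the female share from parity (0.5), rescaled to [0, 1].
    0 = perfect parity, 1 = all responses assigned to a single gender."""
    total = counts["female"] + counts["male"]
    if total == 0:
        return 0.0
    return abs(counts["female"] / total - 0.5) * 2

def probe_domain(template: str, n_samples: int = 100) -> float:
    """Query the model repeatedly with one template and score the skew."""
    counts = Counter()
    for _ in range(n_samples):
        counts[classify_gender(query_model(template))] += 1
    return ds_gsi(counts)
```

Running this per domain for Persian and English prompts and comparing the resulting scores would mirror the kind of cross-lingual disparity comparison the abstract reports, under the stated assumptions about the metric's form.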
Related papers
- Exploring Gender Bias in Large Language Models: An In-depth Dive into the German Language [21.87606488958834]
We present five German datasets for gender bias evaluation in large language models (LLMs). The datasets are grounded in well-established concepts of gender bias and are accessible through multiple methodologies. Our findings, reported for eight multilingual LLMs, reveal unique challenges associated with gender bias in German.
arXiv Detail & Related papers (2025-07-22T13:09:41Z) - EuroGEST: Investigating gender stereotypes in multilingual language models [58.871032460235575]
We introduce EuroGEST, a dataset designed to measure gender-stereotypical reasoning in LLMs across English and 29 European languages. We show that the strongest stereotypes in all models across all languages are that women are 'beautiful', 'empathetic' and 'neat', and that men are 'leaders', 'strong, tough' and 'professional'.
arXiv Detail & Related papers (2025-06-04T11:58:18Z) - Breaking Language Barriers or Reinforcing Bias? A Study of Gender and Racial Disparities in Multilingual Contrastive Vision Language Models [28.944990804599893]
We perform the first systematic audit of four public multilingual CLIP variants: M-CLIP, NLLB-CLIP, CAPIVARA-CLIP, and the debiased SigLIP-2. We quantify race and gender bias and measure stereotype amplification.
arXiv Detail & Related papers (2025-05-20T10:14:00Z) - Colombian Waitresses y Jueces canadienses: Gender and Country Biases in Occupation Recommendations from LLMs [15.783346695504344]
We present the first study of multilingual intersecting country and gender biases. We construct a benchmark of prompts in English, Spanish, and German, using 25 countries and four pronoun sets. We find that even when models show parity for gender or country individually, intersectional occupational biases based on both country and gender persist.
arXiv Detail & Related papers (2025-05-05T08:40:51Z) - GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models [73.23743278545321]
Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but have also been observed to magnify societal biases. GenderCARE is a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics.
arXiv Detail & Related papers (2024-08-22T15:35:46Z) - Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents AmbGIMT, a benchmark for Gender-Inclusive Machine Translation with ambiguous attitude words.
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z) - GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing [72.0343083866144]
This paper introduces the GenderBias-VL benchmark to evaluate occupation-related gender bias in Large Vision-Language Models.
Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs and state-of-the-art commercial APIs.
Our findings reveal widespread gender biases in existing LVLMs.
arXiv Detail & Related papers (2024-06-30T05:55:15Z) - Leveraging Large Language Models to Measure Gender Representation Bias in Gendered Language Corpora [9.959039325564744]
Large language models (LLMs) often inherit and amplify social biases embedded in their training data. Gender bias is the association of specific roles or traits with a particular gender. Gender representation bias is the unequal frequency of references to individuals of different genders.
arXiv Detail & Related papers (2024-06-19T16:30:58Z) - What is Your Favorite Gender, MLM? Gender Bias Evaluation in Multilingual Masked Language Models [8.618945530676614]
This paper proposes an approach to estimate gender bias in multilingual lexicons from 5 languages: Chinese, English, German, Portuguese, and Spanish.
A novel model-based method is presented to generate sentence pairs for a more robust analysis of gender bias.
Our results suggest that gender bias should be studied on a large dataset using multiple evaluation metrics for best practice.
arXiv Detail & Related papers (2024-04-09T21:12:08Z) - Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You [64.74707085021858]
We show that multilingual models suffer from significant gender biases just as monolingual models do. We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models. Our results show that not only do models exhibit strong gender biases, but they also behave differently across languages.
arXiv Detail & Related papers (2024-01-29T12:02:28Z) - Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.