Locating and Mitigating Gender Bias in Large Language Models
- URL: http://arxiv.org/abs/2403.14409v1
- Date: Thu, 21 Mar 2024 13:57:43 GMT
- Title: Locating and Mitigating Gender Bias in Large Language Models
- Authors: Yuchen Cai, Ding Cao, Rongxi Guo, Yaqin Wen, Guiquan Liu, Enhong Chen
- Abstract summary: Large language models (LLMs) are pre-trained on extensive corpora to learn facts and aspects of human cognition, which encode human preferences.
This process can inadvertently lead these models to acquire biases and stereotypes prevalent in society.
We propose LSDM (Least Square Debias Method), a knowledge-editing-based method for mitigating gender bias in occupational pronouns.
- Score: 40.78150878350479
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) are pre-trained on extensive corpora to learn facts and aspects of human cognition, which encode human preferences. However, this process can inadvertently lead these models to acquire biases and stereotypes prevalent in society. Prior research has typically tackled bias from a one-dimensional perspective, concentrating on either locating or mitigating it. This limited perspective has made it difficult for lines of work on bias to complement and build upon one another. In this study, we integrate the processes of locating and mitigating bias within a unified framework. First, we use causal mediation analysis to trace the causal effects of different components' activations within a large language model. Building on this, we propose LSDM (Least Square Debias Method), a knowledge-editing-based method for mitigating gender bias in occupational pronouns, and compare it against two baselines on three gender-bias datasets and seven knowledge-competency test datasets. The experimental results indicate that the primary contributors to gender bias are the bottom MLP modules acting on the last token of occupational pronouns and the top attention module acting on the final word in the sentence. Furthermore, LSDM mitigates gender bias more effectively than the baselines while fully preserving the model's capabilities in all other aspects.
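The abstract describes LSDM only at a high level, but the general shape of a least-squares knowledge edit can be sketched. The NumPy snippet below is a minimal illustration under stated assumptions: it edits a single MLP output projection, the matrix names (W, K, V_star, K_keep) are invented for this sketch, and the gender-balanced targets and key-preservation constraint are hypothetical choices in the spirit of knowledge-editing methods, not the authors' actual procedure.

```python
# A minimal least-squares knowledge-edit sketch (illustrative, not the
# paper's implementation). Assumption: we edit one MLP output projection
# so that occupation-word keys map to gender-balanced value vectors while
# a set of preserved keys keeps its current outputs.
import numpy as np

rng = np.random.default_rng(0)
d_hidden, d_model, n_occupations = 64, 32, 10

W = rng.normal(size=(d_model, d_hidden))           # projection to edit (hypothetical)
K = rng.normal(size=(d_hidden, n_occupations))     # keys: activations at the last
                                                   # token of each occupation word
V_he = rng.normal(size=(d_model, n_occupations))   # values steering toward "he"
V_she = rng.normal(size=(d_model, n_occupations))  # values steering toward "she"

# Target: each occupation key should map to a gender-balanced value.
V_star = 0.5 * (V_he + V_she)

# Preserve behavior elsewhere by constraining sampled keys to keep
# their current outputs (a common trick in knowledge-editing methods).
K_keep = rng.normal(size=(d_hidden, 200))
V_keep = W @ K_keep

# Closed-form least squares: min_W' || W' [K_keep K] - [V_keep V_star] ||_F^2
K_all = np.hstack([K_keep, K])
V_all = np.hstack([V_keep, V_star])
W_new = V_all @ K_all.T @ np.linalg.pinv(K_all @ K_all.T)

print("edit error on occupation keys:", np.linalg.norm(W_new @ K - V_star))
print("drift on preserved keys:", np.linalg.norm(W_new @ K_keep - V_keep))
```

Stacking preserved keys next to the edited ones lets a single closed-form solve trade off bias correction against drift on unrelated knowledge, which is consistent with the abstract's claim that LSDM preserves the model's other capabilities.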
Related papers
- The Root Shapes the Fruit: On the Persistence of Gender-Exclusive Harms in Aligned Language Models [58.130894823145205]
We center transgender, nonbinary, and other gender-diverse identities to investigate how alignment procedures interact with pre-existing gender-diverse bias.
Our findings reveal that DPO-aligned models are particularly sensitive to supervised finetuning.
We conclude with recommendations tailored to DPO and broader alignment practices.
arXiv Detail & Related papers (2024-11-06T06:50:50Z)
- Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs) [82.57490175399693]
We study gender bias in 22 popular image-to-text vision-language assistants (VLAs).
Our results show that VLAs replicate human biases likely present in the data, such as real-world occupational imbalances.
To eliminate the gender bias in these models, we find that finetuning-based debiasing methods achieve the best tradeoff between debiasing and retaining performance on downstream tasks.
arXiv Detail & Related papers (2024-10-25T05:59:44Z)
- GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing [72.0343083866144]
This paper introduces the GenderBias-VL benchmark to evaluate occupation-related gender bias in Large Vision-Language Models (LVLMs).
Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs and state-of-the-art commercial APIs.
Our findings reveal widespread gender biases in existing LVLMs.
arXiv Detail & Related papers (2024-06-30T05:55:15Z)
- Hire Me or Not? Examining Language Model's Behavior with Occupation Attributes [7.718858707298602]
Large language models (LLMs) have been widely integrated into production pipelines, like recruitment and recommendation systems.
This paper investigates LLMs' behavior with respect to gender stereotypes, in the context of occupation decision making.
arXiv Detail & Related papers (2024-05-06T18:09:32Z)
- The Impact of Debiasing on the Performance of Language Models in Downstream Tasks is Underestimated [70.23064111640132]
We compare the impact of debiasing on performance across multiple downstream tasks using a wide range of benchmark datasets.
Experiments show that the effects of debiasing are consistently underestimated across all tasks.
arXiv Detail & Related papers (2023-09-16T20:25:34Z)
- Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias [33.99768156365231]
We introduce a causal formulation for bias measurement in generative language models.
We propose a benchmark called OccuGender, with a bias-measuring procedure to investigate occupational gender bias.
The results show that these models exhibit substantial occupational gender bias.
arXiv Detail & Related papers (2022-12-20T22:41:24Z)
- The Birth of Bias: A case study on the evolution of gender bias in an English language model [1.6344851071810076]
We use a relatively small language model with an LSTM architecture, trained on an English Wikipedia corpus.
We find that the representation of gender is dynamic and identify different phases during training.
We show that gender information is represented increasingly locally in the input embeddings of the model.
arXiv Detail & Related papers (2022-07-21T00:59:04Z)
- Word Embeddings via Causal Inference: Gender Bias Reducing and Semantic Information Preserving [3.114945725130788]
We propose a novel methodology that leverages a causal inference framework to effectively remove gender bias.
Our comprehensive experiments show that the proposed method achieves state-of-the-art results in gender-debiasing tasks.
arXiv Detail & Related papers (2021-12-09T19:57:22Z)
- Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
arXiv Detail & Related papers (2021-09-16T23:40:28Z)
- Worst of Both Worlds: Biases Compound in Pre-trained Vision-and-Language Models [17.90351661475405]
This work extends text-based bias analysis methods to investigate multimodal language models.
We demonstrate that VL-BERT exhibits gender biases, often preferring to reinforce a stereotype over faithfully describing the visual scene.
arXiv Detail & Related papers (2021-04-18T00:02:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.