Disclosure and Mitigation of Gender Bias in LLMs
- URL: http://arxiv.org/abs/2402.11190v1
- Date: Sat, 17 Feb 2024 04:48:55 GMT
- Title: Disclosure and Mitigation of Gender Bias in LLMs
- Authors: Xiangjue Dong, Yibo Wang, Philip S. Yu, James Caverlee
- Abstract summary: Large Language Models (LLMs) can generate biased responses.
We propose an indirect probing framework based on conditional generation.
We explore three distinct strategies to disclose explicit and implicit gender bias in LLMs.
- Score: 64.79319733514266
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large Language Models (LLMs) can generate biased responses. Yet previous
direct probing techniques rely on prompts that contain either explicit gender mentions or predefined gender
stereotypes, which are challenging to collect comprehensively. Hence, we
propose an indirect probing framework based on conditional generation. This
approach aims to induce LLMs to disclose their gender bias even without
explicit gender or stereotype mentions. We explore three distinct strategies to
disclose explicit and implicit gender bias in LLMs. Our experiments demonstrate
that all tested LLMs exhibit explicit and/or implicit gender bias, even when
gender stereotypes are not present in the inputs. In addition, increasing
model size or applying model alignment amplifies bias in most cases. Furthermore, we
investigate three methods to mitigate bias in LLMs via Hyperparameter Tuning,
Instruction Guiding, and Debias Tuning. Remarkably, these methods prove
effective even in the absence of explicit genders or stereotypes.
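To make the probing and mitigation ideas concrete, here is a minimal sketch. The prompt wording, the gendered word lists, and the `generate` callable are illustrative assumptions rather than the paper's released templates or models; the sketch only shows the shape of indirect probing via conditional generation and of an Instruction Guiding style mitigation.

```python
# Minimal sketch only: prompts, word lists, and the `generate` callable are
# illustrative assumptions, not the paper's released templates or models.
from typing import Callable, Dict

# Toy lexicons for spotting explicitly gendered words in a continuation.
FEMALE_WORDS = {"she", "her", "hers", "woman", "girl"}
MALE_WORDS = {"he", "him", "his", "man", "boy"}

def probe_conditional_generation(generate: Callable[[str], str],
                                 condition: str) -> Dict[str, int]:
    """Indirect probe: condition the model on gender-neutral, stereotype-free
    text and count the gendered words it introduces on its own."""
    tokens = generate(f"Continue the text: {condition}").lower().split()
    return {
        "female": sum(t.strip(".,!?\"'") in FEMALE_WORDS for t in tokens),
        "male": sum(t.strip(".,!?\"'") in MALE_WORDS for t in tokens),
    }

# Instruction Guiding style mitigation: prepend a debiasing instruction.
DEBIAS_INSTRUCTION = ("Continue the text without assuming anyone's gender "
                      "unless it is stated explicitly.\n")

def probe_with_instruction_guiding(generate: Callable[[str], str],
                                   condition: str) -> Dict[str, int]:
    """Same probe, but every prompt is prefixed with the debiasing instruction."""
    return probe_conditional_generation(
        lambda prompt: generate(DEBIAS_INSTRUCTION + prompt), condition)

# Usage with any text-generation callable, e.g.:
# probe_conditional_generation(my_llm, "My friend came home from work and")
```

Comparing the counts returned with and without the instruction gives a rough picture of how much an Instruction Guiding style prompt shifts a model's gendered completions; the paper's actual templates and metrics are more elaborate.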
Related papers
- The Root Shapes the Fruit: On the Persistence of Gender-Exclusive Harms in Aligned Language Models [58.130894823145205]
We center transgender, nonbinary, and other gender-diverse identities to investigate how alignment procedures interact with pre-existing gender-diverse bias.
Our findings reveal that DPO-aligned models are particularly sensitive to supervised finetuning.
We conclude with recommendations tailored to DPO and broader alignment practices.
arXiv Detail & Related papers (2024-11-06T06:50:50Z)
- GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models [73.23743278545321]
Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but have also been observed to magnify societal biases.
GenderCARE is a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics.
arXiv Detail & Related papers (2024-08-22T15:35:46Z)
- GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing [72.0343083866144]
This paper introduces the GenderBias-VL benchmark to evaluate occupation-related gender bias in Large Vision-Language Models.
Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs and state-of-the-art commercial APIs.
Our findings reveal widespread gender biases in existing LVLMs.
arXiv Detail & Related papers (2024-06-30T05:55:15Z)
- GenderAlign: An Alignment Dataset for Mitigating Gender Bias in Large Language Models [20.98831667981121]
Large Language Models (LLMs) are prone to generating content that exhibits gender biases.
The GenderAlign dataset comprises 8k single-turn dialogues, each paired with a "chosen" and a "rejected" response.
Compared to the "rejected" responses, the "chosen" responses exhibit lower levels of gender bias and higher quality; a toy sketch of this preference-pair format appears after this list.
arXiv Detail & Related papers (2024-06-20T01:45:44Z)
- Hire Me or Not? Examining Language Model's Behavior with Occupation Attributes [7.718858707298602]
Large language models (LLMs) have been widely integrated into production pipelines, like recruitment and recommendation systems.
This paper investigates LLMs' behavior with respect to gender stereotypes, in the context of occupation decision making.
arXiv Detail & Related papers (2024-05-06T18:09:32Z)
- Probing Explicit and Implicit Gender Bias through LLM Conditional Text Generation [64.79319733514266]
Large Language Models (LLMs) can generate biased and toxic responses.
We propose a conditional text generation mechanism without the need for predefined gender phrases and stereotypes.
arXiv Detail & Related papers (2023-11-01T05:31:46Z)
- In-Contextual Gender Bias Suppression for Large Language Models [47.246504807946884]
Large Language Models (LLMs) have been reported to encode worrying levels of gender biases.
We propose bias suppression that prevents biased generations of LLMs by providing preambles constructed from manually designed templates.
We find that bias suppression has an acceptable adverse effect on downstream task performance, as measured on HellaSwag and COPA.
arXiv Detail & Related papers (2023-09-13T18:39:08Z)
- Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias [33.99768156365231]
We introduce a causal formulation for bias measurement in generative language models.
We propose a benchmark called OccuGender, with a bias-measuring procedure to investigate occupational gender bias.
The results show that these models exhibit substantial occupational gender bias.
arXiv Detail & Related papers (2022-12-20T22:41:24Z)
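As noted in the GenderAlign entry above, a preference-pair dataset of this kind can be pictured with a toy record. The field names and example texts below are assumptions for illustration, not records from GenderAlign itself; the helper only shows the (prompt, preferred, dispreferred) ordering that DPO-style preference-optimization trainers typically consume.

```python
# Hypothetical GenderAlign-style preference pair; field names and texts are
# invented for illustration and do not come from the released dataset.
preference_pair = {
    "prompt": "Our new engineer is presenting the design review today.",
    "chosen": "Great, I hope their presentation goes well.",
    "rejected": "He will be fine; men are naturally good at engineering talks.",
}

def to_preference_example(pair: dict) -> tuple:
    """Arrange a pair as (prompt, preferred, dispreferred), the ordering
    typically consumed by DPO-style preference-optimization trainers."""
    return pair["prompt"], pair["chosen"], pair["rejected"]
```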
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.