Related papers: Disclosure and Mitigation of Gender Bias in LLMs

Disclosure and Mitigation of Gender Bias in LLMs

URL: http://arxiv.org/abs/2402.11190v1
Date: Sat, 17 Feb 2024 04:48:55 GMT
Title: Disclosure and Mitigation of Gender Bias in LLMs
Authors: Xiangjue Dong, Yibo Wang, Philip S. Yu, James Caverlee
Abstract summary: Large Language Models (LLMs) can generate biased responses. We propose an indirect probing framework based on conditional generation. We explore three distinct strategies to disclose explicit and implicit gender bias in LLMs.
Score: 64.79319733514266
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Large Language Models (LLMs) can generate biased responses. Yet previous direct probing techniques contain either gender mentions or predefined gender stereotypes, which are challenging to comprehensively collect. Hence, we propose an indirect probing framework based on conditional generation. This approach aims to induce LLMs to disclose their gender bias even without explicit gender or stereotype mentions. We explore three distinct strategies to disclose explicit and implicit gender bias in LLMs. Our experiments demonstrate that all tested LLMs exhibit explicit and/or implicit gender bias, even when gender stereotypes are not present in the inputs. In addition, an increased model size or model alignment amplifies bias in most cases. Furthermore, we investigate three methods to mitigate bias in LLMs via Hyperparameter Tuning, Instruction Guiding, and Debias Tuning. Remarkably, these methods prove effective even in the absence of explicit genders or stereotypes.

Related papers

The Root Shapes the Fruit: On the Persistence of Gender-Exclusive Harms in Aligned Language Models [58.130894823145205]
We center transgender, nonbinary, and other gender-diverse identities to investigate how alignment procedures interact with pre-existing gender-diverse bias. Our findings reveal that DPO-aligned models are particularly sensitive to supervised finetuning. We conclude with recommendations tailored to DPO and broader alignment practices.
arXiv Detail & Related papers (2024-11-06T06:50:50Z)
GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models [73.23743278545321]
Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but have also been observed to magnify societal biases. GenderCARE is a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics.
arXiv Detail & Related papers (2024-08-22T15:35:46Z)
GenderBias-\emph{VL}: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing [72.0343083866144]
This paper introduces the GenderBias-emphVL benchmark to evaluate occupation-related gender bias in Large Vision-Language Models. Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs and state-of-the-art commercial APIs. Our findings reveal widespread gender biases in existing LVLMs.
arXiv Detail & Related papers (2024-06-30T05:55:15Z)
GenderAlign: An Alignment Dataset for Mitigating Gender Bias in Large Language Models [20.98831667981121]
Large Language Models (LLMs) are prone to generating content that exhibits gender biases. GenderAlign dataset comprises 8k single-turn dialogues, each paired with a "chosen" and a "rejected" response. Compared to the "rejected" responses, the "chosen" responses demonstrate lower levels of gender bias and higher quality.
arXiv Detail & Related papers (2024-06-20T01:45:44Z)
Hire Me or Not? Examining Language Model's Behavior with Occupation Attributes [7.718858707298602]
Large language models (LLMs) have been widely integrated into production pipelines, like recruitment and recommendation systems. This paper investigates LLMs' behavior with respect to gender stereotypes, in the context of occupation decision making.
arXiv Detail & Related papers (2024-05-06T18:09:32Z)
Probing Explicit and Implicit Gender Bias through LLM Conditional Text Generation [64.79319733514266]
Large Language Models (LLMs) can generate biased and toxic responses. We propose a conditional text generation mechanism without the need for predefined gender phrases and stereotypes.
arXiv Detail & Related papers (2023-11-01T05:31:46Z)
In-Contextual Gender Bias Suppression for Large Language Models [47.246504807946884]
Large Language Models (LLMs) have been reported to encode worrying-levels of gender biases. We propose bias suppression that prevents biased generations of LLMs by providing preambles constructed from manually designed templates. We find that bias suppression has acceptable adverse effect on downstream task performance with HellaSwag and COPA.
arXiv Detail & Related papers (2023-09-13T18:39:08Z)
Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias [33.99768156365231]
We introduce a causal formulation for bias measurement in generative language models. We propose a benchmark called OccuGender, with a bias-measuring procedure to investigate occupational gender bias. The results show that these models exhibit substantial occupational gender bias.
arXiv Detail & Related papers (2022-12-20T22:41:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.