Disclosure and Mitigation of Gender Bias in LLMs
- URL: http://arxiv.org/abs/2402.11190v1
- Date: Sat, 17 Feb 2024 04:48:55 GMT
- Title: Disclosure and Mitigation of Gender Bias in LLMs
- Authors: Xiangjue Dong, Yibo Wang, Philip S. Yu, James Caverlee
- Abstract summary: Large Language Models (LLMs) can generate biased responses.
We propose an indirect probing framework based on conditional generation.
We explore three distinct strategies to disclose explicit and implicit gender bias in LLMs.
- Score: 64.79319733514266
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large Language Models (LLMs) can generate biased responses. Yet previous
direct probing techniques contain either gender mentions or predefined gender
stereotypes, which are challenging to comprehensively collect. Hence, we
propose an indirect probing framework based on conditional generation. This
approach aims to induce LLMs to disclose their gender bias even without
explicit gender or stereotype mentions. We explore three distinct strategies to
disclose explicit and implicit gender bias in LLMs. Our experiments demonstrate
that all tested LLMs exhibit explicit and/or implicit gender bias, even when
gender stereotypes are not present in the inputs. In addition, an increased
model size or model alignment amplifies bias in most cases. Furthermore, we
investigate three methods to mitigate bias in LLMs via Hyperparameter Tuning,
Instruction Guiding, and Debias Tuning. Remarkably, these methods prove
effective even in the absence of explicit genders or stereotypes.
Related papers
- GenderBias-\emph{VL}: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing [72.0343083866144]
This paper introduces the GenderBias-emphVL benchmark to evaluate occupation-related gender bias in Large Vision-Language Models.
Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs and state-of-the-art commercial APIs.
Our findings reveal widespread gender biases in existing LVLMs.
arXiv Detail & Related papers (2024-06-30T05:55:15Z) - GenderAlign: An Alignment Dataset for Mitigating Gender Bias in Large Language Models [20.98831667981121]
Large Language Models (LLMs) are prone to generating content that exhibits gender biases.
GenderAlign dataset comprises 8k single-turn dialogues, each paired with a "chosen" and a "rejected" response.
Compared to the "rejected" responses, the "chosen" responses demonstrate lower levels of gender bias and higher quality.
arXiv Detail & Related papers (2024-06-20T01:45:44Z) - Hire Me or Not? Examining Language Model's Behavior with Occupation Attributes [7.718858707298602]
Large language models (LLMs) have been widely integrated into production pipelines, like recruitment and recommendation systems.
This paper investigates LLMs' behavior with respect to gender stereotypes, in the context of occupation decision making.
arXiv Detail & Related papers (2024-05-06T18:09:32Z) - Self-Debiasing Large Language Models: Zero-Shot Recognition and
Reduction of Stereotypes [73.12947922129261]
We leverage the zero-shot capabilities of large language models to reduce stereotyping.
We show that self-debiasing can significantly reduce the degree of stereotyping across nine different social groups.
We hope this work opens inquiry into other zero-shot techniques for bias mitigation.
arXiv Detail & Related papers (2024-02-03T01:40:11Z) - Probing Explicit and Implicit Gender Bias through LLM Conditional Text
Generation [64.79319733514266]
Large Language Models (LLMs) can generate biased and toxic responses.
We propose a conditional text generation mechanism without the need for predefined gender phrases and stereotypes.
arXiv Detail & Related papers (2023-11-01T05:31:46Z) - In-Contextual Gender Bias Suppression for Large Language Models [47.246504807946884]
Large Language Models (LLMs) have been reported to encode worrying-levels of gender biases.
We propose bias suppression that prevents biased generations of LLMs by providing preambles constructed from manually designed templates.
We find that bias suppression has acceptable adverse effect on downstream task performance with HellaSwag and COPA.
arXiv Detail & Related papers (2023-09-13T18:39:08Z) - Gender bias and stereotypes in Large Language Models [0.6882042556551611]
This paper investigates Large Language Models' behavior with respect to gender stereotypes.
We use a simple paradigm to test the presence of gender bias, building on but differing from WinoBias.
Our contributions in this paper are as follows: (a) LLMs are 3-6 times more likely to choose an occupation that stereotypically aligns with a person's gender; (b) these choices align with people's perceptions better than with the ground truth as reflected in official job statistics; (d) LLMs ignore crucial ambiguities in sentence structure 95% of the time in our study items, but when explicitly prompted, they recognize
arXiv Detail & Related papers (2023-08-28T22:32:05Z) - Testing Occupational Gender Bias in Language Models: Towards Robust Measurement and Zero-Shot Debiasing [98.07536837448293]
Large language models (LLMs) have been shown to exhibit a variety of harmful, human-like biases against various demographics.
We introduce a list of desiderata for robustly measuring biases in generative language models.
We then use this benchmark to test several state-of-the-art open-source LLMs, including Llama, Mistral, and their instruction-tuned versions.
arXiv Detail & Related papers (2022-12-20T22:41:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.