Generalizing Fairness to Generative Language Models via Reformulation of Non-discrimination Criteria
- URL: http://arxiv.org/abs/2403.08564v3
- Date: Mon, 2 Sep 2024 11:09:55 GMT
- Title: Generalizing Fairness to Generative Language Models via Reformulation of Non-discrimination Criteria
- Authors: Sara Sterlie, Nina Weng, Aasa Feragen
- Abstract summary: This paper studies how to uncover and quantify the presence of gender biases in generative language models.
We derive generative AI analogues of three well-known non-discrimination criteria from classification, namely independence, separation and sufficiency.
Our results address the presence of occupational gender bias within such conversational language models.
- Score: 4.738231680800414
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative AI, such as large language models, has undergone rapid development in recent years. As these models become increasingly available to the public, concerns arise about perpetuating and amplifying harmful biases in applications. Gender stereotypes can be harmful and limiting for the individuals they target, whether they consist of misrepresentation or discrimination. Recognizing gender bias as a pervasive societal construct, this paper studies how to uncover and quantify the presence of gender biases in generative language models. In particular, we derive generative AI analogues of three well-known non-discrimination criteria from classification, namely independence, separation and sufficiency. To demonstrate these criteria in action, we design prompts for each of the criteria with a focus on occupational gender stereotypes, specifically utilizing the medical test to introduce the ground truth in the generative AI context. Our results address the presence of occupational gender bias within such conversational language models.
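For reference, the three classification-side non-discrimination criteria that the paper reformulates have standard definitions in the fairness literature. The sketch below states them for a sensitive attribute A (e.g. gender), a ground-truth label Y, and a model prediction R; this notation is the usual one and is not taken from the paper, whose contribution is the generative-AI reformulation of these conditions rather than the classification versions shown here.

```latex
% Classical non-discrimination criteria for a classifier, stated for
% sensitive attribute A, ground-truth label Y, and prediction R.
% These are the standard classification-side definitions the paper
% starts from; the generative analogues are derived in the paper itself.
\begin{align*}
  \text{Independence:} \quad & R \perp A
    && \Leftrightarrow\; P(R = r \mid A = a) = P(R = r \mid A = a') \\
  \text{Separation:}   \quad & R \perp A \mid Y
    && \Leftrightarrow\; P(R = r \mid Y = y, A = a) = P(R = r \mid Y = y, A = a') \\
  \text{Sufficiency:}  \quad & Y \perp A \mid R
    && \Leftrightarrow\; P(Y = y \mid R = r, A = a) = P(Y = y \mid R = r, A = a')
\end{align*}
```

Note that separation and sufficiency both condition on the ground truth Y, which is presumably why the abstract introduces a medical-test setting to supply a ground truth in the generative context, whereas independence can be assessed from prompts and model responses alone.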
Related papers
- The Root Shapes the Fruit: On the Persistence of Gender-Exclusive Harms in Aligned Language Models [58.130894823145205]
We center transgender, nonbinary, and other gender-diverse identities to investigate how alignment procedures interact with pre-existing gender-diverse bias.
Our findings reveal that DPO-aligned models are particularly sensitive to supervised finetuning.
We conclude with recommendations tailored to DPO and broader alignment practices.
arXiv Detail & Related papers (2024-11-06T06:50:50Z) - GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models [73.23743278545321]
Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but have also been observed to magnify societal biases.
GenderCARE is a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics.
arXiv Detail & Related papers (2024-08-22T15:35:46Z) - Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents a benchmark, AmbGIMT (Gender-Inclusive Machine Translation with Ambiguous attitude words).
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z) - Hire Me or Not? Examining Language Model's Behavior with Occupation Attributes [7.718858707298602]
Large language models (LLMs) have been widely integrated into production pipelines, like recruitment and recommendation systems.
This paper investigates LLMs' behavior with respect to gender stereotypes, in the context of occupation decision making.
arXiv Detail & Related papers (2024-05-06T18:09:32Z) - Evaluating Large Language Models through Gender and Racial Stereotypes [0.0]
We conduct a quality comparative study and establish a framework to evaluate language models under the premise of two kinds of biases: gender and race.
We find that, while gender bias has decreased substantially in newer models compared to older ones, racial bias still persists.
arXiv Detail & Related papers (2023-11-24T18:41:16Z) - Probing Explicit and Implicit Gender Bias through LLM Conditional Text Generation [64.79319733514266]
Large Language Models (LLMs) can generate biased and toxic responses.
We propose a conditional text generation mechanism without the need for predefined gender phrases and stereotypes.
arXiv Detail & Related papers (2023-11-01T05:31:46Z) - Unveiling Gender Bias in Terms of Profession Across LLMs: Analyzing and Addressing Sociological Implications [0.0]
The study examines existing research on gender bias in AI language models and identifies gaps in the current knowledge.
The findings shed light on gendered word associations, language usage, and biased narratives present in the outputs of Large Language Models.
The paper presents strategies for reducing gender bias in LLMs, including algorithmic approaches and data augmentation techniques.
arXiv Detail & Related papers (2023-07-18T11:38:45Z) - Efficient Gender Debiasing of Pre-trained Indic Language Models [0.0]
The gender bias present in the data on which language models are pre-trained is reflected in the systems that use these models.
In our paper, we measure gender bias associated with occupations in Hindi language models.
Our results show that the bias is reduced after applying our proposed mitigation techniques.
arXiv Detail & Related papers (2022-09-08T09:15:58Z) - Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can manifest undesirable representational biases, with potentially harmful consequences.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z) - Evaluating Gender Bias in Natural Language Inference [5.034017602990175]
We propose an evaluation methodology to measure gender bias in natural language understanding through inference.
We use our challenge task to investigate state-of-the-art NLI models on the presence of gender stereotypes using occupations.
Our findings suggest that three models trained on the MNLI and SNLI datasets are significantly prone to gender-induced prediction errors.
arXiv Detail & Related papers (2021-05-12T09:41:51Z) - Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)