LIDAO: Towards Limited Interventions for Debiasing (Large) Language Models
- URL: http://arxiv.org/abs/2406.00548v1
- Date: Sat, 1 Jun 2024 20:12:54 GMT
- Title: LIDAO: Towards Limited Interventions for Debiasing (Large) Language Models
- Authors: Tianci Liu, Haoyu Wang, Shiyang Wang, Yu Cheng, Jing Gao,
- Abstract summary: Large language models (LLMs) have achieved impressive performance on various natural language generation tasks.
However, they suffer from generating negative and harmful contents that are biased against certain demographic groups.
We propose LIDAO, a framework to debias a (L)LM at a better fluency provably.
- Score: 19.18522268167047
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have achieved impressive performance on various natural language generation tasks. Nonetheless, they suffer from generating negative and harmful contents that are biased against certain demographic groups (e.g., female), raising severe fairness concerns. As remedies, prior works intervened the generation by removing attitude or demographic information, inevitably degrading the generation quality and resulting in notable \textit{fairness-fluency} trade-offs. However, it is still under-explored to what extent the fluency \textit{has to} be affected in order to achieve a desired level of fairness. In this work, we conduct the first formal study from an information-theoretic perspective. We show that previous approaches are excessive for debiasing and propose LIDAO, a general framework to debias a (L)LM at a better fluency provably. We further robustify LIDAO in adversarial scenarios, where a carefully-crafted prompt may stimulate LLMs exhibiting instruction-following abilities to generate texts with fairness issue appears only when the prompt is also taken into account. Experiments on three LMs ranging from 0.7B to 7B parameters demonstrate the superiority of our method.
Related papers
- Understanding Privacy Risks of Embeddings Induced by Large Language Models [75.96257812857554]
Large language models show early signs of artificial general intelligence but struggle with hallucinations.
One promising solution is to store external knowledge as embeddings, aiding LLMs in retrieval-augmented generation.
Recent studies experimentally showed that the original text can be partially reconstructed from text embeddings by pre-trained language models.
arXiv Detail & Related papers (2024-04-25T13:10:48Z) - Debiasing Multimodal Large Language Models [61.6896704217147]
Large Vision-Language Models (LVLMs) have become indispensable tools in computer vision and natural language processing.
Our investigation reveals a noteworthy bias in the generated content, where the output is primarily influenced by the underlying Large Language Models (LLMs) prior to the input image.
To rectify these biases and redirect the model's focus toward vision information, we introduce two simple, training-free strategies.
arXiv Detail & Related papers (2024-03-08T12:35:07Z) - ROBBIE: Robust Bias Evaluation of Large Generative Language Models [27.864027322486375]
Different prompt-based datasets can be used to measure social bias across multiple text domains and demographic axes.
We compare 6 different prompt-based bias and toxicity metrics across 12 demographic axes and 5 families of generative LLMs.
We conduct a comprehensive study of how well 3 bias/toxicity mitigation techniques perform across our suite of measurements.
arXiv Detail & Related papers (2023-11-29T23:03:04Z) - Exploring the Jungle of Bias: Political Bias Attribution in Language Models via Dependency Analysis [86.49858739347412]
Large Language Models (LLMs) have sparked intense debate regarding the prevalence of bias in these models and its mitigation.
We propose a prompt-based method for the extraction of confounding and mediating attributes which contribute to the decision process.
We find that the observed disparate treatment can at least in part be attributed to confounding and mitigating attributes and model misalignment.
arXiv Detail & Related papers (2023-11-15T00:02:25Z) - Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations manifest the comprehensive grasp of language models in terms of their proficiency in understanding questions.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z) - In-Contextual Gender Bias Suppression for Large Language Models [47.246504807946884]
Large Language Models (LLMs) have been reported to encode worrying-levels of gender biases.
We propose bias suppression that prevents biased generations of LLMs by providing preambles constructed from manually designed templates.
We find that bias suppression has acceptable adverse effect on downstream task performance with HellaSwag and COPA.
arXiv Detail & Related papers (2023-09-13T18:39:08Z) - Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs)
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive, two for bias evaluation, and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z) - Toward Fairness in Text Generation via Mutual Information Minimization
based on Importance Sampling [23.317845744611375]
We propose to minimize the mutual information between the semantics in the generated text sentences and their demographic polarity.
In this way, the mentioning of a demographic group is encouraged to be independent from how it is described in the generated text.
We also propose a distillation mechanism that preserves the language modeling ability of the PLMs after debiasing.
arXiv Detail & Related papers (2023-02-25T18:29:02Z) - Perturbation Augmentation for Fairer NLP [33.442601687940204]
Language models pre-trained on demographically perturbed corpora are more fair, at least, according to our best metrics for measuring model fairness.
Although our findings appear promising, there are still some limitations, as well as outstanding questions about how best to evaluate the (un)fairness of large language models.
arXiv Detail & Related papers (2022-05-25T09:00:29Z) - Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.