Can Instruction Fine-Tuned Language Models Identify Social Bias through
Prompting?
- URL: http://arxiv.org/abs/2307.10472v1
- Date: Wed, 19 Jul 2023 22:03:40 GMT
- Title: Can Instruction Fine-Tuned Language Models Identify Social Bias through
Prompting?
- Authors: Omkar Dige, Jacob-Junqi Tian, David Emerson, Faiza Khan Khattak
- Abstract summary: We present our work on evaluating instruction fine-tuned language models' ability to identify bias through zero-shot prompting.
Across LLaMA and its two instruction fine-tuned versions, Alpaca 7B performs best on the bias identification task with an accuracy of 56.7%.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As the breadth and depth of language model applications continue to expand
rapidly, it is increasingly important to build efficient frameworks for
measuring and mitigating the learned or inherited social biases of these
models. In this paper, we present our work on evaluating instruction fine-tuned
language models' ability to identify bias through zero-shot prompting,
including Chain-of-Thought (CoT) prompts. Across LLaMA and its two instruction
fine-tuned versions, Alpaca 7B performs best on the bias identification task
with an accuracy of 56.7%. We also demonstrate that scaling up LLM size and
data diversity could lead to further performance gain. This is a
work-in-progress presenting the first component of our bias mitigation
framework. We will keep updating this work as we get more results.
Related papers
- DELIA: Diversity-Enhanced Learning for Instruction Adaptation in Large Language Models [11.77848664657788]
We show that instruction tuning is primarily a process where the model fits to specific task formats, rather than acquiring new knowledge or capabilities.
We propose that this limitation stems from biased features learned during instruction tuning, which differ from ideal task-specific features.
We use our novel data synthesis method, DELIA, to transform biased features in instruction tuning into approximations of ideal features.
arXiv Detail & Related papers (2024-08-19T17:56:06Z) - BiasDPO: Mitigating Bias in Language Models through Direct Preference Optimization [0.0]
Large Language Models (LLMs) have become pivotal in advancing natural language processing, yet their potential to perpetuate biases poses significant concerns.
This paper introduces a new framework employing Direct Preference Optimization (DPO) to mitigate gender, racial, and religious biases in English text.
By developing a loss function that favors less biased over biased completions, our approach cultivates a preference for respectful and non-discriminatory language (a minimal sketch of such a preference loss appears after this list).
arXiv Detail & Related papers (2024-07-18T22:32:20Z) - Fine-tuning Language Models for Factuality [96.5203774943198]
Large pre-trained language models (LLMs) are now widely used, sometimes even as a replacement for traditional search engines.
Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations'.
In this work, we fine-tune language models to be more factual, without human labeling.
arXiv Detail & Related papers (2023-11-14T18:59:15Z) - Roles of Scaling and Instruction Tuning in Language Perception: Model
vs. Human Attention [58.817405319722596]
This work compares the self-attention of several large language models (LLMs) in different sizes to assess the effect of scaling and instruction tuning on language perception.
Results show that scaling enhances human resemblance and improves effective attention by reducing reliance on trivial patterns, while instruction tuning does not.
We also find that current LLMs are consistently closer to non-native than to native speakers in their attention, suggesting sub-optimal language perception in all models.
arXiv Detail & Related papers (2023-10-29T17:16:40Z) - Soft-prompt Tuning for Large Language Models to Evaluate Bias [0.03141085922386211]
Using soft-prompts to evaluate bias has the added advantage of avoiding the injection of human bias through handcrafted prompts.
We check model biases on different sensitive attributes using group fairness (bias) metrics and find interesting bias patterns.
arXiv Detail & Related papers (2023-06-07T19:11:25Z) - Language Model Self-improvement by Reinforcement Learning Contemplation [13.152789365858812]
This paper introduces a novel unsupervised method called Language Model Self-Improvement by Reinforcement Learning Contemplation (SIRLC).
As a student, the model generates answers to unlabeled questions, while as a teacher, it evaluates the generated text and assigns scores accordingly.
We demonstrate that SIRLC can be applied to various NLP tasks, such as reasoning problems, text generation, and machine translation.
arXiv Detail & Related papers (2023-05-23T19:25:52Z) - Fairness-guided Few-shot Prompting for Large Language Models [93.05624064699965]
In-context learning can suffer from high instability due to variations in training examples, example order, and prompt formats.
We introduce a metric to evaluate the predictive bias of a fixed prompt against labels or given attributes.
We propose a novel greedy search strategy to identify the near-optimal prompt for improving the performance of in-context learning.
arXiv Detail & Related papers (2023-03-23T12:28:25Z) - Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z) - Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Unlabeled data carry rich task-relevant information and have proven useful for few-shot learning of language models.
In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.
arXiv Detail & Related papers (2021-10-04T08:51:36Z) - Masked Language Modeling and the Distributional Hypothesis: Order Word
Matters Pre-training for Little [74.49773960145681]
A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to represent the syntactic structures prevalent in NLP pipelines.
In this paper, we propose a different explanation: MLMs succeed on downstream tasks almost entirely due to their ability to model higher-order word co-occurrence statistics.
Our results show that purely distributional information largely explains the success of pre-training, and underscore the importance of curating challenging evaluation datasets that require deeper linguistic knowledge.
arXiv Detail & Related papers (2021-04-14T06:30:36Z)
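As referenced in the BiasDPO entry above, a preference-optimization loss that favors less biased completions can be written down compactly. The sketch below shows the standard DPO objective on a (less biased, more biased) completion pair, assuming sequence log-probabilities from the trained policy and a frozen reference model; the variable names and beta value are illustrative assumptions, not taken from the paper.

```python
# Sketch of the standard DPO objective applied to a bias-preference pair:
# y_w is the less-biased (preferred) completion, y_l the biased (rejected) one.
# Inputs are sequence log-probabilities; how they are computed is left to the
# surrounding training loop and is not shown here.
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    # Log-ratio of the trained policy vs. the frozen reference model
    # for each completion.
    preferred_ratio = policy_logp_w - ref_logp_w
    rejected_ratio = policy_logp_l - ref_logp_l
    # Maximize the margin between the preferred and rejected completions.
    return -F.logsigmoid(beta * (preferred_ratio - rejected_ratio)).mean()

# Example with dummy sequence log-probabilities (batch of 2).
loss = dpo_loss(
    policy_logp_w=torch.tensor([-12.0, -9.5]),
    policy_logp_l=torch.tensor([-11.0, -10.0]),
    ref_logp_w=torch.tensor([-12.5, -9.8]),
    ref_logp_l=torch.tensor([-11.2, -9.9]),
)
print(loss.item())
```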
This list is automatically generated from the titles and abstracts of the papers in this site.