Bias-Augmented Consistency Training Reduces Biased Reasoning in
Chain-of-Thought
- URL: http://arxiv.org/abs/2403.05518v1
- Date: Fri, 8 Mar 2024 18:41:42 GMT
- Title: Bias-Augmented Consistency Training Reduces Biased Reasoning in
Chain-of-Thought
- Authors: James Chua, Edward Rees, Hunar Batra, Samuel R. Bowman, Julian
Michael, Ethan Perez, Miles Turpin
- Abstract summary: Chain-of-thought prompting can systematically misrepresent the factors influencing models' behavior.
Bias-augmented consistency training (BCT) trains models to give consistent reasoning across prompts with and without biasing features.
Applying BCT to GPT-3.5-Turbo with one bias reduces the rate of biased reasoning by 86% on held-out tasks.
- Score: 34.99438001331234
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While chain-of-thought prompting (CoT) has the potential to improve the
explainability of language model reasoning, it can systematically misrepresent
the factors influencing models' behavior--for example, rationalizing answers in
line with a user's opinion without mentioning this bias. To mitigate this
biased reasoning problem, we introduce bias-augmented consistency training
(BCT), an unsupervised fine-tuning scheme that trains models to give consistent
reasoning across prompts with and without biasing features. We construct a
suite testing nine forms of biased reasoning on seven question-answering tasks,
and find that applying BCT to GPT-3.5-Turbo with one bias reduces the rate of
biased reasoning by 86% on held-out tasks. Moreover, this model generalizes to
other forms of bias, reducing biased reasoning on held-out biases by an average
of 37%. As BCT generalizes to held-out biases and does not require gold labels,
this method may hold promise for reducing biased reasoning from as-of-yet
unknown biases and on tasks where supervision for ground truth reasoning is
unavailable.
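The core of BCT is data construction: the model's own chain-of-thought answer to an unbiased prompt becomes the fine-tuning target for the same question with a biasing feature added. The sketch below illustrates that pairing for a suggested-answer bias of the kind mentioned in the abstract; the function names, prompt wording, and data format are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of building one BCT fine-tuning pair (illustrative, not the paper's code).
import random

def add_suggested_answer_bias(question: str, options: list[str]) -> str:
    """Augment a prompt with a biasing feature: a user-suggested answer."""
    suggested = random.choice(options)
    return f"{question}\nI think the answer is {suggested}, but I'm curious what you think."

def make_bct_example(question: str, options: list[str], unbiased_cot: str) -> dict:
    """Pair a biased prompt with reasoning generated on the unbiased prompt.

    Fine-tuning on such pairs trains the model to reason the same way
    whether or not the biasing feature is present; no gold labels are used.
    """
    biased_prompt = add_suggested_answer_bias(question, options)
    return {"prompt": biased_prompt, "completion": unbiased_cot}

# Hypothetical record; in practice `unbiased_cot` is sampled from the model
# itself on the plain, bias-free version of the prompt.
example = make_bct_example(
    question="Which planet is largest? (A) Mars (B) Jupiter (C) Venus",
    options=["(A)", "(B)", "(C)"],
    unbiased_cot="Jupiter is by far the most massive planet, so the answer is (B).",
)
print(example["prompt"])
```

Standard supervised fine-tuning on many such (biased prompt, unbiased reasoning) pairs then yields the consistency-trained model; because the targets come from the model's own unbiased generations, no ground-truth answers are required.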
Related papers
- Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs [51.00909549291524]
Large language models (LLMs) exhibit cognitive biases.
These biases vary across models and can be amplified by instruction tuning.
It remains unclear if these differences in biases stem from pretraining, finetuning, or even random noise.
arXiv Detail & Related papers (2025-07-09T18:01:14Z)
- A Closer Look at Bias and Chain-of-Thought Faithfulness of Large (Vision) Language Models [53.18562650350898]
Chain-of-thought (CoT) reasoning enhances performance of large language models.
We present the first comprehensive study of CoT faithfulness in large vision-language models.
arXiv Detail & Related papers (2025-05-29T18:55:05Z)
- CosFairNet: A Parameter-Space based Approach for Bias Free Learning [1.9116784879310025]
Deep neural networks trained on biased data often inadvertently learn unintended inference rules.
We introduce a novel approach to address bias directly in the model's parameter space, preventing its propagation across layers.
We show enhanced classification accuracy and debiasing effectiveness across various synthetic and real-world datasets.
arXiv Detail & Related papers (2024-10-19T13:06:40Z)
- RATIONALYST: Pre-training Process-Supervision for Improving Reasoning [41.9992614617405]
We introduce RATIONALYST, a model for process-supervision of reasoning based on pre-training.
We extract 79k rationales from a web-scale unlabelled dataset (the Pile) and a combination of reasoning datasets with minimal human intervention.
Fine-tuned from LLaMa-3-8B, RATIONALYST improves the accuracy of reasoning by an average of 3.9% on 7 representative reasoning benchmarks.
arXiv Detail & Related papers (2024-10-01T20:05:51Z)
- Improving Bias Mitigation through Bias Experts in Natural Language Understanding [10.363406065066538]
We propose a new debiasing framework that introduces binary classifiers between the auxiliary model and the main model.
Our proposed strategy improves the bias identification ability of the auxiliary model.
arXiv Detail & Related papers (2023-12-06T16:15:00Z)
- Mitigating Bias for Question Answering Models by Tracking Bias Influence [84.66462028537475]
We propose BMBI, an approach to mitigate the bias of multiple-choice QA models.
Based on the intuition that a model would become more biased if it learns from a biased example, we measure the bias level of a query instance.
We show that our method could be applied to multiple QA formulations across multiple bias categories.
arXiv Detail & Related papers (2023-10-13T00:49:09Z)
- Echoes: Unsupervised Debiasing via Pseudo-bias Labeling in an Echo Chamber [17.034228910493056]
This paper presents experimental analyses revealing that the existing biased models overfit to bias-conflicting samples in the training data.
We propose a straightforward and effective method called Echoes, which trains a biased model and a target model with a different strategy.
Our approach achieves superior debiasing results compared to the existing baselines on both synthetic and real-world datasets.
arXiv Detail & Related papers (2023-05-06T13:13:18Z)
- Feature-Level Debiased Natural Language Understanding [86.8751772146264]
Existing natural language understanding (NLU) models often rely on dataset biases to achieve high performance on specific datasets.
We propose debiasing contrastive learning (DCT) to mitigate biased latent features while accounting for the dynamic nature of bias.
DCT outperforms state-of-the-art baselines on out-of-distribution datasets while maintaining in-distribution performance.
arXiv Detail & Related papers (2022-12-11T06:16:14Z)
- Looking at the Overlooked: An Analysis on the Word-Overlap Bias in Natural Language Inference [20.112129592923246]
We focus on an overlooked aspect of the overlap bias in NLI models: the reverse word-overlap bias.
Current NLI models are highly biased towards the non-entailment label on instances with low overlap.
We investigate the reasons for the emergence of the overlap bias and the role of minority examples in its mitigation.
arXiv Detail & Related papers (2022-11-07T21:02:23Z)
- Self-supervised debiasing using low rank regularization [59.84695042540525]
Spurious correlations can cause strong biases in deep neural networks, impairing generalization ability.
We propose a self-supervised debiasing framework potentially compatible with unlabeled samples.
Remarkably, the proposed debiasing framework significantly improves the generalization performance of self-supervised learning baselines.
arXiv Detail & Related papers (2022-10-11T08:26:19Z)
- The SAME score: Improved cosine based bias score for word embeddings [49.75878234192369]
We introduce SAME, a novel bias score for semantic bias in embeddings.
We show that SAME is capable of measuring semantic bias and identify potential causes for social bias in downstream tasks.
arXiv Detail & Related papers (2022-03-28T09:28:13Z)
- General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model like gradient descent in functional space.
GGD can learn a more robust base model under the settings of both task-specific biased models with prior knowledge and self-ensemble biased model without prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z)
- Learning Debiased Models with Dynamic Gradient Alignment and Bias-conflicting Sample Mining [39.00256193731365]
Deep neural networks notoriously suffer from dataset biases which are detrimental to model robustness, generalization and fairness.
We propose a two-stage debiasing scheme to combat the intractable unknown biases.
arXiv Detail & Related papers (2021-11-25T14:50:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.