Fugu-MT Paper Translation (Abstract): Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought

Paper summary: Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought

arxiv url: http://arxiv.org/abs/2403.05518v2
Date: Mon, 26 May 2025 19:19:57 GMT
Status: Translation complete
Last updated in system: 2025-05-28 17:05:57.967891
Title: Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought
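Title (reference translation): Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought
Authors: James Chua, Edward Rees, Hunar Batra, Samuel R. Bowman, Julian Michael, Ethan Perez, Miles Turpin
Abstract summary: Chain-of-thought (CoT) prompting has the potential to improve the explainability of language model reasoning. However, CoT can also systematically misrepresent the factors influencing models' behavior. The authors first create a dataset of nine different biases that affect GPT-3.5-Turbo and Llama-8b models.
Reference score (attention score computed by this site): 33.32335629744919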
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Chain-of-thought prompting (CoT) has the potential to improve the explainability of language model reasoning. But CoT can also systematically misrepresent the factors influencing models' behavior -- for example, rationalizing answers in line with a user's opinion. We first create a new dataset of 9 different biases that affect GPT-3.5-Turbo and Llama-8b models. These consist of spurious-few-shot patterns, post hoc rationalization, and sycophantic settings. Models switch to the answer implied by the bias, without mentioning the effect of the bias in the CoT. To mitigate this biased reasoning problem, we introduce bias-augmented consistency training (BCT), an unsupervised fine-tuning scheme that trains models to give consistent reasoning across prompts with and without biasing features. We construct a suite testing nine forms of biased reasoning on seven question-answering tasks, and find that applying BCT to GPT-3.5-Turbo with one bias reduces the rate of biased reasoning by 86% on held-out tasks. Moreover, this model generalizes to other forms of bias, reducing biased reasoning on held-out biases by an average of 37%. As BCT generalizes to held-out biases and does not require gold labels, this method may hold promise for reducing biased reasoning from as-of-yet unknown biases and on tasks where ground truth reasoning is unavailable.
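Abstract (reference translation): Chain-of-thought (CoT) prompting has the potential to improve the explainability of language model reasoning. However, CoT can also systematically misrepresent the factors influencing models' behavior. The authors first create a dataset of nine different biases that affect GPT-3.5-Turbo and Llama-8b models. These consist of spurious few-shot patterns, post hoc rationalization, and sycophantic settings. Models switch to the answer implied by the bias without mentioning the effect of the bias in the CoT. To mitigate this biased reasoning problem, the authors introduce bias-augmented consistency training (BCT). Testing nine forms of biased reasoning on seven question-answering tasks, they find that applying BCT to GPT-3.5-Turbo with one bias reduces the rate of biased reasoning by 86% on held-out tasks. Moreover, this model generalizes to other forms of bias, reducing biased reasoning on held-out biases by an average of 37%. Because BCT generalizes to held-out biases and does not require gold labels, the method may hold promise for reducing biased reasoning from as-yet-unknown biases and on tasks where ground-truth reasoning is unavailable.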
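The abstract describes BCT as training a model to give consistent reasoning across prompts with and without biasing features, without gold labels. The following is a minimal sketch of one way such training pairs could be assembled, not the authors' implementation: the helper names (`add_user_opinion`, `build_bct_examples`, `relative_reduction`), the wording of the biasing feature, and the toy model are all hypothetical, and the sketch assumes the reported percentages are relative reductions in the rate of biased reasoning.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class TrainingExample:
    prompt: str      # prompt WITH the biasing feature
    completion: str  # reasoning the model gave WITHOUT the biasing feature


def add_user_opinion(question: str, suggested_answer: str) -> str:
    """Hypothetical sycophancy-style biasing feature: append a user opinion."""
    return f"{question}\nI think the answer is {suggested_answer}. What do you think?"


def build_bct_examples(
    questions: Sequence[str],
    suggested_answers: Sequence[str],
    generate: Callable[[str], str],
) -> List[TrainingExample]:
    """Pair each biased prompt with the model's own unbiased reasoning.

    `generate` stands in for any text-generation call (an assumption).
    No gold labels are used: the target is the model's own output on the
    unbiased prompt, which is what makes the scheme unsupervised.
    """
    examples = []
    for question, suggested in zip(questions, suggested_answers):
        unbiased_reasoning = generate(question)
        biased_prompt = add_user_opinion(question, suggested)
        examples.append(TrainingExample(biased_prompt, unbiased_reasoning))
    return examples


def relative_reduction(rate_before: float, rate_after: float) -> float:
    """Relative drop in the biased-reasoning rate, as a fraction of the baseline."""
    if rate_before <= 0:
        raise ValueError("rate_before must be positive")
    return (rate_before - rate_after) / rate_before


if __name__ == "__main__":
    # Toy stand-in for a language model; it only echoes the question line.
    def toy_model(prompt: str) -> str:
        return "Step-by-step reasoning for: " + prompt.splitlines()[0]

    data = build_bct_examples(["Is 17 prime? (A) yes (B) no"], ["(B)"], toy_model)
    print(data[0].prompt)
    print(data[0].completion)
    # Illustrative rates only, not figures from the paper.
    print(round(relative_reduction(0.50, 0.07), 2))
```

The resulting prompt and completion pairs would then be passed to an ordinary supervised fine-tuning procedure. The rates in the demo are made up to show how a figure such as "86%" could arise under the relative-reduction assumption.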

Related papers

Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs [51.00909549291524]
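Large language models (LLMs) exhibit cognitive biases. These biases vary across models and can be amplified by instruction tuning. It is unclear whether these differences in bias stem from pretraining, finetuning, or random noise.
Reference translation of paper (metadata) (2025-07-09T18:01:14Z)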
A Closer Look at Bias and Chain-of-Thought Faithfulness of Large (Vision) Language Models [53.18562650350898]
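Chain-of-thought (CoT) reasoning improves the performance of large language models. This paper presents the first comprehensive study of CoT faithfulness in large vision-language models.
Reference translation of paper (metadata) (2025-05-29T18:55:05Z)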
CosFairNet:A Parameter-Space based Approach for Bias Free Learning [1.9116784879310025]
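Deep neural networks trained on biased data often inadvertently learn unintended inference rules. This paper proposes a novel method that addresses bias directly within the model's parameter space. It shows improved classification accuracy and reduced bias on a range of synthetic and real-world datasets.
Reference translation of paper (metadata) (2024-10-19T13:06:40Z)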
RATIONALYST: Pre-training Process-Supervision for Improving Reasoning [41.9992614617405]
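This paper introduces RATIONALYST, a model for process supervision of reasoning based on pre-training. The authors extract 79k rationales from a web-scale unlabelled dataset (the Pile) and a combination of reasoning datasets with minimal human intervention. Fine-tuned from LLaMa-3-8B, RATIONALYST improves reasoning accuracy by an average of 3.9% on seven representative reasoning benchmarks.
Reference translation of paper (metadata) (2024-10-01T20:05:51Z)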
Improving Bias Mitigation through Bias Experts in Natural Language Understanding [10.363406065066538]
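This paper proposes a debiasing framework that introduces binary classifiers between the auxiliary model and the main model. The proposed method improves the auxiliary model's ability to identify bias.
Reference translation of paper (metadata) (2023-12-06T16:15:00Z)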
Mitigating Bias for Question Answering Models by Tracking Bias Influence [84.66462028537475]
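This paper proposes BMBI, a method for mitigating bias in multiple-choice QA models. Based on the intuition that a model leans toward being more biased when it learns from a biased example, the method measures the bias level of a query instance. The method is shown to be applicable to multiple QA formulations across multiple bias categories.
Reference translation of paper (metadata) (2023-10-13T00:49:09Z)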
Echoes: Unsupervised Debiasing via Pseudo-bias Labeling in an Echo Chamber [17.034228910493056]
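This paper presents an experimental analysis revealing that existing biased models overfit to bias-emphasized samples in the training data. It proposes a simple and effective method called Echoes, which trains a biased model and a target model with different strategies. The method achieves superior debiasing results compared with existing baselines on both synthetic and real-world datasets.
Reference translation of paper (metadata) (2023-05-06T13:13:18Z)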
Feature-Level Debiased Natural Language Understanding [86.8751772146264]
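Existing natural language understanding (NLU) models often rely on dataset biases to achieve high performance on specific datasets. This paper proposes debiasing contrastive learning (DCT) to address both biased latent features and the neglect of the dynamic nature of bias. DCT outperforms state-of-the-art baselines on out-of-distribution datasets while maintaining in-distribution performance.
Reference translation of paper (metadata) (2022-12-11T06:16:14Z)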
Looking at the Overlooked: An Analysis on the Word-Overlap Bias in Natural Language Inference [20.112129592923246]
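This paper focuses on an overlooked aspect of overlap bias in NLI models: the reverse word-overlap bias. Current NLI models are highly biased toward the non-entailment label on instances with low overlap. The paper examines the emergence of overlap bias and the role of minority examples in its mitigation.
Reference translation of paper (metadata) (2022-11-07T21:02:23Z)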
Self-supervised debiasing using low rank regularization [59.84695042540525]
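Spurious correlations can cause strong biases in deep neural networks and impair their generalization ability. This paper proposes a self-supervised debiasing framework that is compatible with unlabeled samples. Notably, the proposed framework significantly improves the generalization performance of self-supervised learning baselines.
Reference translation of paper (metadata) (2022-10-11T08:26:19Z)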
The SAME score: Improved cosine based bias score for word embeddings [49.75878234192369]
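This paper introduces SAME, a new bias score for semantic bias in embeddings. It shows that the score can measure semantic bias in downstream tasks and identify potential causes of social bias.
Reference translation of paper (metadata) (2022-03-28T09:28:13Z)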
General Greedy De-bias Learning [163.65789778416172]
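This paper proposes General Greedy De-bias learning (GGD), a framework that greedily trains the biased models and the base model, in a manner similar to gradient descent in function space. GGD can learn a more robust base model both with task-specific biased models that use prior knowledge and with self-ensemble biased models that use no prior knowledge.
Reference translation of paper (metadata) (2021-12-20T14:47:32Z)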
Learning Debiased Models with Dynamic Gradient Alignment and Bias-conflicting Sample Mining [39.00256193731365]
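Deep neural networks suffer from dataset bias, which is harmful to model robustness, generalization, and fairness. This paper proposes a two-stage debiasing scheme to combat intractable unknown biases.
Reference translation of paper (metadata) (2021-11-25T14:50:10Z)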

The related-papers list is generated automatically from the titles and abstracts of papers on this site.
