Fugu-MT Paper Translation (Summary): Mitigating Spurious Correlations Between Question and Answer via Chain-of-Thought Correctness Perception Distillation

Paper summary: Mitigating Spurious Correlations Between Question and Answer via Chain-of-Thought Correctness Perception Distillation

arxiv url: http://arxiv.org/abs/2509.05602v2
Date: Tue, 09 Sep 2025 07:15:51 GMT
Status: Translation complete
Last updated in system: 2025-09-10 12:33:22.823675
Title: Mitigating Spurious Correlations Between Question and Answer via Chain-of-Thought Correctness Perception Distillation
Title (reference translation): Mitigating Spurious Correlations Between Questions and Answers via Chain-of-Thought Correctness Perception Distillation
Authors: Hongyan Xie, Yitong Yao, Yikun Ban, Zixuan Huang, Deqing Wang, Zhenhe Wu, Haoxiang Su, Chao Wang, Shuangyong Song
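Abstract summary: CoPeD (Chain-of-Thought Correctness Perception Distillation) aims to improve the reasoning quality of the student model. CoPeD encourages the student model to predict answers based on correct rationales and to revise them when they are incorrect.
Reference score (independently computed attention score): 25.195244084313114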
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) excel at reasoning tasks but are expensive to deploy. Thus small language models (SLMs) are fine-tuned on CoT data generated by LLMs to copy LLMs' abilities. However, these CoT data may include noisy rationales that either fail to substantiate the answers or contribute no additional information to support answer prediction, which leads SLMs to capture spurious correlations between questions and answers and compromise the quality of reasoning. In this work, we propose Chain-of-Thought Correctness Perception Distillation (CoPeD), which aims to improve the reasoning quality of the student model from the perspectives of task setting and data utilization. Firstly, we introduce a correctness-aware task setting that encourages the student model to predict answers based on correct rationales and revise them when they are incorrect. This setting improves the faithfulness of reasoning and allows the model to learn from its mistakes. Then, we propose a Correctness-Aware Weighted loss, which dynamically adjusts the contribution of each training instance based on the combined loss of the rationale and the answer. This strategy encourages the model to focus more on samples where the rationale offers stronger support for the correct answer. Experiments have shown that CoPeD is effective on both in-distribution (IND) and out-of-distribution (OOD) benchmark reasoning datasets.
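The correctness-aware task setting described in the abstract can be illustrated with a small data-construction sketch. The abstract does not specify the prompt or target format, so everything here is an assumption made for illustration: the `CoTSample` fields, the `build_training_example` helper, the "Judgement" / "Revised rationale" labels, and the choice to revise the rationale rather than only the answer are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CoTSample:
    """One teacher-generated sample (hypothetical schema, not from the paper)."""
    question: str
    rationale: str
    answer: str                     # gold answer
    rationale_is_correct: bool
    corrected_rationale: str = ""   # only used when the rationale is wrong


def build_training_example(sample: CoTSample) -> tuple[str, str]:
    """Return a (prompt, target) pair for fine-tuning the student model.

    Assumed behaviour: with a correct rationale the student predicts the
    answer from it; with an incorrect rationale the student first revises
    the rationale and then answers.
    """
    prompt = f"Question: {sample.question}\nRationale: {sample.rationale}\n"
    if sample.rationale_is_correct:
        target = f"Judgement: correct\nAnswer: {sample.answer}"
    else:
        target = (
            "Judgement: incorrect\n"
            f"Revised rationale: {sample.corrected_rationale}\n"
            f"Answer: {sample.answer}"
        )
    return prompt, target


if __name__ == "__main__":
    noisy = CoTSample(
        question="What is 3 * 4?",
        rationale="3 + 4 = 7.",
        answer="12",
        rationale_is_correct=False,
        corrected_rationale="3 * 4 means adding 3 four times: 12.",
    )
    prompt, target = build_training_example(noisy)
    print(prompt + target)
```

Putting the correctness judgement before the answer is one plausible way to make the predicted answer depend on the rationale rather than on the question alone; the paper may realize this differently.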
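Abstract (reference translation): Large language models (LLMs) excel at reasoning tasks but are expensive to deploy. Small language models (SLMs) are therefore fine-tuned on CoT data generated by LLMs to imitate the LLMs' abilities. However, this CoT data can contain noisy rationales that either fail to substantiate the answer or add no information that supports answer prediction, so SLMs pick up spurious correlations between questions and answers, which degrades reasoning quality. This work proposes CoPeD (Chain-of-Thought Correctness Perception Distillation), which aims to improve the student model's reasoning quality from the perspectives of task setting and data utilization. First, it introduces a correctness-aware task setting that encourages the student model to predict answers based on correct rationales and to revise them when they are incorrect. This setting improves the faithfulness of reasoning and lets the model learn from its mistakes. Next, it proposes a Correctness-Aware Weighted loss, which dynamically adjusts each training instance's contribution based on the combined loss of the rationale and the answer. This strategy encourages the model to focus more on samples whose rationale offers stronger support for the correct answer. Experiments show that CoPeD is effective on both in-distribution (IND) and out-of-distribution (OOD) benchmark reasoning datasets.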
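The abstract states only that the Correctness-Aware Weighted loss adjusts each instance's contribution from the combined rationale and answer loss; it gives no formula. The following is a minimal sketch under stated assumptions, not the paper's method: it assumes that a lower combined loss signals a rationale that supports the answer more strongly, and it uses a softmax over the negative combined loss as the weighting. The function names and the `temperature` parameter are hypothetical.

```python
import math


def correctness_aware_weights(rationale_losses, answer_losses, temperature=1.0):
    """Per-instance weights for a non-empty batch (assumed softmax scheme).

    Lower combined loss -> larger weight. This mapping is an assumption;
    the abstract does not give the actual weighting function.
    """
    if len(rationale_losses) != len(answer_losses):
        raise ValueError("loss lists must have the same length")
    combined = [r + a for r, a in zip(rationale_losses, answer_losses)]
    scaled = [-c / temperature for c in combined]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_batch_loss(rationale_losses, answer_losses, temperature=1.0):
    """Weighted sum of the combined per-instance losses."""
    weights = correctness_aware_weights(
        rationale_losses, answer_losses, temperature
    )
    return sum(
        w * (r + a)
        for w, r, a in zip(weights, rationale_losses, answer_losses)
    )


if __name__ == "__main__":
    rationale_losses = [0.2, 1.5, 0.7]
    answer_losses = [0.1, 1.0, 0.3]
    print(correctness_aware_weights(rationale_losses, answer_losses))
    print(weighted_batch_loss(rationale_losses, answer_losses))
```

In an autograd framework the weights would probably be treated as constants (detached from the gradient) so that the model cannot lower the loss simply by shifting weight between samples; whether the paper does this is not stated in the abstract.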


Related papers