Fugu-MT Paper Translation (Summary): FaithCoT-Bench: Benchmarking Instance-Level Faithfulness of Chain-of-Thought Reasoning

Paper Summary: FaithCoT-Bench: Benchmarking Instance-Level Faithfulness of Chain-of-Thought Reasoning

arxiv url: http://arxiv.org/abs/2510.04040v1
Date: Sun, 05 Oct 2025 05:16:54 GMT
Status: Translation completed
Last updated in system: 2025-10-07 16:52:59.416016
Title: FaithCoT-Bench: Benchmarking Instance-Level Faithfulness of Chain-of-Thought Reasoning
Authors: Xu Shen, Song Wang, Zhen Tan, Laura Yao, Xinyu Zhao, Kaidi Xu, Xin Wang, Tianlong Chen
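Abstract summary: FaithCoT-Bench is a unified benchmark for instance-level CoT unfaithfulness detection. The framework formulates unfaithfulness detection as a discriminative decision problem. FaithCoT-Bench lays the groundwork for future research toward more interpretable and trustworthy reasoning in LLMs.
Reference score (independently computed attention score): 62.452350134196934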
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) increasingly rely on Chain-of-Thought (CoT) prompting to improve problem-solving and provide seemingly transparent explanations. However, growing evidence shows that CoTs often fail to faithfully represent the underlying reasoning process, raising concerns about their reliability in high-risk applications. Although prior studies have focused on mechanism-level analyses showing that CoTs can be unfaithful, they leave open the practical challenge of deciding whether a specific trajectory is faithful to the internal reasoning of the model. To address this gap, we introduce FaithCoT-Bench, a unified benchmark for instance-level CoT unfaithfulness detection. Our framework establishes a rigorous task formulation that casts unfaithfulness detection as a discriminative decision problem, and provides FINE-CoT (Faithfulness instance evaluation for Chain-of-Thought), an expert-annotated collection of over 1,000 trajectories generated by four representative LLMs across four domains, including more than 300 unfaithful instances with fine-grained causes and step-level evidence. We further conduct a systematic evaluation of eleven representative detection methods spanning counterfactual, logit-based, and LLM-as-judge paradigms, deriving empirical insights that clarify the strengths and weaknesses of existing approaches and reveal the increased challenges of detection in knowledge-intensive domains and with more advanced models. To the best of our knowledge, FaithCoT-Bench establishes the first comprehensive benchmark for instance-level CoT faithfulness, setting a solid basis for future research toward more interpretable and trustworthy reasoning in LLMs.
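The abstract frames unfaithfulness detection as a discriminative decision problem: given a single CoT trajectory, a detector decides "faithful" or "unfaithful", and detectors are compared against expert labels. The following is a minimal sketch of that framing under stated assumptions. The `Instance` fields, the score-plus-threshold `Detector` interface, and the precision/recall/F1/accuracy metrics are hypothetical choices made for illustration; they are not the FINE-CoT data schema, a FaithCoT-Bench API, or the paper's reported metrics. The toy `answer_mentioned_detector` is a simple string heuristic, not one of the eleven evaluated methods.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Instance:
    """Hypothetical record for one annotated CoT trajectory (not the FINE-CoT schema)."""
    question: str
    cot: str          # the generated chain-of-thought text
    answer: str       # the model's final answer
    unfaithful: bool  # expert label: True if the CoT misrepresents the model's reasoning


# Assumed interface: a detector maps an instance to a score; higher = more likely unfaithful.
Detector = Callable[[Instance], float]


def predict(detector: Detector, inst: Instance, threshold: float = 0.5) -> bool:
    """Binary (discriminative) decision: flag as unfaithful iff score >= threshold."""
    return detector(inst) >= threshold


def evaluate(detector: Detector, data: Sequence[Instance],
             threshold: float = 0.5) -> dict[str, float]:
    """Compare predictions with expert labels, treating 'unfaithful' as the positive class."""
    tp = fp = fn = tn = 0
    for inst in data:
        flagged = predict(detector, inst, threshold)
        if flagged and inst.unfaithful:
            tp += 1
        elif flagged:
            fp += 1
        elif inst.unfaithful:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(data) if data else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def answer_mentioned_detector(inst: Instance) -> float:
    """Toy heuristic (hypothetical): flag CoTs that never state the final answer verbatim."""
    return 0.0 if inst.answer in inst.cot else 1.0


# Invented toy data for illustration only; labels are not from FINE-CoT.
data = [
    Instance("2+3?", "2 plus 3 is 5.", "5", False),                      # true negative
    Instance("7*6?", "7 times 6 is 42.", "42", False),                   # true negative
    Instance("Capital of France?", "The answer is obvious.", "Paris", True),  # true positive
    Instance("9-4?", "9 minus 4 is 5.", "5", True),                      # missed: false negative
    Instance("3+4?", "Add the numbers: three and four make seven.", "7", False),  # false positive
]

metrics = evaluate(answer_mentioned_detector, data)
print(metrics)  # {'precision': 0.5, 'recall': 0.5, 'f1': 0.5, 'accuracy': 0.6}
```

Because the sketch only requires a per-instance score, counterfactual, logit-based, or LLM-as-judge detectors could in principle be wrapped behind the same `Detector` callable; the 0.5 threshold and the choice of "unfaithful" as the positive class are assumptions, not details taken from the paper.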
