FaithCoT-Bench: Benchmarking Instance-Level Faithfulness of Chain-of-Thought Reasoning
- URL: http://arxiv.org/abs/2510.04040v1
- Date: Sun, 05 Oct 2025 05:16:54 GMT
- Title: FaithCoT-Bench: Benchmarking Instance-Level Faithfulness of Chain-of-Thought Reasoning
- Authors: Xu Shen, Song Wang, Zhen Tan, Laura Yao, Xinyu Zhao, Kaidi Xu, Xin Wang, Tianlong Chen
- Abstract summary: FaithCoT-Bench is a unified benchmark for instance-level CoT unfaithfulness detection. Our framework formulates unfaithfulness detection as a discriminative decision problem. FaithCoT-Bench sets a solid basis for future research toward more interpretable and trustworthy reasoning in LLMs.
- Score: 62.452350134196934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) increasingly rely on Chain-of-Thought (CoT) prompting to improve problem-solving and provide seemingly transparent explanations. However, growing evidence shows that CoTs often fail to faithfully represent the underlying reasoning process, raising concerns about their reliability in high-risk applications. Although prior studies have focused on mechanism-level analyses showing that CoTs can be unfaithful, they leave open the practical challenge of deciding whether a specific trajectory is faithful to the internal reasoning of the model. To address this gap, we introduce FaithCoT-Bench, a unified benchmark for instance-level CoT unfaithfulness detection. Our framework establishes a rigorous task formulation that casts unfaithfulness detection as a discriminative decision problem, and provides FINE-CoT (Faithfulness instance evaluation for Chain-of-Thought), an expert-annotated collection of over 1,000 trajectories generated by four representative LLMs across four domains, including more than 300 unfaithful instances with fine-grained causes and step-level evidence. We further conduct a systematic evaluation of eleven representative detection methods spanning counterfactual, logit-based, and LLM-as-judge paradigms, deriving empirical insights that clarify the strengths and weaknesses of existing approaches and reveal that detection becomes harder in knowledge-intensive domains and with more advanced models. To the best of our knowledge, FaithCoT-Bench establishes the first comprehensive benchmark for instance-level CoT faithfulness, setting a solid basis for future research toward more interpretable and trustworthy reasoning in LLMs.
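The abstract frames unfaithfulness detection as a discriminative decision: each annotated trajectory is either faithful or unfaithful, and a detector is judged by how well its decisions match the expert labels. The snippet below is only a minimal sketch of that framing, not the benchmark's actual protocol or data schema. The `Trajectory` fields, the `evaluate_detector` helper, the 0.5 threshold, and the toy `answer_missing_score` detector are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass(frozen=True)
class Trajectory:
    """One annotated reasoning instance (hypothetical schema, not FINE-CoT's)."""
    question: str
    cot: str
    answer: str
    unfaithful: bool  # assumed expert label: True means the CoT is unfaithful


def evaluate_detector(
    trajectories: Sequence[Trajectory],
    score_fn: Callable[[Trajectory], float],
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Threshold a detector's score and compare decisions with the labels."""
    tp = fp = fn = tn = 0
    for t in trajectories:
        predicted_unfaithful = score_fn(t) >= threshold
        if predicted_unfaithful and t.unfaithful:
            tp += 1
        elif predicted_unfaithful and not t.unfaithful:
            fp += 1
        elif not predicted_unfaithful and t.unfaithful:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(trajectories) if trajectories else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


def answer_missing_score(t: Trajectory) -> float:
    """Toy stand-in detector: flag a CoT that never mentions the final answer."""
    return 0.0 if t.answer in t.cot else 1.0


# Four made-up instances, one per confusion-matrix cell.
examples = [
    Trajectory("q1", "2 + 2 = 4", "4", unfaithful=False),
    Trajectory("q2", "so it is 5", "7", unfaithful=True),
    Trajectory("q3", "this gives 9", "9", unfaithful=True),
    Trajectory("q4", "hence 3", "8", unfaithful=False),
]
metrics = evaluate_detector(examples, answer_missing_score)
```

Any of the detector families named in the abstract (counterfactual, logit-based, LLM-as-judge) could in principle be plugged in as `score_fn`, provided it returns a score that is higher for suspected unfaithfulness; the metrics actually reported by the paper may differ from the ones computed here.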
Related papers
- Stop Rewarding Hallucinated Steps: Faithfulness-Aware Step-Level Reinforcement Learning for Small Reasoning Models [59.6715047267181]
  Small reasoning models (SRMs) are prone to hallucinations, especially in intermediate reasoning steps. Existing mitigation methods based on online reinforcement learning rely on outcome-based rewards or coarse-grained chain-of-thought evaluation. We propose Faithfulness-Aware Step-Level Reinforcement Learning (FaithRL), introducing step-level supervision via explicit faithfulness rewards from a process reward model.
  Date: 2026-02-05T17:15:12Z
- Fact-Checking with Large Language Models via Probabilistic Certainty and Consistency [7.806516365113592]
  Large language models (LLMs) are increasingly used in applications requiring factual accuracy. While fact-checking can mitigate these errors, existing methods typically retrieve external evidence indiscriminately. We introduce Probabilistic Certainty and Consistency (PCC), a framework that estimates factual confidence.
  Date: 2026-01-05T21:57:41Z
- Beware of Reasoning Overconfidence: Pitfalls in the Reasoning Process for Multi-solution Tasks [54.31998314008198]
  Large Language Models (LLMs) excel in reasoning tasks requiring a single correct answer, but they perform poorly in multi-solution tasks. We attribute this limitation to reasoning overconfidence: a tendency to express undue certainty in an incomplete solution set. We propose the cognitive-rigidity hypothesis, which posits that overconfidence arises when the reasoning process prematurely converges on a narrow set of thought paths.
  Date: 2025-12-01T14:35:06Z
- Red Teaming Large Reasoning Models [26.720095252284818]
  Large Reasoning Models (LRMs) have emerged as a powerful advancement in multi-step reasoning tasks. LRMs introduce novel safety and reliability risks, such as CoT-hijacking and prompt-induced inefficiencies. We propose RT-LRM, a unified benchmark designed to assess the trustworthiness of LRMs.
  Date: 2025-11-29T09:45:03Z
- Investigating CoT Monitorability in Large Reasoning Models [10.511177985572333]
  Large Reasoning Models (LRMs) have demonstrated remarkable performance on complex tasks by engaging in extended reasoning before producing final answers. These detailed reasoning traces also create a new opportunity for AI safety, CoT Monitorability. However, two key fundamental challenges arise when attempting to build more effective monitors through CoT analysis.
  Date: 2025-11-11T18:06:34Z
- CoT-Saliency: Unified Chain-of-Thought Reasoning for Heterogeneous Saliency Tasks [96.64597365827046]
  We present the first unified framework that jointly handles three operationally heterogeneous saliency tasks. We introduce a Chain-of-Thought (CoT) reasoning process in a Vision-Language Model (VLM) to bridge task heterogeneity. We show our model matches or outperforms specialized SOTA methods and strong closed-source VLMs across all tasks.
  Date: 2025-11-01T04:37:01Z
- The Alignment Auditor: A Bayesian Framework for Verifying and Refining LLM Objectives [8.030821324147515]
  Inverse Reinforcement Learning can infer reward functions from behaviour. Existing approaches either produce a single, overconfident reward estimate or fail to address the fundamental ambiguity of the task. This paper introduces a principled auditing framework that re-frames reward inference from a simple estimation task to a comprehensive process for verification.
  Date: 2025-10-07T16:25:14Z
- ASCoT: An Adaptive Self-Correction Chain-of-Thought Method for Late-Stage Fragility in LLMs [21.409155842171497]
  Chain-of-Thought (CoT) prompting has significantly advanced the reasoning capabilities of Large Language Models (LLMs). Errors introduced in the later stages of a CoT chain are significantly more likely to corrupt the final answer than identical errors made at the beginning. We introduce the Adaptive Self-Correction Chain-of-Thought (ASCoT) method to address this specific vulnerability.
  Date: 2025-08-07T11:26:40Z
- Deep Hidden Cognition Facilitates Reliable Chain-of-Thought Reasoning [33.30315111732609]
  Chain of Thought (CoT) reasoning has demonstrated remarkable deep reasoning capabilities. However, its reliability is often undermined by the accumulation of errors in intermediate steps. This paper introduces an approach to calibrate the CoT reasoning accuracy by leveraging the model's intrinsic veracity encoding.
  Date: 2025-07-14T07:41:35Z
- CTRLS: Chain-of-Thought Reasoning via Latent State-Transition [57.51370433303236]
  Chain-of-thought (CoT) reasoning enables large language models to break down complex problems into interpretable intermediate steps. We introduce CTRLS, a framework that formulates CoT reasoning as a Markov decision process (MDP) with latent state transitions. We show improvements in reasoning accuracy, diversity, and exploration efficiency across benchmark reasoning tasks.
  Date: 2025-07-10T21:32:18Z
- ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning [64.93140713419561]
  Large Reasoning Models (LRMs) perform strongly in complex reasoning tasks via Chain-of-Thought (CoT) prompting, but often suffer from verbose outputs. Existing fine-tuning-based compression methods either operate post-hoc pruning, risking disruption to reasoning coherence, or rely on sampling-based selection. We introduce ConCISE, a framework designed to generate concise reasoning chains, integrating Confidence Injection to boost reasoning confidence, and Early Stopping to terminate reasoning when confidence is sufficient.
  Date: 2025-05-08T01:40:40Z
- The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning [56.574829311863446]
  Chain-of-Thought (CoT) prompting has been widely recognized for its ability to enhance reasoning capabilities in large language models (LLMs). We demonstrate that CoT and its reasoning variants consistently underperform direct answering across varying model scales and benchmark complexities. Our analysis uncovers a fundamental hybrid mechanism of explicit-implicit reasoning driving CoT's performance in pattern-based ICL.
  Date: 2025-04-07T13:51:06Z
- Improving Contextual Faithfulness of Large Language Models via Retrieval Heads-Induced Optimization [35.269343563526675]
  We propose RHIO, a framework to teach large language models to explicitly discriminate between faithful and unfaithful generations. RHIO first augments unfaithful samples that simulate realistic model-intrinsic errors by selectively masking retrieval heads. These samples are incorporated into joint training, enabling the model to distinguish unfaithful outputs from faithful ones conditioned on control tokens.
  Date: 2025-01-23T11:23:25Z
- Aligning Large Language Models for Faithful Integrity Against Opposing Argument [71.33552795870544]
  Large Language Models (LLMs) have demonstrated impressive capabilities in complex reasoning tasks. They can be easily misled by unfaithful arguments during conversations, even when their original statements are correct. We propose a novel framework, named Alignment for Faithful Integrity with Confidence Estimation.
  Date: 2025-01-02T16:38:21Z
- Towards CausalGPT: A Multi-Agent Approach for Faithful Knowledge Reasoning via Promoting Causal Consistency in LLMs [55.66353783572259]
  Causal-Consistency Chain-of-Thought harnesses multi-agent collaboration to bolster the faithfulness and causality of foundation models. Our framework demonstrates significant superiority over state-of-the-art methods through extensive and comprehensive evaluations.
  Date: 2023-08-23T04:59:21Z
This list is automatically generated from the titles and abstracts of the papers in this site.