Related papers: Critical or Compliant? The Double-Edged Sword of Reasoning in Chain-of-Thought Explanations

URL: http://arxiv.org/abs/2511.12001v2
Date: Wed, 19 Nov 2025 05:49:39 GMT
Title: Critical or Compliant? The Double-Edged Sword of Reasoning in Chain-of-Thought Explanations
Authors: Eunkyu Park, Wesley Hanwen Deng, Vasudha Varadarajan, Mingxi Yan, Gunhee Kim, Maarten Sap, Motahhare Eslami
Abstract summary: We study the role of Chain-of-Thought (CoT) explanations in moral scenarios by systematically perturbing reasoning chains and manipulating delivery tones. Our findings reveal two key effects: (1) users often equate trust with outcome agreement, sustaining reliance even when reasoning is flawed, and (2) a confident tone suppresses error detection while maintaining reliance. These results highlight how CoT explanations can simultaneously clarify and mislead, underscoring the need for NLP systems to provide explanations that encourage scrutiny and critical thinking rather than blind trust.
Score: 60.27156500679296
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Explanations are often promoted as tools for transparency, but they can also foster confirmation bias; users may assume reasoning is correct whenever outputs appear acceptable. We study this double-edged role of Chain-of-Thought (CoT) explanations in multimodal moral scenarios by systematically perturbing reasoning chains and manipulating delivery tones. Specifically, we analyze reasoning errors in vision language models (VLMs) and how they impact user trust and the ability to detect errors. Our findings reveal two key effects: (1) users often equate trust with outcome agreement, sustaining reliance even when reasoning is flawed, and (2) the confident tone suppresses error detection while maintaining reliance, showing that delivery styles can override correctness. These results highlight how CoT explanations can simultaneously clarify and mislead, underscoring the need for NLP systems to provide explanations that encourage scrutiny and critical thinking rather than blind trust. All code will be released publicly.
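The abstract names two manipulations (perturbing the reasoning chain and changing the delivery tone) without giving templates or a design matrix. The sketch below is a hypothetical illustration of how such stimuli could be assembled, assuming a 2x2 crossing of chain correctness (intact vs. perturbed) with tone (confident vs. hedged). The names `TONE_PREFIX`, `Explanation`, `perturb_chain`, `render`, and `build_conditions`, as well as the tone wording, are assumptions for illustration and are not taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical tone templates; the paper's actual phrasing is not given in the abstract.
TONE_PREFIX = {
    "confident": "I am certain of this reasoning.",
    "hedged": "I may be wrong, but here is my reasoning.",
}


@dataclass
class Explanation:
    steps: list[str]              # reasoning chain shown to the user
    answer: str                   # final answer (held fixed across conditions)
    tone: str                     # key into TONE_PREFIX
    perturbed_index: int | None   # which step was corrupted, or None if intact


def perturb_chain(steps: list[str], flawed_step: str, index: int) -> list[str]:
    """Return a copy of `steps` with the step at `index` replaced by a flawed one."""
    if not 0 <= index < len(steps):
        raise IndexError("step index out of range")
    flawed = list(steps)  # copy so the intact chain is left untouched
    flawed[index] = flawed_step
    return flawed


def render(expl: Explanation) -> str:
    """Format an explanation as user-facing text: tone prefix, numbered steps, answer."""
    lines = [TONE_PREFIX[expl.tone]]
    lines += [f"Step {i + 1}: {s}" for i, s in enumerate(expl.steps)]
    lines.append(f"Answer: {expl.answer}")
    return "\n".join(lines)


def build_conditions(steps: list[str], answer: str, flawed_step: str, index: int):
    """Cross chain correctness with tone (hypothetical 2x2), keeping the answer fixed."""
    chains = {
        "intact": (list(steps), None),
        "perturbed": (perturb_chain(steps, flawed_step, index), index),
    }
    conditions = []
    for chain_label, (chain, idx) in chains.items():
        for tone in TONE_PREFIX:
            conditions.append((chain_label, Explanation(chain, answer, tone, idx)))
    return conditions
```

Holding the answer constant across all four conditions is the design choice that matters here: outcome agreement stays the same while only reasoning correctness and tone vary, which is what would let a study separate trust driven by the answer from trust driven by the reasoning.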

Related papers

Balancing Faithfulness and Performance in Reasoning via Multi-Listener Soft Execution [79.98699884805636]
Reasoning Execution by Multiple Listeners (REMUL) is a multi-party reinforcement learning approach. REMUL builds on the hypothesis that reasoning traces which other parties can follow will be more faithful. Speakers are rewarded for producing reasoning that is clear to listeners.
arXiv (2026-02-18T02:55:55Z)
VeriCoT: Neuro-symbolic Chain-of-Thought Validation via Logical Consistency Checks [18.68532103004733]
We introduce VeriCoT, a neuro-symbolic method that extracts and verifies formal logical arguments from Chain-of-Thought reasoning. Experiments on the ProofWriter, LegalBench, and BioASQ datasets show VeriCoT effectively identifies flawed reasoning. We also leverage VeriCoT's verification signal for (1) inference-time self-reflection, (2) supervised fine-tuning (SFT), and (3) preference fine-tuning.
arXiv (2025-11-06T18:50:08Z)
Explanation-Driven Counterfactual Testing for Faithfulness in Vision-Language Model Explanations [0.8657627742603715]
Vision-Language Models (VLMs) often produce fluent Natural Language Explanations (NLEs) that sound convincing but may not reflect the causal factors driving their predictions. This mismatch between plausibility and faithfulness poses technical and governance risks. We introduce Explanation-Driven Counterfactual Testing (EDCT), a fully automated verification procedure for a target VLM.
arXiv (2025-09-27T15:16:23Z)
ConfTuner: Training Large Language Models to Express Their Confidence Verbally [58.63318088243125]
Large Language Models (LLMs) are increasingly deployed in high-stakes domains such as science, law, and healthcare. LLMs are often observed to generate incorrect answers with high confidence, a phenomenon known as "overconfidence".
arXiv (2025-08-26T09:25:32Z)
Unveiling Confirmation Bias in Chain-of-Thought Reasoning [12.150655660758359]
Chain-of-thought (CoT) prompting has been widely adopted to enhance the reasoning capabilities of large language models (LLMs). This work presents a novel perspective for understanding CoT behavior through the lens of confirmation bias in cognitive psychology.
arXiv (2025-06-14T01:30:17Z)
Information Bargaining: Bilateral Commitment in Bayesian Persuasion [60.3761154043329]
We introduce a unified framework and a well-structured solution concept for long-term persuasion. This perspective makes explicit the common knowledge of the game structure and grants the receiver comparable commitment capabilities. The framework is validated through a two-stage validation-and-inference paradigm.
arXiv (2025-06-06T08:42:34Z)
Latent Veracity Inference for Identifying Errors in Stepwise Reasoning [78.29317733206643]
We introduce Veracity Search (VS), a discrete search algorithm over veracity assignments. It performs otherwise intractable inference in the posterior distribution over latent veracity values. It generalizes VS, enabling accurate zero-shot veracity inference in novel contexts.
arXiv (2025-05-17T04:16:36Z)
Self-Contradictory Reasoning Evaluation and Detection [31.452161594896978]
We investigate self-contradictory (Self-Contra) reasoning, where the model's reasoning does not support its answers. We find that LLMs often contradict themselves in reasoning tasks involving contextual information understanding or commonsense. We find that GPT-4 can detect Self-Contra with a 52.2% F1 score, much lower than the 66.7% achieved by humans.
arXiv (2023-11-16T06:22:17Z)
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting [43.458726163197824]
Large Language Models (LLMs) can achieve strong performance on many tasks by producing step-by-step reasoning before giving a final output. We find that CoT explanations can systematically misrepresent the true reason for a model's prediction.
arXiv (2023-05-07T22:44:25Z)
Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals. It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation. It then evaluates whether the model's prediction on the counterfactual is consistent with that expressed logic.
arXiv (2022-05-25T03:40:59Z)

This list is automatically generated from the titles and abstracts of the papers on this site.