Related papers: VeriCoT: Neuro-symbolic Chain-of-Thought Validation via Logical Consistency Checks

URL: http://arxiv.org/abs/2511.04662v1
Date: Thu, 06 Nov 2025 18:50:08 GMT
Title: VeriCoT: Neuro-symbolic Chain-of-Thought Validation via Logical Consistency Checks
Authors: Yu Feng, Nathaniel Weir, Kaj Bostrom, Sam Bayless, Darion Cassel, Sapana Chaudhary, Benjamin Kiesl-Reiter, Huzefa Rangwala
Abstract summary: We introduce VeriCoT, a neuro-symbolic method that extracts and verifies formal logical arguments from Chain-of-Thought reasoning. Experiments on the ProofWriter, LegalBench, and BioASQ datasets show VeriCoT effectively identifies flawed reasoning. We also leverage VeriCoT's verification signal for (1) inference-time self-reflection, (2) supervised fine-tuning (SFT), and (3) preference fine-tuning.
Score: 18.68532103004733
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: LLMs can perform multi-step reasoning through Chain-of-Thought (CoT), but they cannot reliably verify their own logic. Even when they reach correct answers, the underlying reasoning may be flawed, undermining trust in high-stakes scenarios. To mitigate this issue, we introduce VeriCoT, a neuro-symbolic method that extracts and verifies formal logical arguments from CoT reasoning. VeriCoT formalizes each CoT reasoning step into first-order logic and identifies premises that ground the argument in source context, commonsense knowledge, or prior reasoning steps. The symbolic representation enables automated solvers to verify logical validity while the NL premises allow humans and systems to identify ungrounded or fallacious reasoning steps. Experiments on the ProofWriter, LegalBench, and BioASQ datasets show VeriCoT effectively identifies flawed reasoning, and serves as a strong predictor of final answer correctness. We also leverage VeriCoT's verification signal for (1) inference-time self-reflection, (2) supervised fine-tuning (SFT) on VeriCoT-distilled datasets and (3) preference fine-tuning (PFT) with direct preference optimization (DPO) using verification-based pairwise rewards, further improving reasoning validity and accuracy.
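Illustration (not from the paper): the abstract describes checking whether each reasoning step follows logically from its premises. The sketch below is a toy propositional stand-in for that idea, using a brute-force truth table instead of first-order logic and an automated solver. The function name `entails` and the rain example are hypothetical and do not reflect VeriCoT's actual implementation.

```python
from itertools import product


def entails(premises, conclusion, atoms):
    """Return True if every truth assignment that satisfies all premises
    also satisfies the conclusion (brute-force truth-table check)."""
    for values in product([False, True], repeat=len(atoms)):
        env = dict(zip(atoms, values))
        if all(p(env) for p in premises) and not conclusion(env):
            return False  # counterexample: premises hold, conclusion fails
    return True


# Hypothetical toy argument; formulas are functions over a truth assignment.
atoms = ["rains", "wet"]
rule = lambda e: (not e["rains"]) or e["wet"]  # "if it rains, the ground is wet"
fact = lambda e: e["rains"]                    # "it rains"
step = lambda e: e["wet"]                      # reasoning step: "the ground is wet"

grounded = entails([rule, fact], step, atoms)  # step is supported by both premises
ungrounded = entails([rule], step, atoms)      # step lacks the supporting fact
```

With both premises the step is entailed; dropping the fact leaves it unsupported, which is the kind of ungrounded step a verifier of this sort is meant to flag.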

Related papers

Uncovering Hidden Correctness in LLM Causal Reasoning via Symbolic Verification [56.51953062869371]
DoVerifier is a symbolic verifier that checks whether causal expressions are derivable from a given causal graph using rules from do-calculus and probability theory. Our evaluations on synthetic data and causal QA benchmarks show that DoVerifier more accurately captures semantic correctness of causal reasoning traces.
arXiv Detail & Related papers (2026-01-29T03:22:58Z)
VERGE: Formal Refinement and Guidance Engine for Verifiable LLM Reasoning [4.3414302048068745]
We present a neurosymbolic framework that combines Large Language Models with SMT solvers to produce verification-guided answers. We introduce three key innovations: (1) multi-model consensus via formal semantic equivalence checking, (2) semantic routing that directs different claim types to appropriate verification strategies, and (3) precise logical error localization via Minimal Correction Subsets. With the GPT-OSS-120B model, VERGE demonstrates an average performance uplift of 18.7% at convergence across a set of reasoning benchmarks compared to single-pass approaches.
arXiv Detail & Related papers (2026-01-27T20:59:11Z)
Training LLMs with LogicReward for Faithful and Rigorous Reasoning [75.30425553246177]
We propose LogicReward, a reward system that guides model training by enforcing step-level logical correctness with a theorem prover. An 8B model trained on data constructed with LogicReward surpasses GPT-4o and o4-mini by 11.6% and 2% on natural language inference and logical reasoning tasks.
arXiv Detail & Related papers (2025-12-20T03:43:02Z)
MM-CoT: A Benchmark for Probing Visual Chain-of-Thought Reasoning in Multimodal Models [49.32415342913976]
We introduce MM-CoT, a diagnostic benchmark designed to probe the visual grounding and logical coherence of CoT reasoning in multimodal models. We evaluate leading vision-language models on MM-CoT and find that even the most advanced systems struggle, revealing a sharp discrepancy between generative fluency and true reasoning fidelity.
arXiv Detail & Related papers (2025-12-09T04:13:31Z)
Critical or Compliant? The Double-Edged Sword of Reasoning in Chain-of-Thought Explanations [60.27156500679296]
We study the role of Chain-of-Thought (CoT) explanations in moral scenarios by systematically perturbing reasoning chains and manipulating delivery tones. Our findings reveal two key effects, including that users often equate trust with outcome agreement, sustaining reliance even when reasoning is flawed. These results highlight how CoT explanations can simultaneously clarify and mislead, underscoring the need for NLP systems to provide explanations that encourage scrutiny and critical thinking rather than blind trust.
arXiv Detail & Related papers (2025-11-15T02:38:49Z)
SemCoT: Accelerating Chain-of-Thought Reasoning through Semantically-Aligned Implicit Tokens [43.78883511257627]
The verbosity of Chain-of-Thought (CoT) reasoning hinders its mass deployment in efficiency-critical applications. We propose a novel semantically-aligned implicit CoT framework termed SemCoT.
arXiv Detail & Related papers (2025-10-28T20:11:54Z)
SIM-CoT: Supervised Implicit Chain-of-Thought [108.30049193668083]
Implicit Chain-of-Thought (CoT) methods offer a token-efficient alternative to explicit CoT reasoning in Large Language Models. We identify a core latent instability issue when scaling the computational budget of implicit CoT. We propose SIM-CoT, a plug-and-play training module that introduces step-level supervision to stabilize and enrich the latent reasoning space.
arXiv Detail & Related papers (2025-09-24T17:01:32Z)
Unveiling Confirmation Bias in Chain-of-Thought Reasoning [12.150655660758359]
Chain-of-thought (CoT) prompting has been widely adopted to enhance the reasoning capabilities of large language models (LLMs). This work presents a novel perspective to understand CoT behavior through the lens of confirmation bias in cognitive psychology.
arXiv Detail & Related papers (2025-06-14T01:30:17Z)
Thought-Path Contrastive Learning via Premise-Oriented Data Augmentation for Logical Reading Comprehension [9.67774998354062]
Previous research has primarily focused on enhancing logical reasoning capabilities through Chain-of-Thought (CoT) or data augmentation. We propose a Premise-Oriented Data Augmentation (PODA) framework to generate CoT rationales including analyses for both correct and incorrect options. We also introduce a novel thought-path contrastive learning method that compares reasoning paths between the original and counterfactual samples.
arXiv Detail & Related papers (2024-09-22T15:44:43Z)
Mitigating Misleading Chain-of-Thought Reasoning with Selective Filtering [59.495717939664246]
Large language models have manifested remarkable capabilities by leveraging chain-of-thought (CoT) reasoning techniques to solve intricate questions. We propose a novel approach called the selective filtering reasoner (SelF-Reasoner) that assesses the entailment relationship between the question and the candidate reasoning chain. SelF-Reasoner improves the fine-tuned T5 baseline consistently over the ScienceQA, ECQA, and LastLetter tasks.
arXiv Detail & Related papers (2024-03-28T06:28:35Z)
Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs [95.07757789781213]
Two lines of approaches are adopted for complex reasoning with LLMs. One line of work prompts LLMs with various reasoning structures, while the structural outputs can be naturally regarded as intermediate reasoning steps. The other line of work adopts LLM-free declarative solvers to do the reasoning task, rendering higher reasoning accuracy but lacking interpretability due to the black-box nature of the solvers. We present a simple extension to the latter line of work. Specifically, we showcase that the intermediate search logs generated by Prolog interpreters can be accessed and interpreted into human-readable reasoning.
arXiv Detail & Related papers (2023-11-16T11:26:21Z)
LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers [60.009969929857704]
Logical reasoning is an important task for artificial intelligence with potential impacts on science, mathematics, and society. In this work, we reformulate such tasks as modular neurosymbolic programming, which we call LINC. We observe significant performance gains on FOLIO and a balanced subset of ProofWriter for three different models in nearly all experimental conditions we evaluate.
arXiv Detail & Related papers (2023-10-23T17:58:40Z)
Logical Credal Networks [87.25387518070411]
This paper introduces Logical Credal Networks, an expressive probabilistic logic that generalizes many prior models that combine logic and probability. We investigate its performance on maximum a posteriori inference tasks, including solving Mastermind games with uncertainty and detecting credit card fraud.
arXiv Detail & Related papers (2021-09-25T00:00:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.