RAVR: Reference-Answer-guided Variational Reasoning for Large Language Models
- URL: http://arxiv.org/abs/2510.25206v1
- Date: Wed, 29 Oct 2025 06:18:37 GMT
- Title: RAVR: Reference-Answer-guided Variational Reasoning for Large Language Models
- Authors: Tianqianjin Lin, Xi Zhao, Xingyao Zhang, Rujiao Long, Yi Xu, Zhuoren Jiang, Wenbo Su, Bo Zheng
- Abstract summary: We introduce RAVR, an end-to-end framework that uses answer-conditioned reasoning as a variational surrogate for question-only reasoning. RAVR reduces hesitation, strengthens conclusion consolidation, and promotes problem-specific strategies in reasoning.
- Score: 21.671577399379885
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) can refine the reasoning abilities of large language models (LLMs), but critically depends on a key prerequisite: the LLM can already generate high-utility reasoning paths with non-negligible probability. For tasks beyond the LLM's current competence, such reasoning paths can be hard to sample, and learning risks reinforcing familiar but suboptimal reasoning. We are motivated by the insight from cognitive science that "Why is this the answer?" is often an easier question than "What is the answer?", as it avoids the heavy cognitive load of open-ended exploration, opting instead for explanatory reconstruction: systematically retracing the reasoning that links a question to its answer. We show that LLMs can similarly leverage answers to derive high-quality reasoning paths. We formalize this phenomenon and prove that conditioning on the answer increases the expected utility of sampled reasoning paths, thereby transforming intractable problems into learnable ones. Building on this insight, we introduce RAVR (Reference-Answer-guided Variational Reasoning), an end-to-end framework that uses answer-conditioned reasoning as a variational surrogate for question-only reasoning. Experiments in both general and math domains demonstrate consistent improvements over strong baselines. We further analyze the reasoning behavior and find that RAVR reduces hesitation, strengthens conclusion consolidation, and promotes problem-specific strategies in reasoning.
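To make the core mechanism concrete, here is a minimal, self-contained Python sketch of the idea the abstract describes: sampling rationales conditioned on the reference answer (the easier, answer-conditioned distribution) and keeping the high-utility ones as a variational surrogate for question-only reasoning. Everything below (`toy_policy`, `utility`, the filtering threshold, and the loss form) is an illustrative assumption, not the paper's actual implementation.

```python
import random

def toy_policy(prompt, n_samples=4):
    """Stand-in for an LLM: returns (rationale, log_prob) samples."""
    return [(f"rationale-{random.randint(0, 99)} for [{prompt}]",
             -random.uniform(1.0, 5.0)) for _ in range(n_samples)]

def utility(rationale, answer):
    """Stand-in reward: whether the rationale supports the answer."""
    return 1.0 if hash((rationale, answer)) % 2 == 0 else 0.0

def ravr_step(question, answer):
    # q(z | question, answer): the answer-conditioned proposal. Per the
    # paper's claim, conditioning on the answer raises the expected
    # utility of sampled rationales z.
    surrogate_samples = toy_policy(f"{question} [reference answer: {answer}]")

    # Keep high-utility rationales; in a real training loop these would
    # supervise the question-only policy p(z | question).
    kept = [(z, lp) for z, lp in surrogate_samples if utility(z, answer) > 0.5]

    # Toy surrogate objective: average negative log-probability of the
    # retained rationales (a stand-in for the variational update).
    loss = -sum(lp for _, lp in kept) / max(len(kept), 1)
    return kept, loss

if __name__ == "__main__":
    rationales, loss = ravr_step("What is 17 * 24?", "408")
    print(f"kept {len(rationales)} rationales, surrogate loss = {loss:.3f}")
```

In the actual framework the surrogate and the question-only policy are trained end-to-end; the fixed stubs here only illustrate the data flow from answer-conditioned sampling to the update signal.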
Related papers
- CausalFlip: A Benchmark for LLM Causal Judgment Beyond Semantic Matching [50.65932158912512]
We propose a new causal reasoning benchmark, CausalFlip, to encourage the development of new large language models. CausalFlip consists of causal judgment questions built over event triples that could form different confounder, chain, and collider relations. We evaluate LLMs under multiple training paradigms, including answer-only training, explicit Chain-of-Thought supervision, and a proposed internalized causal reasoning approach.
arXiv Detail & Related papers (2026-02-23T18:06:15Z) - Towards Generalizable Reasoning: Group Causal Counterfactual Policy Optimization for LLM Reasoning [50.352417879912515]
Large language models (LLMs) excel at complex tasks thanks to advances in reasoning capabilities. We propose Group Causal Counterfactual Policy Optimization to explicitly train LLMs to learn generalizable reasoning patterns. We then construct token-level advantages from this reward and optimize the policy, encouraging LLMs to favor reasoning patterns that are process-valid and counterfactually robust.
arXiv Detail & Related papers (2026-02-06T08:03:11Z) - Making Mathematical Reasoning Adaptive [61.45161826629692]
We propose the AdaR framework to enable adaptive reasoning in large language models (LLMs). AdaR synthesizes logically equivalent queries by varying variable values, and trains models with RLVR on these data to penalize spurious logic. Experimental results demonstrate that AdaR improves robustness and generalization, achieving substantial improvements in mathematical reasoning.
arXiv Detail & Related papers (2025-10-06T09:30:05Z) - Answer-Centric or Reasoning-Driven? Uncovering the Latent Memory Anchor in LLMs [28.556628696390767]
Large Language Models (LLMs) demonstrate impressive reasoning capabilities. Evidence suggests much of their success stems from memorized answer-reasoning patterns rather than genuine inference. We propose a five-level answer-visibility prompt framework that systematically manipulates answer cues and probes model behavior through indirect, behavioral analysis.
arXiv Detail & Related papers (2025-06-21T08:15:45Z) - Don't "Overthink" Passage Reranking: Is Reasoning Truly Necessary? [60.725923225442095]
We compare reasoning-based pointwise rerankers (ReasonRR) to standard, non-reasoning pointwise rerankers (StandardRR) under identical training conditions. We find that ReasonRR with its reasoning disabled (ReasonRR-NoReason) is surprisingly more effective than ReasonRR.
arXiv Detail & Related papers (2025-05-22T16:41:37Z) - LINGOLY-TOO: Disentangling Reasoning from Knowledge with Templatised Orthographic Obfuscation [1.2576388595811496]
We introduce LINGOLY-TOO, a challenging reasoning benchmark grounded in natural language. We permute reasoning problems written in real languages to generate numerous question variations. Experiments and analyses show that models can circumvent reasoning and answer from prior knowledge.
arXiv Detail & Related papers (2025-03-04T19:57:47Z) - SR-FoT: A Syllogistic-Reasoning Framework of Thought for Large Language Models Tackling Knowledge-based Reasoning Tasks [42.392103712958445]
Large Language Models (LLMs) might not follow the correct reasoning paths. We propose a multi-stage Syllogistic-Reasoning Framework of Thought (SR-FoT). Our SR-FoT begins by interpreting the question and then uses the interpretation and the original question to propose a suitable major premise.
arXiv Detail & Related papers (2025-01-20T17:00:41Z) - Make LLMs better zero-shot reasoners: Structure-orientated autonomous reasoning [52.83539473110143]
We introduce a novel structure-oriented analysis method to help Large Language Models (LLMs) better understand a question.
To further improve the reliability in complex question-answering tasks, we propose a multi-agent reasoning system, Structure-oriented Autonomous Reasoning Agents (SARA).
Extensive experiments verify the effectiveness of the proposed reasoning system. Surprisingly, in some cases, the system even surpasses few-shot methods.
arXiv Detail & Related papers (2024-10-18T05:30:33Z) - Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models [84.15513004135576]
Current research enhances the reasoning performance of Large Language Models (LLMs) by sampling multiple reasoning chains and ensembling based on the answer frequency.
This approach fails in scenarios where the correct answers are in the minority.
We introduce AoR, a hierarchical reasoning aggregation framework that selects answers based on the evaluation of reasoning chains (a toy contrast with frequency-based ensembling appears after this list).
arXiv Detail & Related papers (2024-05-21T17:12:19Z) - How Likely Do LLMs with CoT Mimic Human Reasoning? [31.86489714330338]
Chain-of-thought emerges as a promising technique for eliciting reasoning capabilities from Large Language Models (LLMs). We use causal analysis to understand the relationships between the problem instruction, reasoning, and the answer in LLMs.
arXiv Detail & Related papers (2024-02-25T10:13:04Z) - Towards CausalGPT: A Multi-Agent Approach for Faithful Knowledge Reasoning via Promoting Causal Consistency in LLMs [55.66353783572259]
Causal-Consistency Chain-of-Thought harnesses multi-agent collaboration to bolster the faithfulness and causality of foundation models. Our framework demonstrates significant superiority over state-of-the-art methods through extensive and comprehensive evaluations.
arXiv Detail & Related papers (2023-08-23T04:59:21Z)