From Implicit Exploration to Structured Reasoning: Leveraging Guideline and Refinement for LLMs
- URL: http://arxiv.org/abs/2509.06284v1
- Date: Mon, 08 Sep 2025 02:11:49 GMT
- Title: From Implicit Exploration to Structured Reasoning: Leveraging Guideline and Refinement for LLMs
- Authors: Jiaxiang Chen, Zhuo Wang, Mingxi Zou, Zhucong Li, Zhijian Zhou, Song Wang, Zenglin Xu,
- Abstract summary: We propose a framework that shifts from implicit exploration to structured reasoning through guideline and refinement. First, we extract structured reasoning patterns from successful trajectories and reflective signals from failures. During inference, the model follows these guidelines step-by-step, with refinement applied after each step to correct errors and stabilize the reasoning process.
- Score: 33.17712742134723
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have advanced general-purpose reasoning, showing strong performance across diverse tasks. However, existing methods often rely on implicit exploration, where the model follows stochastic and unguided reasoning paths, like walking without a map. This leads to unstable reasoning paths, lack of error correction, and limited learning from past experience. To address these issues, we propose a framework that shifts from implicit exploration to structured reasoning through guideline and refinement. First, we extract structured reasoning patterns from successful trajectories and reflective signals from failures. During inference, the model follows these guidelines step-by-step, with refinement applied after each step to correct errors and stabilize the reasoning process. Experiments on BBH and four additional benchmarks (GSM8K, MATH-500, MBPP, HumanEval) show that our method consistently outperforms strong baselines across diverse reasoning tasks. Structured reasoning with stepwise execution and refinement improves stability and generalization, while guidelines transfer well across domains and flexibly support cross-model collaboration, matching or surpassing supervised fine-tuning in effectiveness and scalability.
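The stepwise loop the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: `guidelines` stands in for the step patterns induced from past trajectories, and `execute_step` / `refine_step` are hypothetical placeholders for the model's generation and refinement calls.

```python
def structured_reason(problem, guidelines, execute_step, refine_step):
    """Follow induced guidelines step-by-step, refining each partial result.

    execute_step(state, guideline) -> draft step proposed by the model
    refine_step(state, guideline, draft) -> corrected step after refinement
    """
    state = problem
    trace = []
    for guideline in guidelines:
        draft = execute_step(state, guideline)       # model proposes a step
        step = refine_step(state, guideline, draft)  # refinement corrects errors
        trace.append(step)
        state = step                                 # next step builds on this one
    return state, trace
```

The key design point mirrored here is that refinement is applied after every step rather than once at the end, so an early error cannot silently propagate through the rest of the chain.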
Related papers
- On Multi-Step Theorem Prediction via Non-Parametric Structural Priors [50.16583672681106]
In this work, we explore training-free theorem prediction through the lens of in-context learning (ICL). We propose Theorem Precedence Graphs, which encode temporal dependencies from historical solution traces as directed graphs, and impose explicit topological constraints that effectively prune the search space during inference. Experiments on the FormalGeo7k benchmark show that our method achieves 89.29% accuracy, substantially outperforming ICL baselines and matching state-of-the-art supervised models.
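The topological pruning described above can be illustrated with a small sketch. This is an assumption-laden stand-in, not the paper's code: edges encode "must precede" relations from historical traces, and a candidate theorem is admissible only once all of its predecessors have appeared in the current trace.

```python
from collections import defaultdict

def admissible(candidates, applied, edges):
    """Keep candidates whose every predecessor in `edges` is already applied.

    edges: iterable of (before, after) pairs meaning `before` must precede `after`.
    """
    preds = defaultdict(set)
    for before, after in edges:
        preds[after].add(before)
    done = set(applied)
    return [c for c in candidates if preds[c] <= done]
```

Filtering candidates this way shrinks the search space at every step, which is the pruning effect the abstract credits for the accuracy gain.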
arXiv Detail & Related papers (2026-03-05T06:08:50Z) - Learning Structured Reasoning via Tractable Trajectory Control [99.75278337895024]
Ctrl-R is a framework for learning structured reasoning via tractable trajectory control. We show that Ctrl-R enables effective exploration and internalization of previously unattainable reasoning patterns.
arXiv Detail & Related papers (2026-03-02T09:18:19Z) - Dynamics Within Latent Chain-of-Thought: An Empirical Study of Causal Structure [58.89643769707751]
We study latent chain-of-thought as a manipulable causal process in representation space. We find that latent-step budgets behave less like homogeneous extra depth and more like staged functionality with non-local routing. These results motivate mode-conditional and stability-aware analyses as more reliable tools for interpreting and improving latent reasoning systems.
arXiv Detail & Related papers (2026-02-09T15:25:12Z) - Implicit Reasoning in Large Language Models: A Comprehensive Survey [67.53966514728383]
Large Language Models (LLMs) have demonstrated strong generalization across a wide range of tasks. Recent studies have shifted attention from explicit chain-of-thought prompting toward implicit reasoning. This survey introduces a taxonomy centered on execution paradigms, shifting the focus from representational forms to computational strategies.
arXiv Detail & Related papers (2025-09-02T14:16:02Z) - When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs [55.20230501807337]
We present the first systematic evaluation of 5 methods for improving prompt robustness within a unified experimental framework. We benchmark these techniques on 8 models from the Llama, Qwen and Gemma families across 52 tasks from the Natural Instructions dataset.
arXiv Detail & Related papers (2025-08-15T10:32:50Z) - CTRLS: Chain-of-Thought Reasoning via Latent State-Transition [57.51370433303236]
Chain-of-thought (CoT) reasoning enables large language models to break down complex problems into interpretable intermediate steps. We introduce CTRLS, a framework that formulates CoT reasoning as a Markov decision process (MDP) with latent state transitions. We show improvements in reasoning accuracy, diversity, and exploration efficiency across benchmark reasoning tasks.
arXiv Detail & Related papers (2025-07-10T21:32:18Z) - Efficient Post-Training Refinement of Latent Reasoning in Large Language Models [22.878147805601706]
Chain-of-Thought prompting suffers from significant token overhead and a fixed reasoning trajectory, preventing step-wise refinement. Recent advances in latent reasoning address these limitations by refining internal reasoning processes directly in the model's latent space. We propose a lightweight post-training framework that refines latent reasoning trajectories using two novel strategies.
arXiv Detail & Related papers (2025-06-10T08:17:16Z) - Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation [37.3874687615554]
We propose a framework that enhances large language model (LLM) reasoning by inducing structured reasoning strategies, called guidelines, from verified examples. Our method draws on verified reasoning experiences by inducing reusable guidelines and expanding each into diverse variants. Much like human reasoning, these variants reflect alternative thought patterns, are executed in parallel, refined via self-correction, and aggregated step by step.
arXiv Detail & Related papers (2025-06-09T14:46:31Z) - R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization [86.32257216965229]
We propose a new online reinforcement learning framework that enables MLLMs to self-improve reasoning ability via simple, effective and dense step-wise rewarding. StepGRPO introduces two novel rule-based reasoning rewards: the Step-wise Reasoning Accuracy Reward (StepRAR) and the Step-wise Reasoning Validity Reward (StepRVR). With the proposed StepGRPO, we introduce R1-VL, a series of MLLMs with outstanding capabilities in step-by-step reasoning.
arXiv Detail & Related papers (2025-03-17T08:51:44Z) - Causality can systematically address the monsters under the bench(marks) [64.36592889550431]
Benchmarks are plagued by various biases, artifacts, or leakage. Models may behave unreliably due to poorly explored failure modes. Causality offers an ideal framework to systematically address these challenges.
arXiv Detail & Related papers (2025-02-07T17:01:37Z) - Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback [94.25162866972077]
Step-KTO is a training framework that combines process-level and outcome-level binary feedback. Our experiments show that Step-KTO significantly improves both final answer accuracy and the quality of intermediate reasoning steps.
arXiv Detail & Related papers (2025-01-18T15:38:03Z) - Deductive Beam Search: Decoding Deducible Rationale for Chain-of-Thought Reasoning [10.86233584217013]
Previous methods fail to address reasoning errors in intermediate steps, leading to accumulative errors.
We propose Deductive Beam Search (DBS), which seamlessly integrates chain-of-thought reasoning with step-wise beam search for Large Language Models.
Our approach deploys a verifier, verifying the deducibility of a reasoning step and its premises, thus alleviating the error accumulation.
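The DBS description above amounts to a stepwise beam search scored by a verifier. The sketch below is a generic illustration under that reading, not the paper's implementation: `propose` and `verify` are hypothetical stand-ins for the model's step generator and the deducibility verifier.

```python
def beam_search_steps(problem, propose, verify, beam_width=3, max_steps=4):
    """Expand reasoning chains step by step, keeping the best-verified beams.

    propose(problem, chain) -> candidate next steps for a partial chain
    verify(problem, chain, step) -> score for how well `step` follows
    """
    beams = [([], 0.0)]  # (partial chain of steps, cumulative verifier score)
    for _ in range(max_steps):
        expanded = []
        for chain, score in beams:
            for step in propose(problem, chain):
                expanded.append((chain + [step],
                                 score + verify(problem, chain, step)))
        if not expanded:  # no chain can be extended further
            break
        expanded.sort(key=lambda cs: cs[1], reverse=True)
        beams = expanded[:beam_width]  # prune poorly-verified chains early
    return beams[0]  # highest-scoring reasoning chain
```

Pruning after every step, rather than only scoring completed chains, is what keeps a single bad intermediate deduction from accumulating into a wrong final answer.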
arXiv Detail & Related papers (2024-01-31T09:16:35Z) - SEER: Facilitating Structured Reasoning and Explanation via Reinforcement Learning [29.514755268807868]
We propose SEER, a novel method that maximizes a structure-based return to facilitate structured reasoning and explanation.
Our proposed structure-based return precisely describes the hierarchical and branching structure inherent in structured reasoning.
Our experiments show that SEER significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-01-24T06:10:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.