Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation
- URL: http://arxiv.org/abs/2506.07820v2
- Date: Tue, 10 Jun 2025 02:05:49 GMT
- Title: Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation
- Authors: Jiaxiang Chen, Zhuo Wang, Mingxi Zou, Qifan Wang, Zenglin Xu
- Abstract summary: We propose a framework that enhances the reasoning of large language models (LLMs) by inducing structured reasoning strategies, called guidelines, from verified examples. Our method draws on verified reasoning experiences by inducing reusable guidelines and expanding each into diverse variants. Much like human reasoning, these variants reflect alternative thought patterns, are executed in parallel, refined via self-correction, and aggregated step by step.
- Score: 37.3874687615554
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human reasoning is flexible, adaptive, and grounded in prior experience, qualities that large language models (LLMs) still struggle to emulate. Existing methods either explore diverse reasoning paths at inference time or search for optimal workflows through expensive operations, but both fall short in leveraging multiple reusable strategies in a structured, efficient manner. We propose Guideline Forest, a framework that enhances LLM reasoning by inducing structured reasoning strategies, called guidelines, from verified examples and executing them via stepwise aggregation. Unlike test-time search or single-path distillation, our method draws on verified reasoning experiences by inducing reusable guidelines and expanding each into diverse variants. Much like human reasoning, these variants reflect alternative thought patterns, are executed in parallel, refined via self-correction, and aggregated step by step, enabling the model to adaptively resolve uncertainty and synthesize robust solutions. We evaluate Guideline Forest on four benchmarks (GSM8K, MATH-500, MBPP, and HumanEval) spanning mathematical and programmatic reasoning. Guideline Forest consistently outperforms strong baselines, including CoT, ReAct, ToT, FoT, and AFlow. Ablation studies further highlight the effectiveness of multi-path reasoning and stepwise aggregation, underscoring Guideline Forest's adaptability and generalization potential.
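The step-by-step aggregation described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not say how candidate steps are combined, so a simple majority vote is assumed here. The self-correction stage is omitted, and each guideline variant is stood in for by a scripted function rather than an LLM call. The names `scripted`, `aggregate_step`, and `run_forest` are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Assumption: a guideline variant is modelled as a function that takes the
# problem and the steps agreed so far, and proposes the next step as text.
Variant = Callable[[str, Sequence[str]], str]


def scripted(steps: List[str]) -> Variant:
    """Hypothetical stand-in for an LLM-backed variant: replays fixed steps."""
    def propose(problem: str, agreed: Sequence[str]) -> str:
        return steps[len(agreed)]
    return propose


def aggregate_step(candidates: List[str]) -> str:
    """Assumed aggregation rule: majority vote, ties go to the earliest proposal."""
    counts = Counter(candidates)
    best = max(counts.values())
    return next(c for c in candidates if counts[c] == best)


def run_forest(problem: str, variants: List[Variant], num_steps: int) -> List[str]:
    """Collect one proposal per variant at each step, then keep the aggregate."""
    agreed: List[str] = []
    for _ in range(num_steps):
        candidates = [variant(problem, agreed) for variant in variants]
        agreed.append(aggregate_step(candidates))
    return agreed


variants = [
    scripted(["add 2 and 3", "multiply by 4"]),
    scripted(["add 2 and 3", "multiply by 4"]),
    scripted(["add 2 and 3", "divide by 4"]),
]
plan = run_forest("(2 + 3) * 4", variants, num_steps=2)
print(plan)
```

Because every variant sees the same agreed prefix before proposing its next step, a disagreement at one step (here the third variant's second step) is resolved before it can propagate to later steps.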
Related papers
- SCULPT: Constraint-Guided Pruned MCTS that Carves Efficient Paths for Mathematical Reasoning [11.991985041067638]
This paper introduces SCULPT, a constraint-guided approach for Monte Carlo Tree Search (MCTS). SCULPT scores and prunes actions using a combination of symbolic checks (dimensional consistency, type compatibility, magnitude sanity, depth control, and diversity) and structural pattern guidance. Overall, domain-aware constraints can improve accuracy while maintaining efficiency and stability.
arXiv Detail & Related papers (2026-01-19T08:55:46Z)
- Enhancing Self-Correction in Large Language Models through Multi-Perspective Reflection [0.33625320078410365]
MyGO Poly-Reflective Chain-of-Thought (PR-CoT) is a novel methodology employing structured multi-perspective reflection. It refines the initial CoT into a more robust and accurate final answer without model retraining. It significantly outperforms traditional CoT and existing reflection methods in logical consistency and error correction.
arXiv Detail & Related papers (2026-01-12T17:57:05Z)
- Latent Chain-of-Thought for Visual Reasoning [53.541579327424046]
Chain-of-thought (CoT) reasoning is critical for improving the interpretability and reliability of Large Vision-Language Models (LVLMs). We reformulate reasoning in LVLMs as posterior inference and propose a scalable training algorithm based on amortized variational inference. We empirically demonstrate that the proposed method enhances state-of-the-art LVLMs on seven reasoning benchmarks.
arXiv Detail & Related papers (2025-10-27T23:10:06Z)
- A Survey on Parallel Reasoning [58.66122129692264]
We first present a formal definition of parallel reasoning and clarify its distinction from related concepts like Chain-of-Thought. We then organize and discuss advanced techniques based on a novel taxonomy, including non-interactive reasoning, interactive reasoning, and efficiency-focused decoding strategies. We highlight the core challenges of parallel reasoning and suggest potential directions for future research.
arXiv Detail & Related papers (2025-10-14T05:42:19Z)
- Thinking in Many Modes: How Composite Reasoning Elevates Large Language Model Performance with Limited Data [1.7194419006128259]
Composite Reasoning (CR) is a novel reasoning approach empowering Large Language Models (LLMs) to explore and combine multiple reasoning styles. It is evaluated on scientific and medical question-answering benchmarks. Our findings highlight that by cultivating internal reasoning style diversity, LLMs acquire more robust, adaptive, and efficient problem-solving abilities.
arXiv Detail & Related papers (2025-09-26T11:38:03Z)
- From Implicit Exploration to Structured Reasoning: Leveraging Guideline and Refinement for LLMs [33.17712742134723]
We propose a framework that shifts from implicit exploration to structured reasoning through guideline and refinement. First, we extract structured reasoning patterns from successful trajectories and reflective signals from failures. During inference, the model follows these guidelines step by step, with refinement applied after each step to correct errors and stabilize the reasoning process.
arXiv Detail & Related papers (2025-09-08T02:11:49Z)
- When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs [55.20230501807337]
We present the first systematic evaluation of 5 methods for improving prompt robustness within a unified experimental framework. We benchmark these techniques on 8 models from the Llama, Qwen, and Gemma families across 52 tasks from the Natural Instructions dataset.
arXiv Detail & Related papers (2025-08-15T10:32:50Z)
- Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute [57.16286134405821]
We propose Fractional Reasoning, a framework that enables continuous control over reasoning intensity at inference time. Our method operates by extracting the latent steering vector associated with deeper reasoning and reapplying it with a tunable scaling factor. Experiments on GSM8K, MATH500, and GPQA demonstrate that Fractional Reasoning consistently improves performance across diverse reasoning tasks and models.
arXiv Detail & Related papers (2025-06-18T21:15:59Z)
- RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling [25.12721060984898]
Rule-based reasoning has been acknowledged as one of the fundamental problems in reasoning. We introduce Reinforced Rule-based Reasoning, a.k.a. RuleReasoner, a simple yet effective method to conduct rule-based reasoning. Specifically, RuleReasoner resamples each training batch by updating the sampling weights of different domains based on historical rewards.
arXiv Detail & Related papers (2025-06-10T10:31:21Z)
- Ada-R1: Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization [86.56120216550232]
We propose a novel two-stage framework for adaptive and efficient reasoning. First, we construct a hybrid reasoning model by merging long and short CoT models. Second, we apply bi-level preference training to guide the model to select suitable reasoning styles.
arXiv Detail & Related papers (2025-04-30T14:01:45Z)
- R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization [86.32257216965229]
We propose a new online reinforcement learning framework that enables MLLMs to self-improve reasoning ability via simple, effective, and dense step-wise rewarding. StepGRPO introduces two novel rule-based reasoning rewards: Step-wise Reasoning Accuracy Reward (StepRAR) and Step-wise Reasoning Validity Reward (StepRVR). With the proposed StepGRPO, we introduce R1-VL, a series of MLLMs with outstanding capabilities in step-by-step reasoning.
arXiv Detail & Related papers (2025-03-17T08:51:44Z)
- Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models [35.82665698868508]
Large Language Models increasingly rely on prolonged reasoning chains to solve complex tasks. This trial-and-error approach often leads to high computational overhead and error propagation. We introduce Meta-Reasoner, a framework that dynamically optimizes inference-time reasoning.
arXiv Detail & Related papers (2025-02-27T09:40:13Z)
- AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence [29.551802573731305]
We propose AdaptiveStep, a method that divides reasoning steps based on the model's confidence in predicting the next word. We demonstrate its effectiveness through experiments with AdaptiveStep-trained PRMs in mathematical reasoning and code generation tasks.
arXiv Detail & Related papers (2025-02-19T18:35:55Z)
- Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning [40.069109287947875]
We propose a novel reasoning framework called Forest-of-Thought (FoT). FoT integrates multiple reasoning trees to leverage collective decision-making for solving complex logical problems. FoT employs sparse activation strategies to select the most relevant reasoning paths, improving both efficiency and accuracy.
arXiv Detail & Related papers (2024-12-12T09:01:18Z)
- PRefLexOR: Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning and Agentic Thinking [0.0]
PRefLexOR combines preference optimization with concepts from Reinforcement Learning to enable models to self-teach.
We focus on applications in biological materials science and demonstrate the method in a variety of case studies.
arXiv Detail & Related papers (2024-10-16T08:46:26Z)
- Unlocking the Capabilities of Thought: A Reasoning Boundary Framework to Quantify and Optimize Chain-of-Thought [61.588465852846646]
Chain-of-Thought (CoT) reasoning has emerged as a promising approach for enhancing the performance of large language models (LLMs).
In this work, we introduce a novel reasoning boundary framework (RBF) to address these challenges.
arXiv Detail & Related papers (2024-10-08T05:26:28Z)
- LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning [56.273799410256075]
The framework combines Monte Carlo Tree Search (MCTS) with iterative Self-Refine to optimize the reasoning path.
The framework has been tested on general and advanced benchmarks, showing superior performance in terms of search efficiency and problem-solving capability.
arXiv Detail & Related papers (2024-10-03T18:12:29Z)
- Let's reward step by step: Step-Level reward model as the Navigators for Reasoning [64.27898739929734]
Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from PRM to optimize the reasoning pathways explored by LLMs.
To explore the versatility of our approach, we develop a novel method to automatically generate a step-level reward dataset for coding tasks and observe similar performance improvements on code generation tasks.
arXiv Detail & Related papers (2023-10-16T05:21:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.