Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models
- URL: http://arxiv.org/abs/2502.19918v4
- Date: Fri, 08 Aug 2025 18:01:34 GMT
- Title: Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models
- Authors: Yuan Sui, Yufei He, Tri Cao, Simeng Han, Yulin Chen, Bryan Hooi,
- Abstract summary: Large Language Models (LLMs) struggle with high computational time and error propagation during inference time.<n>We propose Meta-Reasoner, a new framework to enable LLMs to optimize the inference compute by adjusting strategies on how to reason during inference time.<n>Our method improves performance by 9-12% over previous SOTA methods while reducing inference time by 28-35%.
- Score: 35.82665698868508
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large Language Models (LLMs) struggle with high computational time and error propagation during inference time, especially for complex tasks like math, puzzles, or coding requiring multi-step thinking. While existing reasoning models with chain-of-thoughts (CoT) can enable LLMs to do step-wise analysis and reflection, they often face the issue of wasting computation on less productive solutions and fail to make progress during inference time. In this paper, we propose Meta-Reasoner, a new framework to enable LLMs ``Think about how to think'', i.e., optimize the inference compute by adjusting strategies on how to reason during inference time. Inspired by dual-process theory, our method decouples the high-level strategy generation (e.g., backtracking, switching approaches, or restarting) from stepwise CoT generation via a lightweight progress report. The strategy module only consider the summarized version from the previous CoTs to propose new strategies accordingly. We employ the contextual multi-armed bandits (CMABs) for this module to iteratively evaluate the previous reasoning states and dynamically adjust the strategy to avoid reasoning get stuck in less productive paths during inference. Evaluations on math problems (e.g., Game-of-24, TheoremQA) and scientific problems (e.g., SciBench) demonstrate that our method improves performance by 9-12\% over previous SOTA methods while reducing inference time by 28-35\%. This approach also generalizes to other domains like creative writing, demonstrating its versatility for diverse reasoning-intensive problems using LLMs.
Related papers
- $\
abla$-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Latent Space [71.23672814629448]
$nabla$-Reasoner is an iterative generation framework that integrates differentiable optimization over token logits into the decoding loop.<n>$nabla$-Reasoner achieves over 20% accuracy improvement on a challenging mathematical reasoning benchmark.
arXiv Detail & Related papers (2026-03-05T08:42:54Z) - Encode, Think, Decode: Scaling test-time reasoning with recursive latent thoughts [19.518525241726916]
Encode-Think-Decode (ETD) is a method that enhances the reasoning capabilities of a base model by training it to iterate over a small subset of reasoning-relevant layers during the mid-training stage.<n>ETD models yield substantial gains on 17 reasoning benchmarks, including +28.4% relative accuracy improvement on GSM8K and +36% on MATH with the OLMo-2 1B Base model.
arXiv Detail & Related papers (2025-10-08T15:58:35Z) - Plan before Solving: Problem-Aware Strategy Routing for Mathematical Reasoning with LLMs [49.995906301946]
Existing methods usually leverage a fixed strategy to guide Large Language Models (LLMs) to perform mathematical reasoning.<n>Our analysis reveals that the single strategy cannot adapt to problem-specific requirements and thus overlooks the trade-off between effectiveness and efficiency.<n>We propose Planning and Routing through Instance-Specific Modeling (PRISM), a novel framework that decouples mathematical reasoning into two stages: strategy planning and targeted execution.
arXiv Detail & Related papers (2025-09-29T07:22:41Z) - Perceptual Decoupling for Scalable Multi-modal Reasoning via Reward-Optimized Captioning [78.17782197231325]
We propose a reasoning-guided reinforcement learning strategy that aligns the extractor's captioning behavior with the reasoning objective.<n> Experiments on multi-modal math and science benchmarks show that the proposed RACRO method achieves state-of-the-art average performance.
arXiv Detail & Related papers (2025-06-05T02:28:07Z) - PATS: Process-Level Adaptive Thinking Mode Switching [53.53401063490537]
Current large-language models (LLMs) typically adopt a fixed reasoning strategy, either simple or complex, for all questions, regardless of their difficulty.<n>This neglect of variation in task and reasoning process complexity leads to an imbalance between performance and efficiency.<n>Existing methods attempt to implement training-free fast-slow thinking system switching to handle problems of varying difficulty, but are limited by coarse-grained solution-level strategy adjustments.<n>We propose a novel reasoning paradigm: Process-Level Adaptive Thinking Mode Switching (PATS), which enables LLMs to dynamically adjust their reasoning strategy based on the difficulty of each step, optimizing the balance between
arXiv Detail & Related papers (2025-05-25T17:58:50Z) - A Survey of Slow Thinking-based Reasoning LLMs using Reinforced Learning and Inference-time Scaling Law [29.763080554625216]
This survey explores recent advancements in reasoning large language models (LLMs) designed to mimic "slow thinking"<n>LLMs focus on scaling computational resources dynamically during complex tasks, such as math reasoning, visual reasoning, medical diagnosis, and multi-agent debates.
arXiv Detail & Related papers (2025-05-05T14:14:59Z) - Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods [39.89239733570008]
This work conducts a comprehensive analysis of inference-time scaling methods for both reasoning and non-reasoning models.<n>We find that non-reasoning models, even with an extremely high inference budget, still fall substantially behind reasoning models.<n>For reasoning models, majority voting proves to be a robust inference strategy, generally competitive or outperforming other more sophisticated ITC methods.
arXiv Detail & Related papers (2025-04-18T19:32:55Z) - Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1) [66.51642638034822]
Reasoning is central to human intelligence, enabling structured problem-solving across diverse tasks.
Recent advances in large language models (LLMs) have greatly enhanced their reasoning abilities in arithmetic, commonsense, and symbolic domains.
This paper offers a concise yet insightful overview of reasoning techniques in both textual and multimodal LLMs.
arXiv Detail & Related papers (2025-04-04T04:04:56Z) - A Survey of Scaling in Large Language Model Reasoning [62.92861523305361]
We provide a comprehensive examination of scaling in large Language models (LLMs) reasoning.
We analyze scaling in reasoning steps that improves multi-step inference and logical consistency.
We discuss scaling in training-enabled reasoning, focusing on optimization through iterative model improvement.
arXiv Detail & Related papers (2025-04-02T23:51:27Z) - "Well, Keep Thinking": Enhancing LLM Reasoning with Adaptive Injection Decoding [4.008780119020479]
Large language models (LLMs) exhibit strong reasoning abilities, often attributed to few-shot or zero-shot chain-of-thought (CoT) prompting.
We propose a novel decoding strategy that systematically nudges LLMs to continue reasoning, thereby preventing immature reasoning processes.
arXiv Detail & Related papers (2025-03-13T08:46:32Z) - ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning [54.787341008881036]
We introduce Reinforced Meta-thinking Agents (ReMA), a novel framework that leverages Multi-Agent Reinforcement Learning (MARL) to elicit meta-thinking behaviors.
ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed executions.
Experimental results demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks.
arXiv Detail & Related papers (2025-03-12T16:05:31Z) - Advancing Reasoning in Large Language Models: Promising Methods and Approaches [0.0]
Large Language Models (LLMs) have succeeded remarkably in various natural language processing (NLP) tasks.<n>Their ability to perform complex reasoning-spanning logical deduction, mathematical problem-solving, commonsense inference, and multi-step reasoning-often falls short of human expectations.<n>This survey provides a comprehensive review of emerging techniques enhancing reasoning in LLMs.
arXiv Detail & Related papers (2025-02-05T23:31:39Z) - Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs [76.43407125275202]
o1-like models can emulate human-like long-time thinking during inference.<n>This paper presents the first comprehensive study on the prevalent issue of overthinking in these models.<n>We propose strategies to mitigate overthinking, streamlining reasoning processes without compromising accuracy.
arXiv Detail & Related papers (2024-12-30T18:55:12Z) - Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning [40.069109287947875]
We propose a novel reasoning framework called Forest-of-Thought (FoT)<n>FoT integrates multiple reasoning trees to leverage collective decision-making for solving complex logical problems.<n>We introduce a dynamic self-correction strategy that enables real-time error correction, along with consensus-guided decision-making strategies.
arXiv Detail & Related papers (2024-12-12T09:01:18Z) - Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up [9.42385235462794]
Large language models (LLMs) have shown remarkable performance in reasoning tasks but face limitations in mathematical and complex logical reasoning.<n>We propose Reversal of Thought (RoT) to enhance the logical reasoning abilities of LLMs during the warm-up phase prior to batch inference.<n>RoT utilizes a Preference-Guided Reverse Reasoning warm-up strategy, which integrates logical symbols for pseudocode planning.
arXiv Detail & Related papers (2024-10-16T07:44:28Z) - Can We Further Elicit Reasoning in LLMs? Critic-Guided Planning with Retrieval-Augmentation for Solving Challenging Tasks [68.49251303172674]
State-of-the-art large language models (LLMs) exhibit impressive problem-solving capabilities but may struggle with complex reasoning and factual correctness.
Existing methods harness the strengths of chain-of-thought and retrieval-augmented generation (RAG) to decompose a complex problem into simpler steps and apply retrieval to improve factual correctness.
We introduce Critic-guided planning with Retrieval-augmentation, CR-Planner, a novel framework that leverages fine-tuned critic models to guide both reasoning and retrieval processes through planning.
arXiv Detail & Related papers (2024-10-02T11:26:02Z) - Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models [46.959380978972206]
We study inference scaling laws (aka test-time scaling laws) and compute-optimal inference.<n>As a first step towards understanding and designing compute-optimal inference methods, we studied cost-performance trade-offs for inference strategies.<n>Our findings suggest that scaling inference compute with inference strategies can be more computationally efficient than scaling model parameters.
arXiv Detail & Related papers (2024-08-01T17:16:04Z) - MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.<n>We present a process-based benchmark MR-Ben that demands a meta-reasoning skill.<n>Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z) - Generating Chain-of-Thoughts with a Pairwise-Comparison Approach to Searching for the Most Promising Intermediate Thought [70.30423016640749]
Chain-of-thoughts (CoT) methods were proposed to guide large language models to reason step-by-step, enabling problem solving from simple to complex.
The evaluation from the large language model (LLMs) is typically noisy and unreliable, potentially misleading the generation process in selecting promising intermediate thoughts.
In this paper, motivated by Vapnik's principle, we use pairwise-comparison evaluation instead of point-wise scoring to search for promising intermediate thoughts.
arXiv Detail & Related papers (2024-02-10T09:51:03Z) - Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing [61.98556945939045]
We propose a framework to learn planning-based reasoning through Direct Preference Optimization (DPO) on collected trajectories.
Our results on challenging logical reasoning benchmarks demonstrate the effectiveness of our learning framework.
arXiv Detail & Related papers (2024-02-01T15:18:33Z) - SCREWS: A Modular Framework for Reasoning with Revisions [58.698199183147935]
We present SCREWS, a modular framework for reasoning with revisions.
We show that SCREWS unifies several previous approaches under a common framework.
We evaluate our framework with state-of-the-art LLMs on a diverse set of reasoning tasks.
arXiv Detail & Related papers (2023-09-20T15:59:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.