Accelerating Large Language Model Reasoning via Speculative Search
- URL: http://arxiv.org/abs/2505.02865v2
- Date: Sat, 24 May 2025 01:49:29 GMT
- Title: Accelerating Large Language Model Reasoning via Speculative Search
- Authors: Zhihai Wang, Jie Wang, Jilai Pan, Xilin Xia, Huiling Zhen, Mingxuan Yuan, Jianye Hao, Feng Wu
- Abstract summary: We propose a novel Speculative Search (SpecSearch) framework that significantly accelerates large language model (LLM) reasoning. Specifically, SpecSearch utilizes a small model to strategically collaborate with a large model at both thought and token levels, efficiently generating high-quality reasoning thoughts. The major pillar of SpecSearch is a novel quality-preserving rejection mechanism, which effectively filters out thoughts whose quality falls below that of the large model's outputs.
- Score: 59.48276891032373
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tree-search-based reasoning methods have significantly enhanced the reasoning capability of large language models (LLMs) by facilitating the exploration of multiple intermediate reasoning steps, i.e., thoughts. However, these methods suffer from substantial inference latency, as they have to generate numerous reasoning thoughts, severely limiting LLM applicability. To address this challenge, we propose a novel Speculative Search (SpecSearch) framework that significantly accelerates LLM reasoning by optimizing thought generation. Specifically, SpecSearch utilizes a small model to strategically collaborate with a large model at both thought and token levels, efficiently generating high-quality reasoning thoughts. The major pillar of SpecSearch is a novel quality-preserving rejection mechanism, which effectively filters out thoughts whose quality falls below that of the large model's outputs. Moreover, we show that SpecSearch preserves comparable reasoning quality to the large model. Experiments on both the Qwen and Llama models demonstrate that SpecSearch significantly outperforms state-of-the-art approaches, achieving up to 2.12$\times$ speedup with comparable reasoning quality.
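To make the collaboration concrete, below is a minimal sketch of the thought-level speculation loop the abstract describes. Everything here (`draft_model`, `target_model`, the scalar quality scores, and the fixed threshold) is a hypothetical stand-in for illustration; the paper's actual quality estimator and rejection rule are not reproduced.

```python
"""Minimal sketch of SpecSearch-style thought-level speculation.

Assumptions: `draft_model`, `target_model`, and the quality scores are
illustrative placeholders, not the paper's actual components.
"""
from dataclasses import dataclass
import random

@dataclass
class Thought:
    text: str
    quality: float  # stand-in for a learned thought-quality estimate

def draft_model(state: str) -> Thought:
    # Small (draft) model: cheap to call, lower quality on average.
    return Thought(f"{state} | small-model step", random.uniform(0.3, 0.9))

def target_model(state: str) -> Thought:
    # Large (target) model: expensive, but sets the quality bar.
    return Thought(f"{state} | large-model step", random.uniform(0.6, 1.0))

def speculative_thought(state: str, bar: float) -> Thought:
    """Quality-preserving rejection: keep the small model's draft only if
    its estimated quality clears the large-model quality bar; otherwise
    regenerate the thought with the large model."""
    draft = draft_model(state)
    if draft.quality >= bar:
        return draft              # accepted: a large-model call is saved
    return target_model(state)    # rejected: fall back to the large model

def specsearch(question: str, steps: int = 4, bar: float = 0.7) -> list[Thought]:
    state, trace = question, []
    for _ in range(steps):
        thought = speculative_thought(state, bar)
        trace.append(thought)
        state = thought.text
    return trace

if __name__ == "__main__":
    for t in specsearch("Solve: 3x + 5 = 20"):
        print(f"{t.quality:.2f}  {t.text.split(' | ')[-1]}")
```

The design point the sketch mirrors is that a rejected draft falls back to the large model, so the accepted thought stream never drops below the quality of the large model's own outputs.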
Related papers
- Thinking with Nothinking Calibration: A New In-Context Learning Paradigm in Reasoning Large Language Models [23.642200042199484]
We propose Thinking with Nothinking (JointThinking) as an in-context learning (ICL) paradigm for reasoning large language models (RLLMs). Our method prompts the model to generate two answers in parallel: one in Thinking mode and the other in Nothinking mode. JointThinking significantly outperforms few-shot chain-of-thought (CoT) prompting and majority voting, with improved answer robustness.
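As a rough illustration of the two-mode setup (not the paper's exact prompts or reconciliation policy, which this summary does not specify), a sketch might look like this, with `call_llm` standing in for any chat-completion client:

```python
"""Hedged sketch of a JointThinking-style parallel query. The prompts and
the disagreement policy below are assumptions for illustration only."""
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    return "42"  # placeholder: wire to a real LLM client in practice

def joint_thinking(question: str) -> str:
    thinking = f"Think step by step, then answer: {question}"
    nothinking = f"Answer directly without reasoning: {question}"
    with ThreadPoolExecutor(max_workers=2) as pool:
        deliberate, direct = pool.map(call_llm, [thinking, nothinking])
    if deliberate.strip() == direct.strip():
        return deliberate  # the two modes agree: high-confidence answer
    return deliberate      # assumed fallback: prefer the deliberate answer

print(joint_thinking("What is 6 * 7?"))
```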
arXiv Detail & Related papers (2025-08-05T12:09:55Z)
- Don't Overthink It: A Survey of Efficient R1-style Large Reasoning Models [49.598776427454176]
Large Reasoning Models (LRMs) have gradually become a research hotspot due to their outstanding performance in handling complex tasks. However, with the widespread application of these models, the problem of overthinking has gradually emerged. Various efficient reasoning methods have been proposed, aiming to reduce the length of reasoning paths without compromising model performance and reasoning capability.
arXiv Detail & Related papers (2025-08-04T06:54:31Z)
- Lost at the Beginning of Reasoning [82.18834329384514]
We show that the first reasoning step exerts a disproportionately large influence on the final prediction. We propose an efficient sampling strategy that leverages a reward model to identify and retain high-quality first reasoning steps. We introduce a new benchmark specifically constructed with deliberately flawed first reasoning steps to systematically evaluate model self-correction capabilities.
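A small sketch of this first-step sampling idea (the generator and reward model below are hypothetical placeholders, not the paper's models):

```python
"""Sample several candidate first reasoning steps, score them with a
reward model, and seed the rest of the chain only from the best ones."""
import random

def generate_first_step(question: str) -> str:
    return f"first-step candidate #{random.randint(0, 999)}"  # placeholder LLM sample

def reward_model(question: str, step: str) -> float:
    return random.random()  # placeholder for a learned step-level reward

def best_first_steps(question: str, n_samples: int = 8, keep: int = 2) -> list[str]:
    candidates = [generate_first_step(question) for _ in range(n_samples)]
    ranked = sorted(candidates, key=lambda s: reward_model(question, s), reverse=True)
    return ranked[:keep]  # only these high-reward first steps are continued

print(best_first_steps("Prove that sqrt(2) is irrational."))
```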
arXiv Detail & Related papers (2025-06-27T09:53:57Z)
- A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings [64.36404136352287]
A*-Thought is an efficient tree-search-based unified framework designed to identify and isolate the most essential thoughts. It formulates the reasoning process of LRMs as a search tree, where each node represents a reasoning span in the giant reasoning space. It can improve the performance of QwQ-32B by 2.39$\times$ under a low budget and reduce output token length by nearly 50% under a high budget.
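The summary above describes a best-first search over reasoning spans; the sketch below shows only that generic expansion loop. A*-Thought's actual bidirectional compression and scoring functions are not reproduced, and `expand` and `heuristic` are placeholders:

```python
"""Generic best-first expansion over a tree of reasoning spans,
illustrating the search-tree formulation only."""
import heapq, itertools, random

def expand(node: str) -> list[str]:
    return [f"{node}.{i}" for i in range(2)]  # placeholder child spans

def heuristic(node: str) -> float:
    return random.random()  # placeholder estimated value of a span

def best_first_search(root: str, budget: int = 10) -> str:
    tie = itertools.count()  # tiebreaker so equal scores never compare nodes
    frontier = [(-heuristic(root), next(tie), root)]
    best_score, best_node = -frontier[0][0], root
    for _ in range(budget):
        if not frontier:
            break
        neg_score, _, node = heapq.heappop(frontier)
        if -neg_score > best_score:
            best_score, best_node = -neg_score, node
        for child in expand(node):  # expand the most promising span first
            heapq.heappush(frontier, (-heuristic(child), next(tie), child))
    return best_node  # best-scoring span visited within the node budget

print(best_first_search("root"))
```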
arXiv Detail & Related papers (2025-05-30T12:58:34Z)
- Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning [27.498043430208085]
Excessive reliance on chain-of-thought (CoT) reasoning can impair model performance. We propose Certainty-based Adaptive Reasoning (CAR), which switches between short answers and long-form reasoning based on model perplexity.
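A minimal sketch of this certainty-based routing, assuming perplexity over the short answer's token log-probabilities as the switch (the threshold and helper functions are illustrative, not the paper's):

```python
"""Produce a short answer first; escalate to long-form reasoning only
when perplexity on that answer is high (low certainty)."""
import math

def short_answer_with_logprobs(question: str) -> tuple[str, list[float]]:
    # Placeholder: a real client would return per-token log-probabilities.
    return "Paris", [-0.05, -0.10]

def long_form_reasoning(question: str) -> str:
    return "Let's think step by step..."  # placeholder CoT call

def perplexity(logprobs: list[float]) -> float:
    return math.exp(-sum(logprobs) / len(logprobs))

def car_route(question: str, ppl_threshold: float = 1.5) -> str:
    answer, logprobs = short_answer_with_logprobs(question)
    if perplexity(logprobs) <= ppl_threshold:
        return answer                     # confident: keep the short answer
    return long_form_reasoning(question)  # uncertain: escalate to CoT

print(car_route("What is the capital of France?"))
```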
arXiv Detail & Related papers (2025-05-21T06:20:17Z)
- Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods [39.89239733570008]
This work conducts a comprehensive analysis of inference-time scaling methods for both reasoning and non-reasoning models. We find that non-reasoning models, even with an extremely high inference budget, still fall substantially behind reasoning models. For reasoning models, majority voting proves to be a robust inference strategy, generally competitive with or outperforming other, more sophisticated ITC methods.
arXiv Detail & Related papers (2025-04-18T19:32:55Z)
- Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models [54.04678363287392]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks. Recent advancements in OpenAI o1 and DeepSeek-R1 have further improved performance in System-2 reasoning domains.
arXiv Detail & Related papers (2025-03-20T17:59:38Z)
- Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models [33.13238566815798]
Large Language Models (LLMs) have sparked significant research interest in leveraging them to tackle complex reasoning tasks. Recent studies demonstrate that encouraging LLMs to "think" with more tokens during test-time inference can significantly boost reasoning accuracy. The introduction of OpenAI's o1 series marks a significant milestone in this research direction.
arXiv Detail & Related papers (2025-01-16T17:37:58Z)
- CreDes: Causal Reasoning Enhancement and Dual-End Searching for Solving Long-Range Reasoning Problems using LLMs [13.977459316171013]
Large language models (LLMs) have demonstrated limitations in handling optimization problems involving long-range reasoning.
This paper introduces the Causal Relationship Enhancement (CRE) mechanism, combining cause-effect interventions and the Individual Treatment Effect (ITE) to guarantee solid causal correctness.
Experiments demonstrate that CreDes significantly outperforms existing State-Of-The-Art (SOTA) solutions in terms of both accuracy and time efficiency.
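For background, the ITE used above has the standard potential-outcomes definition (general causal-inference notation, not CreDes-specific):

```latex
% Individual Treatment Effect for unit i, where Y_i(1) and Y_i(0) are the
% potential outcomes under treatment and control respectively:
\[
  \mathrm{ITE}_i = Y_i(1) - Y_i(0)
\]
```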
arXiv Detail & Related papers (2024-10-02T16:05:01Z)
- Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training [49.3242278912771]
Multimodal reasoning is a challenging task that requires models to reason across multiple modalities to answer questions.
Existing approaches have made progress by incorporating language and visual modalities into a two-stage reasoning framework.
We propose MC-CoT, a self-consistency training strategy that generates multiple rationales and answers, subsequently selecting the most accurate through a voting process.
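The voting step of this self-consistency strategy is straightforward to sketch (the stochastic sampler below is a placeholder; MC-CoT's training procedure itself is not shown):

```python
"""Sample several rationale/answer pairs and return the most frequent
answer (self-consistency voting)."""
from collections import Counter
import random

def sample_rationale_and_answer(question: str) -> tuple[str, str]:
    answer = random.choice(["A", "A", "B"])  # stand-in for a stochastic LLM
    return f"because ... therefore {answer}", answer

def majority_vote(question: str, k: int = 5) -> str:
    answers = [sample_rationale_and_answer(question)[1] for _ in range(k)]
    return Counter(answers).most_common(1)[0][0]  # most consistent answer wins

print(majority_vote("Which option is correct?"))
```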
arXiv Detail & Related papers (2023-11-23T17:09:48Z)
- Faithful Reasoning Using Large Language Models [12.132449274592668]
We show how LMs can be made to perform faithful multi-step reasoning via a process whose causal structure mirrors the underlying logical structure of the problem.
Our approach works by chaining together reasoning steps, where each step results from calls to two fine-tuned LMs.
We demonstrate the effectiveness of our model on multi-step logical deduction and scientific question-answering, showing that it outperforms baselines on final answer accuracy.
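A toy sketch of the two-LM chaining described above, with both fine-tuned models replaced by trivial placeholders; only the selection-then-inference loop structure is illustrated:

```python
"""One model selects relevant statements, another derives a new
conclusion from them, and the loop repeats until an answer is reached."""

def selection_lm(context: list[str], question: str) -> list[str]:
    return context[:2]  # placeholder: choose statements to reason over

def inference_lm(selected: list[str]) -> str:
    return " AND ".join(selected)  # placeholder: derive a new statement

def faithful_reasoning(context: list[str], question: str, max_steps: int = 3) -> str:
    for _ in range(max_steps):
        new_fact = inference_lm(selection_lm(context, question))
        context.append(new_fact)  # each step's conclusion feeds the next step
        if question.lower() in new_fact.lower():
            break  # toy halting check; the paper uses a learned halter
    return context[-1]

print(faithful_reasoning(["Socrates is a man", "All men are mortal"], "mortal"))
```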
arXiv Detail & Related papers (2022-08-30T13:44:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.