OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure
- URL: http://arxiv.org/abs/2406.17276v3
- Date: Fri, 06 Dec 2024 07:13:53 GMT
- Title: OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure
- Authors: Jikai Wang, Yi Su, Juntao Li, Qingrong Xia, Zi Ye, Xinyu Duan, Zhefeng Wang, Min Zhang
- Abstract summary: Speculative decoding employs a "draft and then verify" mechanism to allow multiple tokens to be generated in one step. Existing methods mainly adopt fixed draft structures, which fail to adapt to different situations. We propose OPT-Tree, an algorithm to construct adaptive and scalable draft trees.
- Score: 40.9990864658776
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autoregressive language models demonstrate excellent performance in various scenarios. However, their inference efficiency is limited by the one-step-one-word generation mode, which has become a pressing problem as models grow increasingly large. Speculative decoding employs a "draft and then verify" mechanism to allow multiple tokens to be generated in one step, realizing lossless acceleration. Existing methods mainly adopt fixed heuristic draft structures, which fail to adapt to different situations and thus cannot maximize the acceptance length during verification. To alleviate this problem, we propose OPT-Tree, an algorithm for constructing adaptive and scalable draft trees. It searches for the optimal tree structure that maximizes the mathematical expectation of the acceptance length at each decoding step. Experimental results show that OPT-Tree outperforms existing draft structures and achieves a speed-up ratio of up to 3.2 compared with autoregressive decoding. If the draft model is powerful enough and the node budget is sufficient, it can generate more than ten tokens in a single step. Our code is available at https://github.com/Jikai0Wang/OPT-Tree.
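The key quantity in the abstract is the expected acceptance length: a draft node is useful only if its entire root-to-node path is accepted, so a tree can be scored by summing path probabilities under the draft model. The following is a minimal, illustrative sketch of that idea, growing a tree greedily under a node budget. The `draft_topk` stub and all names are invented for illustration; the paper's actual search differs in detail, and the authors' repository holds the real implementation.

```python
# Illustrative sketch: under a fixed node budget, grow a draft tree by always
# adding the candidate node with the highest cumulative (path) probability,
# which serves as a proxy for maximizing the expected acceptance length.
import heapq
import random

def draft_topk(prefix, k):
    """Stand-in for a draft model: return k (token, prob) continuations."""
    random.seed(hash(tuple(prefix)) & 0xFFFF)  # deterministic toy output
    probs = sorted((random.random() for _ in range(k)), reverse=True)
    total = sum(probs)
    return [(f"tok{i}", p / total) for i, p in enumerate(probs)]

def build_draft_tree(root_prefix, budget, k=4):
    """Greedy best-first expansion: each heap entry is (-cum_prob, prefix);
    a child's cumulative probability is its parent's cumulative probability
    times the child's conditional draft probability."""
    heap = [(-1.0, tuple(root_prefix))]  # the root path has probability 1
    tree = []
    while heap and len(tree) < budget:
        neg_p, prefix = heapq.heappop(heap)
        cum_p = -neg_p
        tree.append((prefix, cum_p))
        for tok, p in draft_topk(prefix, k):
            heapq.heappush(heap, (-(cum_p * p), prefix + (tok,)))
    # If draft probabilities approximate acceptance probabilities, the sum of
    # non-root path probabilities approximates the expected acceptance length.
    expected_len = sum(p for _, p in tree[1:])
    return tree, expected_len

tree, e_len = build_draft_tree(["<s>"], budget=16)
print(f"nodes: {len(tree)}, approx. E[acceptance length]: {e_len:.2f}")
```

A larger budget or a sharper draft distribution raises the approximate expectation, which matches the abstract's observation that a strong draft model plus a sufficient node budget can yield more than ten accepted tokens per step.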
Related papers
- SAGE: Accelerating Vision-Language Models via Entropy-Guided Adaptive Speculative Decoding [15.734450444255787]
Speculative decoding has emerged as a promising approach to accelerate inference in vision-language models. Existing methods rely on static tree structures that remain fixed throughout the decoding process. We propose SAGE, a novel framework that dynamically adjusts the speculation tree structure based on real-time prediction uncertainty.
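A hypothetical sketch of the entropy-guided idea described above: choose how many branches to draft at a node from the entropy of the next-token distribution, widening under uncertainty and narrowing under confidence. The mapping and thresholds below are invented; SAGE's actual policy differs.

```python
# Map next-token entropy to a branch count: uncertain -> wider exploration.
import math

def branch_width(probs, min_w=1, max_w=4):
    """Choose a branch count from the entropy of a next-token distribution."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    max_entropy = math.log(len(probs))  # entropy of the uniform distribution
    frac = entropy / max_entropy if max_entropy > 0 else 0.0
    return min_w + round(frac * (max_w - min_w))

print(branch_width([0.97, 0.01, 0.01, 0.01]))  # confident -> 1 branch
print(branch_width([0.25, 0.25, 0.25, 0.25]))  # maximally uncertain -> 4
```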
arXiv Detail & Related papers (2026-01-31T05:35:40Z)
- TALON: Confidence-Aware Speculative Decoding with Adaptive Token Trees [18.53532655905144]
Speculative decoding (SD) has become a standard technique for accelerating LLM inference without sacrificing output quality. We introduce TALON, a training-free, budget-driven adaptive tree expansion framework that can be plugged into existing tree-based methods. TALON consistently outperforms the state-of-the-art Eagle-3, achieving up to a 5.16x end-to-end speedup over autoregressive decoding.
arXiv Detail & Related papers (2026-01-12T09:26:45Z)
- Entropy-Tree: Tree-Based Decoding with Entropy-Guided Exploration [52.52685988964061]
Entropy-Tree is a tree-based decoding method that exploits entropy as a signal for branching decisions. It unifies efficient structured exploration and reliable uncertainty estimation within a single decoding procedure.
arXiv Detail & Related papers (2026-01-02T07:14:05Z)
- Fast Inference of Visual Autoregressive Model with Adjacency-Adaptive Dynamical Draft Trees [50.230925890958936]
We propose an adjacency-adaptive dynamic draft tree that adjusts draft-tree depth and width by leveraging adjacent token states and prior acceptance rates. ADT-Tree achieves speedups of 3.13x and 3.05x, respectively, and integrates seamlessly with relaxed sampling methods such as LANTERN.
arXiv Detail & Related papers (2025-12-26T04:45:49Z)
- Fast Inference via Hierarchical Speculative Decoding [65.40448210801763]
We introduce Hierarchical Speculative Decoding (HSD), an algorithm that stacks draft models into a hierarchy, where each model proposes tokens and the next larger model verifies them in a single forward pass. HSD gives up to 1.2x speed-up over the best single-draft baseline.
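A toy sketch of the hierarchical idea summarized above: each model in a small-to-large chain verifies the tokens proposed by the level below it, so only surviving tokens reach the expensive top model. The `models` callables are stand-ins, and the greedy match below replaces HSD's actual speculative-sampling acceptance rule.

```python
# Pass a draft up a hierarchy of verifiers; each level keeps the longest
# prefix of the draft that matches its own greedy prediction.
def verify_chain(prefix, draft, models):
    accepted = list(draft)
    for model in models:              # ordered smallest -> largest model
        kept = []
        ctx = list(prefix)
        for tok in accepted:
            if model(ctx) != tok:     # greedy mismatch: stop at this level
                break
            kept.append(tok)
            ctx.append(tok)
        accepted = kept
    return accepted

# Toy "models" that prefer the token equal to the context length mod 3.
small = lambda ctx: len(ctx) % 3
large = lambda ctx: len(ctx) % 3 if len(ctx) < 5 else -1
print(verify_chain([7, 7], draft=[2, 0, 1, 2], models=[small, large]))  # [2, 0, 1]
```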
arXiv Detail & Related papers (2025-10-22T15:56:19Z)
- ProtInvTree: Deliberate Protein Inverse Folding with Reward-guided Tree Search [77.55575655986252]
ProtInvTree is a reward-guided tree-search framework for protein inverse folding. It reformulates sequence generation as a deliberate, step-wise decision-making process. It supports flexible test-time scaling by expanding the search depth and breadth without retraining.
arXiv Detail & Related papers (2025-06-01T09:34:20Z)
- RASD: Retrieval-Augmented Speculative Decoding [5.3926068062773895]
Speculative decoding accelerates inference in large language models (LLMs).
This paper proposes RASD (Retrieval-Augmented Speculative Decoding), which adopts retrieval methods to enhance model-based speculative decoding.
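An illustrative sketch of retrieval-based drafting in the spirit of RASD: look up the current suffix in a datastore of past text and propose the continuation that followed it as extra draft tokens. The datastore format and merge policy here are invented; see the paper for RASD's actual design.

```python
# Index every n-gram in a corpus to the tokens that followed it, then draft
# by retrieving a stored continuation of the current context suffix.
from collections import defaultdict

def build_index(corpus_tokens, n=2, span=3):
    index = defaultdict(list)
    for i in range(len(corpus_tokens) - n - span + 1):
        key = tuple(corpus_tokens[i : i + n])
        index[key].append(corpus_tokens[i + n : i + n + span])
    return index

def retrieve_draft(context, index, n=2):
    """Propose the most recently seen continuation of the current suffix."""
    conts = index.get(tuple(context[-n:]), [])
    return conts[-1] if conts else []

corpus = "the cat sat on the mat and the cat ran away".split()
index = build_index(corpus)
print(retrieve_draft("i saw the cat".split(), index))  # ['sat', 'on', 'the']
```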
arXiv Detail & Related papers (2025-03-05T12:10:14Z)
- C2T: A Classifier-Based Tree Construction Method in Speculative Decoding [9.663330370149428]
Speculative decoding methods often face inefficiencies in the construction of token trees and the verification of candidate tokens.
We propose a novel method named C2T that adopts a lightweight classifier to generate and prune token trees dynamically.
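A hypothetical sketch of the classifier-gating idea above: a tiny logistic model scores each candidate node from cheap features (depth, cumulative log-probability), and only nodes scoring above a threshold are expanded. The weights below are made up; C2T trains its classifier on real decoding traces.

```python
# Gate draft-tree expansion with a stand-in logistic "acceptance" classifier.
import math

def expand_score(depth, cum_logprob, w=(-0.4, 0.9), bias=1.2):
    """Stand-in estimate of P(node's token is accepted)."""
    z = bias + w[0] * depth + w[1] * cum_logprob
    return 1.0 / (1.0 + math.exp(-z))

def should_expand(depth, cum_logprob, threshold=0.5):
    return expand_score(depth, cum_logprob) >= threshold

print(should_expand(depth=1, cum_logprob=math.log(0.8)))   # shallow, likely: True
print(should_expand(depth=5, cum_logprob=math.log(0.05)))  # deep, unlikely: False
```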
arXiv Detail & Related papers (2025-02-19T11:57:02Z)
- Can a Single Tree Outperform an Entire Forest? [5.448070998907116]
The prevailing mindset is that a single decision tree underperforms classic random forests in testing accuracy.
This study challenges such a mindset by significantly improving the testing accuracy of an oblique regression tree.
Our approach reformulates tree training as a differentiable unconstrained optimization task.
arXiv Detail & Related papers (2024-11-26T00:18:18Z)
- Recursive Speculative Decoding: Accelerating LLM Inference via Sampling Without Replacement [11.91629418177851]
Speculative decoding is an inference-acceleration method for large language models.
Recent works have advanced this method by establishing a draft-token tree.
We present Recursive Speculative Decoding (RSD), a novel tree-based method that samples draft tokens without replacement.
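One standard primitive for sampling k distinct draft tokens without replacement is the Gumbel-top-k trick: add independent Gumbel noise to the logits and take the k largest. The sketch below only illustrates that primitive; RSD's tree construction around it is described in the paper.

```python
# Gumbel-top-k: sample k indices without replacement, with probability
# proportional to softmax(logits).
import math
import random

def gumbel_top_k(logits, k):
    perturbed = [(l - math.log(-math.log(random.uniform(1e-9, 1.0))), i)
                 for i, l in enumerate(logits)]
    return [i for _, i in sorted(perturbed, reverse=True)[:k]]

random.seed(0)
print(gumbel_top_k([2.0, 1.0, 0.5, 0.0], k=3))  # three distinct token indices
```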
arXiv Detail & Related papers (2024-02-21T22:57:49Z)
- Tree-Planner: Efficient Close-loop Task Planning with Large Language Models [63.06270302774049]
Tree-Planner reframes task planning with Large Language Models into three distinct phases.
Tree-Planner achieves state-of-the-art performance while maintaining high efficiency.
arXiv Detail & Related papers (2023-10-12T17:59:50Z)
- A Robust Hypothesis Test for Tree Ensemble Pruning [2.4923006485141284]
We develop and present a novel theoretically justified hypothesis test of split quality for gradient boosted tree ensembles.
We show that using this method instead of the common penalty terms leads to a significant reduction in out-of-sample loss.
We also present several innovative extensions to the method, opening the door for a wide variety of novel tree pruning algorithms.
arXiv Detail & Related papers (2023-01-24T16:31:49Z)
- Social Interpretable Tree for Pedestrian Trajectory Prediction [75.81745697967608]
We propose a tree-based method, termed Social Interpretable Tree (SIT), to address this multi-modal prediction task.
A path in the tree from the root to leaf represents an individual possible future trajectory.
Despite using a hand-crafted tree, our method matches or exceeds the performance of state-of-the-art methods on the ETH-UCY and Stanford Drone datasets.
arXiv Detail & Related papers (2022-05-26T12:18:44Z)
- Hierarchical Shrinkage: improving the accuracy and interpretability of tree-based methods [10.289846887751079]
We introduce Hierarchical Shrinkage (HS), a post-hoc algorithm that does not modify the tree structure.
HS substantially increases the predictive performance of decision trees, even when used in conjunction with other regularization techniques.
All code and models are released in a full-fledged package available on GitHub.
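A compact sketch of the hierarchical-shrinkage rule as I understand it from the paper: walk the root-to-leaf path and shrink each child-versus-parent difference in node means by 1 / (1 + lam / n_parent), so estimates from small, deep nodes are pulled toward their ancestors. The path representation below is invented for brevity.

```python
# Post-hoc hierarchical shrinkage over a single root-to-leaf path.
def hs_predict(path, lam):
    """path: list of (node_mean, n_samples) from root to leaf."""
    pred = path[0][0]  # start from the root mean
    for (child_mean, _), (parent_mean, n_parent) in zip(path[1:], path):
        pred += (child_mean - parent_mean) / (1.0 + lam / n_parent)
    return pred

# Root (mean 0.5, 1000 samples) -> internal (0.7, 100) -> leaf (0.95, 4).
path = [(0.5, 1000), (0.7, 100), (0.95, 4)]
print(hs_predict(path, lam=0.0))   # 0.95: no shrinkage, plain leaf mean
print(hs_predict(path, lam=50.0))  # deep, noisy steps are damped toward ancestors
```

Note that the tree structure is untouched, matching the summary above: only the leaf predictions are re-weighted after training.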
arXiv Detail & Related papers (2022-02-02T02:43:23Z)
- Growing Deep Forests Efficiently with Soft Routing and Learned Connectivity [79.83903179393164]
This paper further extends the deep forest idea in several important aspects.
We employ a probabilistic tree whose nodes make probabilistic routing decisions, known as soft routing, rather than hard binary decisions.
Experiments on the MNIST dataset demonstrate that our empowered deep forests can achieve performance better than or comparable to that of [1], [3].
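A minimal sketch of soft routing as summarized above: instead of a hard left/right test, each internal node routes a sample to both children with sigmoid weights, and the prediction is the average of leaf values weighted by the product of routing probabilities along each path. The weights below are made up for illustration.

```python
# Soft routing through a full binary tree in heap layout
# (node i has children 2i+1 and 2i+2).
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_tree_predict(x, depth, weights, leaf_values):
    probs = {0: 1.0}                      # probability of reaching the root
    for i, (w, b) in enumerate(weights):  # internal nodes in level order
        p_right = sigmoid(w * x + b)
        probs[2 * i + 1] = probs[i] * (1 - p_right)
        probs[2 * i + 2] = probs[i] * p_right
    first_leaf = 2 ** depth - 1
    return sum(probs[first_leaf + j] * v for j, v in enumerate(leaf_values))

# Depth-2 tree: 3 internal nodes, 4 leaves.
print(soft_tree_predict(x=0.3, depth=2,
                        weights=[(4.0, -1.0), (2.0, 0.0), (-3.0, 1.0)],
                        leaf_values=[0.0, 1.0, 2.0, 3.0]))
```

Because every operation is differentiable, such a tree can be trained end-to-end by gradient descent, which is what distinguishes soft routing from hard-split forests.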
arXiv Detail & Related papers (2020-12-29T18:05:05Z)
- An Efficient Adversarial Attack for Tree Ensembles [91.05779257472675]
We study adversarial attacks on tree-based ensembles such as gradient boosting decision trees (GBDTs) and random forests (RFs).
We show that our method can be thousands of times faster than the previous mixed-integer linear programming (MILP) based approach.
Our code is available at https://github.com/chong-z/tree-ensemble-attack.
arXiv Detail & Related papers (2020-10-22T10:59:49Z)
- A Flexible Pipeline for the Optimization of CSG Trees [3.622365857213782]
CSG trees are an intuitive, yet powerful technique for the representation of geometry using a combination of Boolean set-operations and geometric primitives.
We present a systematic comparison of newly developed and existing tree optimization methods and propose a flexible processing pipeline with a focus on tree editability.
arXiv Detail & Related papers (2020-08-09T06:45:10Z)
- MurTree: Optimal Classification Trees via Dynamic Programming and Search [61.817059565926336]
We present a novel algorithm for learning optimal classification trees based on dynamic programming and search.
Our approach uses only a fraction of the time required by the state-of-the-art and can handle datasets with tens of thousands of instances.
arXiv Detail & Related papers (2020-07-24T17:06:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.