C2T: A Classifier-Based Tree Construction Method in Speculative Decoding
- URL: http://arxiv.org/abs/2502.13652v1
- Date: Wed, 19 Feb 2025 11:57:02 GMT
- Title: C2T: A Classifier-Based Tree Construction Method in Speculative Decoding
- Authors: Feiye Huo, Jianchao Tan, Kefeng Zhang, Xunliang Cai, Shengli Sun,
- Abstract summary: Speculative decoding methods often face inefficiencies in the construction of token trees and the verification of candidate tokens.<n>We propose a novel method named C2T that adopts a lightweight classifier to generate and prune token trees dynamically.
- Score: 9.663330370149428
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The growing scale of Large Language Models (LLMs) has exacerbated inference latency and computational costs. Speculative decoding methods, which aim to mitigate these issues, often face inefficiencies in the construction of token trees and the verification of candidate tokens. Existing strategies, including chain mode, static tree, and dynamic tree approaches, have limitations in accurately preparing candidate token trees for verification. We propose a novel method named C2T that adopts a lightweight classifier to generate and prune token trees dynamically. Our classifier considers additional feature variables beyond the commonly used joint probability to predict the confidence score for each draft token to determine whether it is the candidate token for verification. This method outperforms state-of-the-art (SOTA) methods such as EAGLE-2 on multiple benchmarks, by reducing the total number of candidate tokens by 25% while maintaining or even improving the acceptance length.
Related papers
- Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling [90.86991492288487]
evaluating constraint on every token can be prohibitively expensive.
LCD can distort the global distribution over strings, sampling tokens based only on local information.
We show that our approach is superior to state-of-the-art baselines.
arXiv Detail & Related papers (2025-04-07T18:30:18Z) - RASD: Retrieval-Augmented Speculative Decoding [5.3926068062773895]
Speculative decoding accelerates inference in large language models (LLMs)
This paper proposes RASD (Retrieval-Augmented Speculative Decoding), which adopts retrieval methods to enhance model-based speculative decoding.
arXiv Detail & Related papers (2025-03-05T12:10:14Z) - Order-agnostic Identifier for Large Language Model-based Generative Recommendation [94.37662915542603]
Items are assigned identifiers for Large Language Models (LLMs) to encode user history and generate the next item.
Existing approaches leverage either token-sequence identifiers, representing items as discrete token sequences, or single-token identifiers, using ID or semantic embeddings.
We propose SETRec, which leverages semantic tokenizers to obtain order-agnostic multi-dimensional tokens.
arXiv Detail & Related papers (2025-02-15T15:25:38Z) - Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling [53.58854856174773]
Speculative decoding is an approach to accelerate inference through a guess-and-verify paradigm.
Token Recycling stores candidate tokens in an adjacency matrix and employs a breadth-first search algorithm.
It significantly outperforms existing train-free methods by 30% and even a training method by 25%.
arXiv Detail & Related papers (2024-08-16T12:20:56Z) - OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure [40.9990864658776]
Speculative decoding employs a "draft and then verify" mechanism to allow multiple tokens to be generated in one step.<n>Existing methods mainly adopt fixed draft structures, which fail to adapt to different situations.<n>We propose OPT-Tree, an algorithm to construct adaptive and scalable draft trees.
arXiv Detail & Related papers (2024-06-25T04:45:53Z) - Recursive Speculative Decoding: Accelerating LLM Inference via Sampling
Without Replacement [11.91629418177851]
Speculative decoding is an inference-accel method for large language models.
Recent works have advanced this method by establishing a draft-token tree.
We present Recursive Speculative Decoding (RSD), a novel tree-based method that samples draft tokens without replacement.
arXiv Detail & Related papers (2024-02-21T22:57:49Z) - Tree-Planner: Efficient Close-loop Task Planning with Large Language Models [63.06270302774049]
Tree-Planner reframes task planning with Large Language Models into three distinct phases.
Tree-Planner achieves state-of-the-art performance while maintaining high efficiency.
arXiv Detail & Related papers (2023-10-12T17:59:50Z) - Tree Cross Attention [59.8891512435847]
Tree Cross Attention (TCA) is a module based on Cross Attention that only retrieves information from a logarithmic $mathcalO(log(N))$ number of tokens for performing inference.
We show that TCA performs comparable to Cross Attention across various classification and uncertainty regression tasks while being significantly more token-efficient.
arXiv Detail & Related papers (2023-09-29T16:50:23Z) - Optimal randomized classification trees [0.0]
Classification and Regression Trees (CARTs) are off-the-shelf techniques in modern Statistics and Machine Learning.
CARTs are built by means of a greedy procedure, sequentially deciding the splitting predictor variable(s) and the associated threshold.
This greedy approach trains trees very fast, but, by its nature, their classification accuracy may not be competitive against other state-of-the-art procedures.
arXiv Detail & Related papers (2021-10-19T11:41:12Z) - Complex Event Forecasting with Prediction Suffix Trees: Extended
Technical Report [70.7321040534471]
Complex Event Recognition (CER) systems have become popular in the past two decades due to their ability to "instantly" detect patterns on real-time streams of events.
There is a lack of methods for forecasting when a pattern might occur before such an occurrence is actually detected by a CER engine.
We present a formal framework that attempts to address the issue of Complex Event Forecasting.
arXiv Detail & Related papers (2021-09-01T09:52:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.