Learning Harmonized Representations for Speculative Sampling
- URL: http://arxiv.org/abs/2408.15766v3
- Date: Wed, 26 Feb 2025 11:47:58 GMT
- Title: Learning Harmonized Representations for Speculative Sampling
- Authors: Lefan Zhang, Xiaodan Wang, Yanhua Huang, Ruiwen Xu
- Abstract summary: Speculative sampling is a promising approach to accelerate the decoding stage for Large Language Models (LLMs). We propose a solution named HArmonized Speculative Sampling (HASS) that learns harmonized representations to address inconsistencies between training and decoding. HASS accelerates the decoding stage without adding inference overhead through harmonized objective distillation and harmonized context alignment.
- Score: 6.053109175707596
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Speculative sampling is a promising approach to accelerate the decoding stage for Large Language Models (LLMs). Recent advancements that leverage the target LLM's contextual information, such as hidden states and the KV cache, have shown significant practical improvements. However, these approaches suffer from inconsistent context between training and decoding. We also observe another discrepancy between the training and decoding objectives in existing speculative sampling methods. In this work, we propose a solution named HArmonized Speculative Sampling (HASS) that learns harmonized representations to address these issues. HASS accelerates the decoding stage without adding inference overhead through harmonized objective distillation and harmonized context alignment. Experiments on four LLaMA models demonstrate that HASS achieves a 2.81x-4.05x wall-clock speedup averaged across three datasets, surpassing EAGLE-2 by 8%-20%. The code is available at https://github.com/HArmonizedSS/HASS.
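For readers unfamiliar with the underlying mechanism, below is a minimal sketch of the generic speculative sampling accept/reject rule that methods like HASS and EAGLE-2 build on; it is not HASS itself, and the array-based interface is an illustrative assumption.

```python
import numpy as np

def speculative_step(target_probs, draft_probs, draft_tokens, rng):
    """Generic speculative sampling accept/reject step (a sketch, not HASS).

    target_probs, draft_probs: (k, V) next-token distributions from the
    target and draft models at each of the k drafted positions.
    draft_tokens: (k,) token ids sampled from the draft model.
    Returns the accepted prefix, ended by one corrected token on rejection.
    """
    out = []
    for i, tok in enumerate(draft_tokens):
        p, q = target_probs[i, tok], draft_probs[i, tok]
        if rng.random() < min(1.0, p / q):
            out.append(int(tok))  # accept with probability min(1, p/q)
        else:
            # On rejection, resample from the residual max(0, p - q),
            # renormalized; this keeps the output distribution identical
            # to sampling from the target model alone.
            residual = np.maximum(target_probs[i] - draft_probs[i], 0.0)
            out.append(int(rng.choice(len(residual), p=residual / residual.sum())))
            break
    return out
```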
Related papers
- CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter [9.631036588583248]
Speculative decoding is a powerful technique that accelerates Large Language Model (LLM) inference by leveraging a lightweight speculative draft model.
Recent methods have tried to mitigate the inconsistency between the draft model's training and decoding by adopting a multi-step training strategy, but the complex inputs of the different training steps make it harder for the draft model to converge.
We propose CORAL, a novel framework that improves both accuracy and efficiency in speculative drafting.
arXiv Detail & Related papers (2025-02-24T06:28:26Z)
- Fast-OMRA: Fast Online Motion Resolution Adaptation for Neural B-Frame Coding [5.815424522820603]
Most learned B-frame codecs with hierarchical temporal prediction suffer from a domain shift caused by the discrepancy between the Group-of-Pictures (GOP) sizes used for training and testing.
One effective strategy to mitigate this domain shift issue is to downsample video frames for motion estimation.
This work introduces lightweight classifiers to determine the downsampling factor.
arXiv Detail & Related papers (2024-10-29T05:57:32Z)
- Optimized Speculative Sampling for GPU Hardware Accelerators [14.681982904792763]
We optimize speculative sampling for parallel hardware accelerators to improve sampling speed.
We distribute the workload across multiple GPU threads, enabling simultaneous operations on matrix segments within thread blocks.
We conduct extensive experiments on both automatic speech recognition and summarization tasks to validate our methods.
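As a rough illustration of the parallelism involved (our assumption, not the paper's actual GPU kernels), the acceptance test for all k drafted tokens can be evaluated in one batched operation rather than sequentially:

```python
import numpy as np

def batched_accept_count(target_p, draft_p, tokens, rng):
    """Evaluate the accept/reject test for all k draft positions at once.

    target_p, draft_p: (k, V) probabilities; tokens: (k,) draft token ids.
    Returns how many leading tokens are accepted before the first rejection.
    A sketch of data-parallel acceptance, not the paper's implementation.
    """
    idx = np.arange(len(tokens))
    ratio = target_p[idx, tokens] / draft_p[idx, tokens]  # (k,) accept ratios
    rejected = rng.random(len(tokens)) >= np.minimum(1.0, ratio)
    return int(rejected.argmax()) if rejected.any() else len(tokens)
```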
arXiv Detail & Related papers (2024-06-16T17:19:23Z)
- Combining X-Vectors and Bayesian Batch Active Learning: Two-Stage Active Learning Pipeline for Speech Recognition [0.0]
This paper introduces a novel two-stage active learning (AL) pipeline for automatic speech recognition (ASR).
The first stage utilizes unsupervised AL by using x-vectors clustering for diverse sample selection from unlabeled speech data.
The second stage incorporates a supervised AL strategy, with a batch AL method specifically developed for ASR.
arXiv Detail & Related papers (2024-05-03T19:24:41Z)
- Semi-Supervised Class-Agnostic Motion Prediction with Pseudo Label Regeneration and BEVMix [59.55173022987071]
We study the potential of semi-supervised learning for class-agnostic motion prediction.
Our framework adopts a consistency-based self-training paradigm, enabling the model to learn from unlabeled data.
Our method achieves performance comparable to weakly supervised and some fully supervised methods.
arXiv Detail & Related papers (2023-12-13T09:32:50Z)
- Learning with Noisy Labels Using Collaborative Sample Selection and Contrastive Semi-Supervised Learning [76.00798972439004]
Collaborative Sample Selection (CSS) removes noisy samples from the identified clean set.
We introduce a co-training mechanism with a contrastive loss in semi-supervised learning.
arXiv Detail & Related papers (2023-10-24T05:37:20Z)
- DiffuSeq-v2: Bridging Discrete and Continuous Text Spaces for Accelerated Seq2Seq Diffusion Models [58.450152413700586]
We introduce a soft absorbing state that facilitates the diffusion model in learning to reconstruct discrete mutations based on the underlying Gaussian space.
We employ state-of-the-art ODE solvers within the continuous space to expedite the sampling process.
Our proposed method effectively accelerates the training convergence by 4x and generates samples of similar quality 800x faster.
arXiv Detail & Related papers (2023-10-09T15:29:10Z)
- DiffusionVMR: Diffusion Model for Joint Video Moment Retrieval and Highlight Detection [38.12212015133935]
A novel framework, DiffusionVMR, is proposed to redefine the two tasks as a unified conditional denoising generation process.
Experiments conducted on five widely-used benchmarks demonstrate the effectiveness and flexibility of the proposed DiffusionVMR.
arXiv Detail & Related papers (2023-08-29T08:20:23Z)
- Hyperbolic Audio-visual Zero-shot Learning [47.66672509746274]
An analysis of the audio-visual data reveals a large degree of hyperbolicity, indicating the potential benefit of using a hyperbolic transformation to achieve curvature-aware geometric learning.
The proposed approach employs a novel loss function that incorporates cross-modality alignment between video and audio features in the hyperbolic space.
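As a hedged sketch of what a hyperbolic cross-modal alignment term can look like, the snippet below uses the standard Poincare-ball distance; the function names and pairing scheme are our illustrative assumptions, not the paper's exact loss.

```python
import torch

def poincare_distance(x, y, eps=1e-6):
    """Geodesic distance in the Poincare ball (curvature -1).
    Assumes embeddings have norm < 1; clamps guard numerical edge cases."""
    sq_dist = ((x - y) ** 2).sum(-1)
    nx = (x ** 2).sum(-1).clamp(max=1 - eps)
    ny = (y ** 2).sum(-1).clamp(max=1 - eps)
    return torch.acosh(1.0 + 2.0 * sq_dist / ((1 - nx) * (1 - ny)))

def cross_modal_alignment_loss(video_emb, audio_emb):
    # Pull paired video/audio embeddings together in hyperbolic space.
    return poincare_distance(video_emb, audio_emb).mean()
```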
arXiv Detail & Related papers (2023-08-24T04:52:32Z)
- Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors [117.61449210940955]
We propose an efficient abnormal event detection model based on a lightweight masked auto-encoder (AE) applied at the video frame level.
We introduce an approach to weight tokens based on motion gradients, thus shifting the focus from the static background scene to the foreground objects.
We generate synthetic abnormal events to augment the training videos, and task the masked AE model to jointly reconstruct the original frames.
arXiv Detail & Related papers (2023-06-21T06:18:05Z)
- Grad-PU: Arbitrary-Scale Point Cloud Upsampling via Gradient Descent with Learned Distance Functions [77.32043242988738]
We propose a new framework for accurate point cloud upsampling that supports arbitrary upsampling rates.
Our method first interpolates the low-res point cloud according to a given upsampling rate.
arXiv Detail & Related papers (2023-04-24T06:36:35Z)
- On the Theories Behind Hard Negative Sampling for Recommendation [51.64626293229085]
We offer two insightful guidelines for the effective usage of Hard Negative Sampling (HNS).
We prove that employing HNS on the Bayesian Personalized Ranking (BPR) learner is equivalent to optimizing One-way Partial AUC (OPAUC).
These analyses establish the theoretical foundation of HNS in optimizing Top-K recommendation performance for the first time.
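For context, the two objectives in their standard textbook forms (our transcription, not the paper's exact statement): the BPR loss scores a sampled positive item i above a sampled negative j, while OPAUC is the AUC restricted to false-positive rates in [0, beta]; roughly, harder negative sampling corresponds to a smaller beta.

```latex
% BPR pairwise loss over users u, positives i, sampled negatives j
\mathcal{L}_{\mathrm{BPR}} = -\sum_{(u,i,j)} \ln \sigma\!\left(\hat{y}_{ui} - \hat{y}_{uj}\right)

% One-way Partial AUC: AUC restricted to false-positive rates in [0, beta]
\mathrm{OPAUC}(\beta) = \frac{1}{\beta}\int_{0}^{\beta} \mathrm{TPR}\!\left(\mathrm{FPR}^{-1}(s)\right)\,\mathrm{d}s
```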
arXiv Detail & Related papers (2023-02-07T13:57:03Z)
- Off-Policy Reinforcement Learning with Loss Function Weighted by Temporal Difference Error [2.255666468574186]
Training agents via off-policy deep reinforcement learning (RL) requires a large memory, called the replay memory, that stores past experiences used for learning.
When calculating the loss function, off-policy algorithms assume that all samples are of the same importance.
We propose a novel method that introduces a weighting factor for each experience when calculating the loss function at the learning stage.
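A minimal sketch of the idea, assuming a simple normalized-absolute-TD-error weighting (the paper's exact weighting scheme may differ):

```python
import torch

def td_weighted_loss(q_pred, q_target):
    """Weight each sample's squared error by its normalized absolute TD error,
    instead of treating all replayed experiences as equally important.
    q_pred, q_target: (batch,) tensors of predicted and bootstrapped values."""
    td_error = (q_target - q_pred).detach().abs()  # per-sample TD error
    weights = td_error / (td_error.sum() + 1e-8)   # normalize over the batch
    return (weights * (q_target - q_pred) ** 2).sum()
```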
arXiv Detail & Related papers (2022-12-26T14:32:16Z)
- SLICER: Learning universal audio representations using low-resource self-supervised pre-training [53.06337011259031]
We present a new Self-Supervised Learning approach to pre-train encoders on unlabeled audio data.
Our primary aim is to learn audio representations that can generalize across a large variety of speech and non-speech tasks.
arXiv Detail & Related papers (2022-11-02T23:45:33Z)
- Learning Quantization in LDPC Decoders [14.37550972719183]
We propose a floating-point surrogate model that imitates quantization effects as additions of uniform noise.
A deep learning-based method is then applied to optimize the message bitwidths.
We report an error-rate performance within 0.2 dB of floating-point decoding at an average message quantization bitwidth of 3.1 bits.
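A minimal sketch of such a surrogate, modeling b-bit uniform quantization of a decoder message in [-m, m] as additive uniform noise of one quantizer step (the helper and its interface are illustrative assumptions):

```python
import torch

def uniform_noise_quant_surrogate(x, bitwidth, msg_max=1.0):
    """Differentiable training-time stand-in for a b-bit uniform quantizer:
    instead of rounding, add noise ~ U(-step/2, step/2), where step is the
    quantizer resolution over [-msg_max, msg_max]."""
    step = 2.0 * msg_max / (2 ** bitwidth - 1)  # quantization step size
    noise = (torch.rand_like(x) - 0.5) * step   # uniform in [-step/2, step/2]
    return x + noise
```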
arXiv Detail & Related papers (2022-08-10T07:07:54Z)
- Neighborhood Collective Estimation for Noisy Label Identification and Correction [92.20697827784426]
Learning with noisy labels (LNL) aims at designing strategies to improve model performance and generalization by mitigating the effects of model overfitting to noisy labels.
Recent advances employ the predicted label distributions of individual samples to perform noise verification and noisy label correction, easily giving rise to confirmation bias.
We propose Neighborhood Collective Estimation, in which the predictive reliability of a candidate sample is re-estimated by contrasting it against its feature-space nearest neighbors.
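The following is a rough sketch of the neighborhood-contrast idea, assuming L2-normalized features and a simple agreement score (our simplification, not the paper's estimator):

```python
import numpy as np

def neighborhood_reliability(features, probs, k=10):
    """Re-estimate each sample's reliability from its k nearest neighbors
    in feature space rather than from its own prediction alone.
    features: (N, d) L2-normalized embeddings; probs: (N, C) predicted
    label distributions. Returns an (N,) agreement score."""
    sims = features @ features.T              # cosine similarities
    np.fill_diagonal(sims, -np.inf)           # exclude self-matches
    nbrs = np.argsort(-sims, axis=1)[:, :k]   # (N, k) neighbor indices
    consensus = probs[nbrs].mean(axis=1)      # (N, C) neighborhood consensus
    return (probs * consensus).sum(axis=1)    # own-vs-consensus agreement
```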
arXiv Detail & Related papers (2022-08-05T14:47:22Z)
- DAS: Densely-Anchored Sampling for Deep Metric Learning [43.81322638018864]
We propose a Densely-Anchored Sampling (DAS) scheme that exploits the anchor's nearby embedding space to densely produce embeddings without data points.
Our method is effortlessly integrated into existing DML frameworks and improves them without bells and whistles.
arXiv Detail & Related papers (2022-07-30T02:07:46Z)
- Narrowing the Gap: Improved Detector Training with Noisy Location Annotations [32.6077497559231]
In this paper, we focus on the impact of noisy location annotations on the performance of object detection approaches.
We propose a self-correction technique based on a Bayesian filter for prediction ensemble to better exploit the noisy location annotations.
arXiv Detail & Related papers (2022-06-12T10:03:01Z)
- Sound and Visual Representation Learning with Multiple Pretraining Tasks [104.11800812671953]
Different self-supervised learning (SSL) tasks reveal different features of the data.
This work aims to combine multiple SSL tasks (Multi-SSL) so that the learned representations generalize well across downstream tasks.
Experiments on sound representations demonstrate that Multi-SSL via incremental learning (IL) of SSL tasks outperforms single SSL task models.
arXiv Detail & Related papers (2022-01-04T09:09:38Z)
- Learning from Multiple Noisy Augmented Data Sets for Better Cross-Lingual Spoken Language Understanding [69.40915115518523]
Lack of training data presents a grand challenge to scaling out spoken language understanding (SLU) to low-resource languages.
Various data augmentation approaches have been proposed to synthesize training data in low-resource target languages.
In this paper we focus on mitigating noise in augmented data.
arXiv Detail & Related papers (2021-09-03T15:44:15Z)
- Improving Calibration for Long-Tailed Recognition [68.32848696795519]
We propose two methods to improve calibration and performance in long-tailed recognition scenarios.
For dataset bias due to different samplers, we propose shifted batch normalization.
Our proposed methods set new records on multiple popular long-tailed recognition benchmark datasets.
arXiv Detail & Related papers (2021-04-01T13:55:21Z)