XSpecMesh: Quality-Preserving Auto-Regressive Mesh Generation Acceleration via Multi-Head Speculative Decoding
- URL: http://arxiv.org/abs/2507.23777v2
- Date: Wed, 06 Aug 2025 09:51:03 GMT
- Title: XSpecMesh: Quality-Preserving Auto-Regressive Mesh Generation Acceleration via Multi-Head Speculative Decoding
- Authors: Dian Chen, Yansong Qu, Xinyang Li, Ming Li, Shengchuan Zhang
- Abstract summary: XSpecMesh is a quality-preserving acceleration method for auto-regressive mesh generation models. Our method achieves a 1.7x speedup without sacrificing generation quality.
- Score: 11.414780323191733
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current auto-regressive models can generate high-quality, topologically precise meshes; however, they necessitate thousands, or even tens of thousands, of next-token predictions during inference, resulting in substantial latency. We introduce XSpecMesh, a quality-preserving acceleration method for auto-regressive mesh generation models. XSpecMesh employs a lightweight, multi-head speculative decoding scheme to predict multiple tokens in parallel within a single forward pass, thereby accelerating inference. We further propose a verification and resampling strategy: the backbone model verifies each predicted token and resamples any tokens that do not meet the quality criteria. In addition, we propose a distillation strategy that trains the lightweight decoding heads by distilling from the backbone model, encouraging their prediction distributions to align and improving the success rate of speculative predictions. Extensive experiments demonstrate that our method achieves a 1.7x speedup without sacrificing generation quality. Our code will be released.
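The verification and resampling step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the quality criterion, so the probability threshold used here is an assumption, and the names `verify_and_resample`, `draft_tokens`, `backbone_probs`, and `threshold` are hypothetical. The sketch assumes the backbone has already scored every drafted position in one parallel pass.

```python
import random
from typing import List, Sequence


def verify_and_resample(
    draft_tokens: Sequence[int],
    backbone_probs: Sequence[Sequence[float]],
    threshold: float,
    rng: random.Random,
) -> List[int]:
    """Keep drafted tokens until the first one the backbone rejects.

    backbone_probs[i] is the backbone's distribution over the vocabulary
    at draft position i. The acceptance rule (probability >= threshold)
    is an assumed stand-in for the paper's unspecified quality criterion.
    """
    accepted: List[int] = []
    for token, probs in zip(draft_tokens, backbone_probs):
        if probs[token] >= threshold:
            accepted.append(token)
            continue
        # First rejection: resample this position from the backbone and stop,
        # because the backbone's later distributions were conditioned on the
        # rejected draft token and are no longer valid.
        resampled = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        accepted.append(resampled)
        break
    return accepted


# Toy example with a 3-token vocabulary and three drafted tokens.
draft = [2, 0, 1]
probs = [
    [0.1, 0.1, 0.8],  # backbone agrees with token 2
    [0.7, 0.2, 0.1],  # backbone agrees with token 0
    [0.0, 0.0, 1.0],  # backbone rejects token 1 and can only resample token 2
]
print(verify_and_resample(draft, probs, threshold=0.5, rng=random.Random(0)))
# [2, 0, 2]
```

In this sketch every call with a non-empty draft returns at least one token, so decoding always advances; the speedup comes from the calls in which several drafted tokens are accepted at once.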
Related papers
- Learn from Your Mistakes: Self-Correcting Masked Diffusion Models [31.536464269884103]
Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models. We propose a framework that trains a model to perform both unmasking and correction. We name our training and sampling method Progressive Self-Correction (ProSeCo) for its unique ability to iteratively refine an entire sequence.
arXiv Detail & Related papers (2026-02-12T05:17:31Z)
- Self-Rewarding Sequential Monte Carlo for Masked Diffusion Language Models [58.946955321428845]
This work presents self-rewarding sequential Monte Carlo (SMC). Our algorithm stems from the observation that most existing MDLMs rely on a confidence-based sampling strategy. We introduce the trajectory-level confidence as a self-rewarding signal for assigning particle importance weights.
arXiv Detail & Related papers (2026-02-02T09:21:45Z)
- SJD++: Improved Speculative Jacobi Decoding for Training-free Acceleration of Discrete Auto-regressive Text-to-Image Generation [92.34355601866629]
Large autoregressive models can generate high-quality, high-resolution images but suffer from slow generation speed. We propose Speculative Jacobi Decoding++ (SJD++), a training-free probabilistic parallel decoding algorithm. SJD++ performs multi-token prediction in each forward pass, drastically reducing generation steps.
arXiv Detail & Related papers (2025-12-08T12:36:43Z)
- FlashMesh: Faster and Better Autoregressive Mesh Synthesis via Structured Speculation [65.3277633028397]
FlashMesh is a fast and high-fidelity mesh generation framework. We show that FlashMesh achieves up to a 2x speedup over standard autoregressive models.
arXiv Detail & Related papers (2025-11-19T17:03:49Z)
- Speculative Jacobi-Denoising Decoding for Accelerating Autoregressive Text-to-image Generation [110.28291466364784]
Speculative Jacobi-Denoising Decoding (SJD2) is a framework that incorporates the denoising process into Jacobi to enable parallel token generation in autoregressive models. Our method introduces a next-clean-token prediction paradigm that enables the pre-trained autoregressive models to accept noise-perturbed token embeddings.
arXiv Detail & Related papers (2025-10-10T04:30:45Z)
- Accelerating Diffusion LLMs via Adaptive Parallel Decoding [50.9948753314669]
We introduce adaptive parallel decoding (APD), a novel method that dynamically adjusts the number of tokens sampled in parallel. APD provides markedly higher throughput with minimal quality degradations on downstream benchmarks.
arXiv Detail & Related papers (2025-05-31T06:10:10Z)
- FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling [59.8051705468084]
Speculative sampling has emerged as an important technique for accelerating the auto-regressive generation process of large language models. We present FR-Spec, a frequency-ranked speculative sampling framework that optimizes draft candidate selection through vocabulary space compression.
arXiv Detail & Related papers (2025-02-20T18:58:10Z)
- Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE [15.003006630308517]
Speculative decoding (SD) accelerates large language model inference by using a smaller draft model to predict multiple tokens. We propose Jakiro, leveraging Mixture of Experts (MoE), where independent experts generate diverse predictions. Our method significantly boosts prediction accuracy and achieves higher inference speedups.
arXiv Detail & Related papers (2025-02-10T09:24:06Z)
- Efficient Generative Modeling with Residual Vector Quantization-Based Tokens [5.949779668853557]
ResGen is an efficient Residual Vector Quantization-based generative model for high-fidelity generation with fast sampling. We validate the efficacy and generalizability of the proposed method on two challenging tasks: conditional image generation on ImageNet 256x256 and zero-shot text-to-speech synthesis. As we scale the depth of RVQ, our generative models exhibit enhanced generation fidelity or faster sampling speeds compared to similarly sized baseline models.
arXiv Detail & Related papers (2024-12-13T15:31:17Z)
- Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding [60.188309982690335]
We propose a training-free probabilistic parallel decoding algorithm, Speculative Jacobi Decoding (SJD). SJD accelerates the inference of auto-regressive text-to-image generation while maintaining the randomness in sampling-based token decoding. Specifically, SJD enables the model to predict multiple tokens at each step and accepts tokens based on a probabilistic criterion.
arXiv Detail & Related papers (2024-10-02T16:05:27Z)
- Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines [74.42485647685272]
We focus on Generative Masked Language Models (GMLMs).
We train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov Chain to draw samples from the model.
We adapt the T5 model for iteratively-refined parallel decoding, achieving 2-3x speedup in machine translation with minimal sacrifice in quality.
arXiv Detail & Related papers (2024-07-22T18:00:00Z)
- Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion [61.03681839276652]
Diffusion Forcing is a new training paradigm where a diffusion model is trained to denoise a set of tokens with independent per-token noise levels. We apply Diffusion Forcing to sequence generative modeling by training a causal next-token prediction model to generate one or several future tokens.
arXiv Detail & Related papers (2024-07-01T15:43:25Z)
- Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely hidden transfer, which decodes multiple successive tokens simultaneously in a single forward pass.
In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z)
- Latent Autoregressive Source Separation [5.871054749661012]
This paper introduces vector-quantized Latent Autoregressive Source Separation (i.e., de-mixing an input signal into its constituent sources) without requiring additional gradient-based optimization or modifications of existing models.
Our separation method relies on the Bayesian formulation in which the autoregressive models are the priors, and a discrete (non-parametric) likelihood function is constructed by performing frequency counts over latent sums of addend tokens.
arXiv Detail & Related papers (2023-01-09T17:32:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.