Glance: Accelerating Diffusion Models with 1 Sample
- URL: http://arxiv.org/abs/2512.02899v2
- Date: Thu, 11 Dec 2025 11:53:22 GMT
- Title: Glance: Accelerating Diffusion Models with 1 Sample
- Authors: Zhuobai Dong, Rui Zhao, Songjie Wu, Junchao Yi, Linjie Li, Zhengyuan Yang, Lijuan Wang, Alex Jinpeng Wang
- Abstract summary: Diffusion models have achieved remarkable success in image generation, yet their deployment remains constrained by the heavy computational cost. Previous efforts on fewer-step distillation attempt to skip redundant steps by training compact student models. We instantiate this phase-aware strategy with two experts that specialize in slow and fast denoising phases. Surprisingly, instead of investing massive effort in retraining student models, we find that simply equipping the base model with lightweight LoRA adapters achieves both efficient acceleration and strong generalization.
- Score: 84.0326016760497
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have achieved remarkable success in image generation, yet their deployment remains constrained by the heavy computational cost and the need for numerous inference steps. Previous efforts on fewer-step distillation attempt to skip redundant steps by training compact student models, yet they often suffer from heavy retraining costs and degraded generalization. In this work, we take a different perspective: we accelerate smartly, not evenly, applying smaller speedups to early semantic stages and larger ones to later redundant phases. We instantiate this phase-aware strategy with two experts that specialize in slow and fast denoising phases. Surprisingly, instead of investing massive effort in retraining student models, we find that simply equipping the base model with lightweight LoRA adapters achieves both efficient acceleration and strong generalization. We refer to these two adapters as Slow-LoRA and Fast-LoRA. Through extensive experiments, our method achieves up to 5x acceleration over the base model while maintaining comparable visual quality across diverse benchmarks. Remarkably, the LoRA experts are trained with only 1 sample on a single V100 within one hour, yet the resulting models generalize strongly on unseen prompts.
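The abstract describes the phase-aware split only at a high level. The sketch below is a minimal toy illustration of the idea, not the paper's released method: the Euler-style sampler, the toy linear denoiser, the LoRA rank of 4, the 0.4 phase boundary, and the 8/2 step counts are all illustrative assumptions. It only shows the mechanics of running a Slow-LoRA-like adapter with many small steps early in the trajectory and swapping to a Fast-LoRA-like adapter with a few large steps later.

```python
# Minimal sketch of phase-aware two-adapter sampling (illustrative assumptions only).
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update (standard LoRA)."""

    def __init__(self, base: nn.Linear, rank: int = 4, scale: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # only the adapter would be trained
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = scale

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())


class ToyDenoiser(nn.Module):
    """Stand-in for a diffusion backbone: predicts noise from (x, t)."""

    def __init__(self, dim: int = 16):
        super().__init__()
        self.net = LoRALinear(nn.Linear(dim + 1, dim))

    def forward(self, x, t):
        t_embed = torch.full((x.shape[0], 1), float(t))
        return self.net(torch.cat([x, t_embed], dim=-1))


def phase_aware_sample(model, slow_lora, fast_lora, dim=16,
                       slow_steps=8, fast_steps=2, split=0.4):
    """Early (semantic) phase: many small steps with the slow adapter.
    Late (redundant) phase: few large steps with the fast adapter."""
    x = torch.randn(1, dim)
    slow_ts = torch.linspace(1.0, split, slow_steps + 1)   # hypothetical schedule
    fast_ts = torch.linspace(split, 0.0, fast_steps + 1)

    def euler_phase(x, ts, adapter_weights):
        model.net.A.data, model.net.B.data = adapter_weights  # swap LoRA weights
        for t_cur, t_next in zip(ts[:-1], ts[1:]):
            eps = model(x, t_cur)
            x = x + (t_next - t_cur) * eps   # plain Euler update (illustrative)
        return x

    x = euler_phase(x, slow_ts, slow_lora)
    x = euler_phase(x, fast_ts, fast_lora)
    return x


if __name__ == "__main__":
    model = ToyDenoiser()
    # Two hypothetical adapter weight sets standing in for Slow-LoRA / Fast-LoRA.
    slow = (model.net.A.data.clone(), model.net.B.data.clone())
    fast = (torch.randn_like(model.net.A.data) * 0.01, torch.zeros_like(model.net.B.data))
    print(phase_aware_sample(model, slow, fast).shape)  # torch.Size([1, 16])
```

In practice the two adapters would be attached to a real diffusion backbone (e.g. through a LoRA library) and trained separately for the two phases; the raw weight swapping above is used only to keep the example self-contained.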
Related papers
- StageVAR: Stage-Aware Acceleration for Visual Autoregressive Models [69.07782637329315]
Visual Autoregressive (VAR) modeling departs from the next-token prediction paradigm of traditional Autoregressive (AR) models through next-scale prediction. Existing acceleration methods reduce runtime for large-scale steps, but rely on manual step selection and overlook the varying importance of different stages in the generation process. We present StageVAR, a systematic study and stage-aware acceleration framework for VAR models.
arXiv Detail & Related papers (2025-12-18T12:51:19Z)
- Accelerating Inference of Masked Image Generators via Reinforcement Learning [41.30941040845135]
We propose Speed-RL, a novel paradigm for accelerating a pretrained MGM to generate high-quality images in fewer steps. We show that the proposed method was able to accelerate the base model by a factor of 3x while maintaining comparable image quality.
arXiv Detail & Related papers (2025-11-30T21:28:00Z)
- Flash-DMD: Towards High-Fidelity Few-Step Image Generation with Efficient Distillation and Joint Reinforcement Learning [32.32567390728913]
Diffusion Models have emerged as a leading class of generative models. Timestep distillation is a promising technique to accelerate generation, but it often requires extensive training and leads to image quality degradation. We introduce Flash-DMD, a novel framework that enables fast convergence with distillation and joint RL-based refinement.
arXiv Detail & Related papers (2025-11-25T17:47:11Z)
- CHORDS: Diffusion Sampling Accelerator with Multi-core Hierarchical ODE Solvers [72.23291099555459]
Diffusion-based generative models have become dominant generators of high-fidelity images and videos but remain limited by their computationally expensive inference procedures. This paper explores a general, training-free, and model-agnostic acceleration strategy via multi-core parallelism. CHORDS significantly accelerates sampling across diverse large-scale image and video diffusion models, yielding up to 2.1x speedup with four cores, improving by 50% over baselines, and 2.9x speedup with eight cores, all without quality degradation.
arXiv Detail & Related papers (2025-07-21T05:48:47Z)
- Scaling Laws for Native Multimodal Models [53.490942903659565]
We revisit the architectural design of native multimodal models and conduct an extensive scaling laws study. Our investigation reveals no inherent advantage to late-fusion architectures over early-fusion ones. We show that incorporating Mixture of Experts (MoEs) allows models to learn modality-specific weights, significantly benefiting performance.
arXiv Detail & Related papers (2025-04-10T17:57:28Z)
- CDM-QTA: Quantized Training Acceleration for Efficient LoRA Fine-Tuning of Diffusion Model [4.525120888093971]
Fine-tuning large diffusion models for custom applications demands substantial power and time. We develop a novel training accelerator specifically for Low-Rank Adaptation (LoRA) of diffusion models. We achieve substantial reductions in memory usage and power consumption while maintaining high model fidelity.
arXiv Detail & Related papers (2025-04-08T22:40:29Z)
- A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training [53.93563224892207]
We introduce a novel speed-up method for diffusion model training, called SpeeD, which is based on a closer look at time steps. As a plug-and-play and architecture-agnostic approach, SpeeD consistently achieves 3-times acceleration across various diffusion architectures, datasets, and tasks.
arXiv Detail & Related papers (2024-05-27T17:51:36Z)
- Preparing Lessons for Progressive Training on Language Models [75.88952808979087]
The rapid progress of Transformers in artificial intelligence has come at the cost of increased resource consumption and greenhouse gas emissions.
We propose Apollo, which prepares lessons for expanding operations by layer functionality during training of low layers.
Experiments demonstrate that Apollo achieves state-of-the-art acceleration ratios, even rivaling methods using pretrained models.
arXiv Detail & Related papers (2024-01-17T13:04:14Z)
- Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping [24.547833264405355]
The proposed method achieves a 24% average time reduction per sample and allows pre-training to be 2.5 times faster than the baseline.
While being faster, our pre-trained models retain strong knowledge transferability, achieving comparable and sometimes higher GLUE scores than the baseline; a generic sketch of this style of layer dropping follows this entry.
arXiv Detail & Related papers (2020-10-26T06:50:07Z)
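For the Progressive Layer Dropping entry above, the following is a generic sketch of stochastic layer dropping with a progressive schedule, assuming a simple residual feed-forward stack; the keep-probability schedule, the depth scaling, and the 0.5 floor are illustrative choices, not the paper's exact formulation.

```python
# Generic sketch: residual blocks skipped stochastically, more often later in
# training and deeper in the stack (illustrative schedule, not the paper's).
import torch
import torch.nn as nn


class DroppableBlock(nn.Module):
    """A residual block that can be skipped stochastically during training."""

    def __init__(self, dim: int):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x, keep_prob: float):
        if self.training and torch.rand(()) > keep_prob:
            return x                      # skip the block entirely this step
        return x + self.ff(x)             # residual path keeps the identity


class ProgressiveDropEncoder(nn.Module):
    def __init__(self, dim: int = 64, num_layers: int = 6, min_keep: float = 0.5):
        super().__init__()
        self.blocks = nn.ModuleList([DroppableBlock(dim) for _ in range(num_layers)])
        self.min_keep = min_keep

    def forward(self, x, progress: float):
        """`progress` in [0, 1] is the fraction of training completed.
        Keep probability decays with training progress and with depth."""
        global_keep = 1.0 - (1.0 - self.min_keep) * progress
        for i, block in enumerate(self.blocks, start=1):
            layer_keep = 1.0 - (i / len(self.blocks)) * (1.0 - global_keep)
            x = block(x, layer_keep)
        return x


if __name__ == "__main__":
    enc = ProgressiveDropEncoder()
    enc.train()
    x = torch.randn(4, 64)
    print(enc(x, progress=0.8).shape)  # torch.Size([4, 64])
```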