SparseDM: Toward Sparse Efficient Diffusion Models
- URL: http://arxiv.org/abs/2404.10445v4
- Date: Thu, 17 Apr 2025 16:05:20 GMT
- Title: SparseDM: Toward Sparse Efficient Diffusion Models
- Authors: Kafeng Wang, Jianfei Chen, He Li, Zhenpeng Mi, Jun Zhu
- Abstract summary: We propose a method based on the improved Straight-Through Estimator to improve the deployment efficiency of diffusion models. Experimental results on Transformer- and UNet-based diffusion models demonstrate that our method reduces MACs by 50% while maintaining FID.
- Score: 20.783533300147866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models represent a powerful family of generative models widely used for image and video generation. However, their time-consuming deployment, long inference time, and large memory requirements hinder their use on resource-constrained devices. In this paper, we propose a method based on the improved Straight-Through Estimator to improve the deployment efficiency of diffusion models. Specifically, we add sparse masks to the Convolution and Linear layers in a pre-trained diffusion model, transfer-learn the sparse model during the fine-tuning stage, and turn on the sparse masks during inference. Experimental results on Transformer- and UNet-based diffusion models demonstrate that our method reduces MACs by 50% while maintaining FID. Sparse models are accelerated by approximately 1.2x on the GPU. Under other MACs conditions, the FID is also more than 1 lower than that of other methods.
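As a rough illustration of this recipe, the minimal PyTorch sketch below adds a binary sparsity mask to a Linear layer and uses a straight-through estimator so gradients still reach masked weights during fine-tuning. The 2:4 magnitude pattern, class names, and training schedule are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class STEMask(torch.autograd.Function):
    """Straight-Through Estimator for a binary weight mask: the mask is
    applied in the forward pass, but gradients flow to ALL weights, so
    masked weights can keep learning during fine-tuning."""
    @staticmethod
    def forward(ctx, weight, mask):
        return weight * mask

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None  # pass gradients straight through

class SparseLinear(nn.Linear):
    """Linear layer with a 2:4 magnitude mask: in every group of 4
    weights, the 2 smallest-magnitude ones are zeroed (50% of MACs).
    Assumes in_features is divisible by 4."""
    def __init__(self, in_features, out_features, bias=True):
        super().__init__(in_features, out_features, bias)
        self.register_buffer("mask", torch.ones_like(self.weight))
        self.sparse_enabled = False

    @torch.no_grad()
    def update_mask(self):
        groups = self.weight.abs().reshape(-1, 4)
        keep = groups.topk(2, dim=1).indices          # 2 largest of each 4
        mask = torch.zeros_like(groups).scatter_(1, keep, 1.0)
        self.mask.copy_(mask.reshape_as(self.weight))

    def forward(self, x):
        if self.sparse_enabled:
            w = STEMask.apply(self.weight, self.mask)
            return nn.functional.linear(x, w, self.bias)
        return super().forward(x)
```

During fine-tuning one would periodically call `update_mask()` and train with `sparse_enabled = True`; at deployment, a 2:4 pattern maps onto sparse tensor cores on recent GPUs, consistent with the reported ~1.2x speedup.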
Related papers
- Remasking Discrete Diffusion Models with Inference-Time Scaling [12.593164604625384]
We introduce the remasking diffusion model (ReMDM) sampler, a method that can be applied to pretrained masked diffusion models in a principled way.
Most interestingly, ReMDM endows discrete diffusion with a form of inference-time compute scaling.
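The summary does not give the sampler itself; purely as a hedged toy of the remasking idea, the sketch below lets a masked-diffusion sampler re-hide already-decoded tokens so later steps can revise them, which is why extra steps buy extra quality.

```python
import torch

@torch.no_grad()
def remasking_sampler_sketch(model, seq_len, steps, mask_id, remask_prob=0.1):
    """Toy illustration of remasking only, NOT the ReMDM sampler:
    progressively unmask confident tokens, but occasionally re-mask
    decoded ones so later steps can revise them. More steps therefore
    buy more revision (inference-time compute scaling)."""
    x = torch.full((1, seq_len), mask_id, dtype=torch.long)
    for step in range(steps):
        logits = model(x)                         # (1, seq_len, vocab)
        conf, pred = logits.softmax(-1).max(-1)   # confidence per position
        masked = x == mask_id
        n_masked = int(masked.sum())
        if n_masked > 0:
            # Reveal the most confident still-masked positions.
            n_unmask = min(max(1, n_masked // (steps - step)), n_masked)
            scores = torch.where(masked, conf, torch.full_like(conf, -1.0))
            idx = scores.topk(n_unmask, dim=-1).indices
            x.scatter_(1, idx, pred.gather(1, idx))
        if step < steps - 1:  # no remasking on the final step
            remask = (x != mask_id) & (torch.rand_like(conf) < remask_prob)
            x[remask] = mask_id
    return x
```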
arXiv Detail & Related papers (2025-03-01T02:37:51Z) - One Diffusion Step to Real-World Super-Resolution via Flow Trajectory Distillation [60.54811860967658]
FluxSR is a novel one-step diffusion Real-ISR based on flow matching models.
First, we introduce Flow Trajectory Distillation (FTD) to distill a multi-step flow matching model into a one-step Real-ISR.
Second, to improve image realism and address high-frequency artifact issues in generated images, we propose TV-LPIPS as a perceptual loss.
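TV-LPIPS is not defined in this summary; one plausible reading, sketched below under that assumption, adds a total-variation penalty to the standard LPIPS distance (via the real `lpips` package), targeting exactly the high-frequency artifacts mentioned. The combination and weighting are hypothetical.

```python
import torch
import lpips  # pip install lpips

lpips_fn = lpips.LPIPS(net="alex")  # standard LPIPS perceptual distance

def total_variation(x):
    """Anisotropic total variation: penalizes high-frequency artifacts."""
    dh = (x[..., 1:, :] - x[..., :-1, :]).abs().mean()
    dw = (x[..., :, 1:] - x[..., :, :-1]).abs().mean()
    return dh + dw

def tv_lpips_loss(pred, target, tv_weight=0.05):
    # Assumed form: LPIPS term plus a TV regularizer on the prediction.
    # lpips expects NCHW inputs scaled to [-1, 1].
    perceptual = lpips_fn(pred, target).mean()
    return perceptual + tv_weight * total_variation(pred)
```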
arXiv Detail & Related papers (2025-02-04T04:11:29Z) - Adaptively Controllable Diffusion Model for Efficient Conditional Image Generation [8.857237929151795]
We propose a new adaptive framework, the Adaptively Controllable Diffusion (AC-Diff) Model, to automatically and fully control the generation process.
AC-Diff is expected to largely reduce the average number of generation steps and the execution time while maintaining the same performance as existing diffusion models in the literature.
arXiv Detail & Related papers (2024-11-19T21:26:30Z) - Energy-Based Diffusion Language Models for Text Generation [126.23425882687195]
The Energy-based Diffusion Language Model (EDLM) is an energy-based model that operates at the full sequence level for each diffusion step.
Our framework offers a 1.3× sampling speedup over existing diffusion models.
arXiv Detail & Related papers (2024-10-28T17:25:56Z) - Pruning then Reweighting: Towards Data-Efficient Training of Diffusion Models [33.09663675904689]
We investigate efficient diffusion training from the perspective of dataset pruning.
Inspired by the principles of data-efficient training for generative models such as generative adversarial networks (GANs), we first extend the data selection scheme used in GANs to DM training.
To further improve the generation performance, we employ a class-wise reweighting approach.
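The blurb does not spell out the reweighting; a generic class-wise reweighted denoising loss, sketched below purely as an assumption, scales each sample's loss by a per-class weight.

```python
import torch
import torch.nn.functional as F

def classwise_reweighted_loss(eps_pred, eps_true, labels, class_weights):
    """Illustrative stand-in, not the paper's exact scheme: per-sample
    MSE between predicted and true noise, scaled per class."""
    per_sample = F.mse_loss(eps_pred, eps_true, reduction="none")
    per_sample = per_sample.flatten(1).mean(dim=1)   # (batch,)
    weights = class_weights[labels]                  # (batch,)
    return (weights * per_sample).mean()

# Usage sketch: start from class_weights = torch.ones(num_classes)
# and upweight rare or hard classes, e.g. class_weights[k] = 2.0.
```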
arXiv Detail & Related papers (2024-09-27T20:21:19Z) - Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications.
Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GANs' inherent flaws and biased optimization within the latent space.
We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv Detail & Related papers (2024-07-16T06:38:49Z) - Simplified and Generalized Masked Diffusion for Discrete Data [47.711583631408715]
Masked (or absorbing) diffusion is actively explored as an alternative to autoregressive models for generative modeling of discrete data.
In this work, we aim to provide a simple and general framework that unlocks the full potential of masked diffusion models.
arXiv Detail & Related papers (2024-06-06T17:59:10Z) - Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning diffusion models remains an underexplored frontier in generative artificial intelligence (GenAI).
In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion).
Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z) - Adaptive Training Meets Progressive Scaling: Elevating Efficiency in Diffusion Models [52.1809084559048]
We propose a novel two-stage divide-and-conquer training strategy termed TDC Training.
It groups timesteps based on task similarity and difficulty, assigning highly customized denoising models to each group, thereby enhancing the performance of diffusion models.
The two-stage training not only avoids the need to train each model separately; its total training cost is even lower than that of training a single unified denoising model.
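A rough sketch of the grouping idea follows; the equal-width intervals and expert constructor are placeholders, since TDC groups by task similarity and difficulty rather than uniformly.

```python
import torch
import torch.nn as nn

class TimestepGroupedDenoiser(nn.Module):
    """Illustrative router: partition [0, T) into intervals and dispatch
    each timestep to a dedicated denoiser (one 'expert' per group)."""
    def __init__(self, make_denoiser, num_groups=4, T=1000):
        super().__init__()
        self.experts = nn.ModuleList(make_denoiser() for _ in range(num_groups))
        # Interior group boundaries, e.g. [250, 500, 750] for T=1000.
        self.register_buffer("boundaries",
                             torch.linspace(0, T, num_groups + 1)[1:-1])

    def forward(self, x, t):
        # Assumes the whole batch shares one timestep group, as is
        # typical during sampling; training would batch per group.
        g = int(torch.bucketize(t[0].float(), self.boundaries))
        return self.experts[g](x, t)
```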
arXiv Detail & Related papers (2023-12-20T03:32:58Z) - Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z) - Neural Diffusion Models [2.1779479916071067]
We present a generalization of conventional diffusion models that enables defining and learning time-dependent non-linear transformations of data.
NDMs outperform conventional diffusion models in terms of likelihood and produce high-quality samples.
arXiv Detail & Related papers (2023-10-12T13:54:55Z) - CamoDiffusion: Camouflaged Object Detection via Conditional Diffusion Models [72.93652777646233]
Camouflaged Object Detection (COD) is a challenging task in computer vision due to the high similarity between camouflaged objects and their surroundings.
We propose a new paradigm that treats COD as a conditional mask-generation task leveraging diffusion models.
Our method, dubbed CamoDiffusion, employs the denoising process of diffusion models to iteratively reduce the noise of the mask.
arXiv Detail & Related papers (2023-05-29T07:49:44Z) - Fast Inference in Denoising Diffusion Models via MMD Finetuning [23.779985842891705]
We present MMD-DDM, a novel method for fast sampling of diffusion models.
Our approach is based on the idea of using the Maximum Mean Discrepancy (MMD) to finetune the learned distribution with a given budget of timesteps.
Our findings show that the proposed method is able to produce high-quality samples in a fraction of the time required by widely-used diffusion models.
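MMD itself is a standard quantity; below is a minimal sketch of a squared-MMD estimator with an RBF kernel that could serve as such a finetuning loss. The kernel choice and the space it is applied in are assumptions, not the paper's exact setup.

```python
import torch

def mmd2_rbf(x, y, sigma=1.0):
    """Biased estimator of squared Maximum Mean Discrepancy between
    sample sets x (n, d) and y (m, d) under an RBF kernel."""
    def kernel(a, b):
        d2 = torch.cdist(a, b).pow(2)
        return torch.exp(-d2 / (2 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

# Finetuning idea (hedged): generate samples with a small step budget,
# then minimize mmd2_rbf(features(generated), features(real)).
```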
arXiv Detail & Related papers (2023-01-19T09:48:07Z) - Unifying Diffusion Models' Latent Space, with Applications to CycleDiffusion and Guidance [95.12230117950232]
We show that a common latent space emerges from two diffusion models trained independently on related domains.
Applying CycleDiffusion to text-to-image diffusion models, we show that large-scale text-to-image diffusion models can be used as zero-shot image-to-image editors.
arXiv Detail & Related papers (2022-10-11T15:53:52Z) - On Distillation of Guided Diffusion Models [94.95228078141626]
We propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from.
For standard diffusion models trained in pixel space, our approach is able to generate images visually comparable to those of the original model.
For diffusion models trained in latent space (e.g., Stable Diffusion), our approach is able to generate high-fidelity images using as few as 1 to 4 denoising steps.
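For context, classifier-free guidance, the mechanism being distilled, needs two denoiser evaluations per sampling step; distillation amortizes this into one fast student. The standard combination rule is sketched below; the model signature is hypothetical.

```python
import torch

def cfg_noise_prediction(model, x_t, t, cond, guidance_scale=7.5):
    """Standard classifier-free guidance: extrapolate from the
    unconditional noise prediction toward the conditional one.
    A guided teacher pays for two forward passes per step; a distilled
    student reproduces the guided output in a single pass."""
    eps_uncond = model(x_t, t, cond=None)
    eps_cond = model(x_t, t, cond=cond)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```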
arXiv Detail & Related papers (2022-10-06T18:03:56Z) - A Survey on Generative Diffusion Model [75.93774014861978]
Diffusion models are an emerging class of deep generative models.
They have certain limitations, including a time-consuming iterative generation process and confinement to high-dimensional Euclidean space.
This survey presents a plethora of advanced techniques aimed at enhancing diffusion models.
arXiv Detail & Related papers (2022-09-06T16:56:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.