The Surprising Effectiveness of Skip-Tuning in Diffusion Sampling
- URL: http://arxiv.org/abs/2402.15170v1
- Date: Fri, 23 Feb 2024 08:05:23 GMT
- Title: The Surprising Effectiveness of Skip-Tuning in Diffusion Sampling
- Authors: Jiajun Ma, Shuchen Xue, Tianyang Hu, Wenjia Wang, Zhaoqiang Liu,
Zhenguo Li, Zhi-Ming Ma, Kenji Kawaguchi
- Abstract summary: Skip-Tuning is a simple yet surprisingly effective training-free tuning method on the skip connections.
Our method can achieve 100% FID improvement for pretrained EDM on ImageNet 64 with only 19 NFEs (1.75).
While Skip-Tuning increases the score-matching losses in the pixel space, the losses in the feature space are reduced.
- Score: 78.6155095947769
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the incorporation of the UNet architecture, diffusion probabilistic
models have become a dominant force in image generation tasks. One key design
in UNet is the skip connections between the encoder and decoder blocks.
Although skip connections have been shown to improve training stability and
model performance, we reveal that such shortcuts can be a limiting factor for
the complexity of the transformation. As the sampling steps decrease, the
generation process and the role of the UNet get closer to the push-forward
transformations from Gaussian distribution to the target, posing a challenge
for the network's complexity. To address this challenge, we propose
Skip-Tuning, a simple yet surprisingly effective training-free tuning method on
the skip connections. Our method can achieve 100% FID improvement for
pretrained EDM on ImageNet 64 with only 19 NFEs (1.75), breaking the limit of
ODE samplers regardless of sampling steps. Surprisingly, the improvement
persists when we increase the number of sampling steps and can even surpass the
best result from EDM-2 (1.58) with only 39 NFEs (1.57). Comprehensive
exploratory experiments are conducted to shed light on the surprising
effectiveness. We observe that while Skip-Tuning increases the score-matching
losses in the pixel space, the losses in the feature space are reduced,
particularly at intermediate noise levels, which coincide with the most
effective range accounting for image quality improvement.
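To make the mechanism concrete, the following is a minimal PyTorch sketch of how a training-free rescaling of skip-connection features could look in a toy UNet. The module layout, channel sizes, and the per-level coefficients `skip_scales` are illustrative assumptions, not the authors' released implementation; in the paper the rescaling is applied to a pretrained EDM denoiser at sampling time.

```python
import torch
import torch.nn as nn

class TinySkipTunedUNet(nn.Module):
    """Toy UNet with per-level rescaling of skip-connection features."""

    def __init__(self, channels=(16, 32, 64), skip_scales=(0.8, 0.9, 1.0)):
        super().__init__()
        # skip_scales[i] rescales the skip feature from encoder level i
        # (shallow -> deep); these values are placeholders, not tuned ones.
        self.skip_scales = skip_scales
        self.enc = nn.ModuleList()
        self.dec = nn.ModuleList()
        c_in = 3
        for c in channels:
            self.enc.append(nn.Conv2d(c_in, c, 3, padding=1))
            c_in = c
        dec_out = list(reversed(channels))[1:] + [3]
        for c_skip, c_out in zip(reversed(channels), dec_out):
            # each decoder block consumes upsampled features + a (rescaled) skip
            self.dec.append(nn.Conv2d(c_in + c_skip, c_out, 3, padding=1))
            c_in = c_out
        self.pool = nn.AvgPool2d(2)
        self.up = nn.Upsample(scale_factor=2, mode="nearest")

    def forward(self, x):
        skips, h = [], x
        for layer in self.enc:
            h = torch.relu(layer(h))
            skips.append(h)                # feature routed through the skip
            h = self.pool(h)
        for i, (layer, scale) in enumerate(zip(self.dec, reversed(self.skip_scales))):
            h = self.up(h)
            skip = skips.pop() * scale     # Skip-Tuning: training-free rescale
            h = layer(torch.cat([h, skip], dim=1))
            if i < len(self.dec) - 1:
                h = torch.relu(h)
        return h

net = TinySkipTunedUNet()
x = torch.randn(1, 3, 32, 32)
print(net(x).shape)  # torch.Size([1, 3, 32, 32])
```

Since no weights are retrained, only the per-level scales change how much of the encoder's features bypass the deeper layers, which is the sense in which the method adjusts the effective complexity of the decoder's transformation.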
Related papers
- You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs [13.133574069588896]
YOSO is a novel generative model designed for rapid, scalable, and high-fidelity one-step image synthesis with high training stability and mode coverage.
We show that our method can serve as a one-step generation model training from scratch with competitive performance.
In particular, we show that YOSO-PixArt-α can generate images in one step when trained at 512 resolution, and can adapt to 1024 resolution without extra explicit training, requiring only 10 A800 days for fine-tuning.
arXiv Detail & Related papers (2024-03-19T17:34:27Z) - Diffusion for Natural Image Matting [93.86689168212241]
We present DiffMatte, a solution designed to overcome the challenges of image matting.
First, DiffMatte decouples the decoder from the intricately coupled matting network design, involving only one lightweight decoder in the iterations of the diffusion process.
Second, we employ a self-aligned training strategy with uniform time intervals, ensuring a consistent noise sampling between training and inference across the entire time domain.
arXiv Detail & Related papers (2023-12-10T15:28:56Z) - ToddlerDiffusion: Interactive Structured Image Generation with Cascaded Schrödinger Bridge [63.00793292863]
ToddlerDiffusion is a novel approach to decomposing the complex task of RGB image generation into simpler, interpretable stages.
It cascades modality-specific models, each responsible for generating an intermediate representation.
ToddlerDiffusion consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-11-24T15:20:01Z) - SinSR: Diffusion-Based Image Super-Resolution in a Single Step [119.18813219518042]
Super-resolution (SR) methods based on diffusion models exhibit promising results.
But their practical application is hindered by the substantial number of required inference steps.
We propose a simple yet effective method for achieving single-step SR generation, named SinSR.
arXiv Detail & Related papers (2023-11-23T16:21:29Z) - Towards More Accurate Diffusion Model Acceleration with A Timestep
Aligner [84.97253871387028]
A diffusion model, which is formulated to produce an image using thousands of denoising steps, usually suffers from a slow inference speed.
We propose a timestep aligner that helps find a more accurate integral direction for a particular interval at the minimum cost.
Experiments show that our plug-in design can be trained efficiently and boost the inference performance of various state-of-the-art acceleration methods.
arXiv Detail & Related papers (2023-10-14T02:19:07Z) - Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion [56.38386580040991]
Consistency Trajectory Model (CTM) is a generalization of Consistency Models (CM)
CTM enables the efficient combination of adversarial training and denoising score matching loss to enhance performance.
Unlike CM, CTM's access to the score function can streamline the adoption of established controllable/conditional generation methods.
arXiv Detail & Related papers (2023-10-01T05:07:17Z) - Gradient Sparsification for Efficient Wireless Federated Learning with
Differential Privacy [25.763777765222358]
Federated learning (FL) enables distributed clients to collaboratively train a machine learning model without sharing raw data with each other.
As the model size grows, the training latency increases due to limited transmission bandwidth, and model performance degrades under differential privacy (DP) protection.
We propose a sparsification-empowered FL framework over wireless channels to improve training efficiency without sacrificing convergence performance.
arXiv Detail & Related papers (2023-04-09T05:21:15Z) - Learning strides in convolutional neural networks [34.20666933112202]
This work introduces DiffStride, the first downsampling layer with learnable strides.
Experiments on audio and image classification show the generality and effectiveness of our solution.
arXiv Detail & Related papers (2022-02-03T16:03:36Z)