Elucidating the Design Space of Diffusion-Based Generative Models
- URL: http://arxiv.org/abs/2206.00364v1
- Date: Wed, 1 Jun 2022 10:03:24 GMT
- Title: Elucidating the Design Space of Diffusion-Based Generative Models
- Authors: Tero Karras, Miika Aittala, Timo Aila, Samuli Laine
- Abstract summary: We present a design space that clearly separates the concrete design choices.
This lets us identify several changes to both the sampling and training processes, as well as preconditioning of the score networks.
Our improvements yield a new state-of-the-art FID of 1.79 for CIFAR-10 in a class-conditional setting and 1.97 in an unconditional setting.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We argue that the theory and practice of diffusion-based generative models
are currently unnecessarily convoluted and seek to remedy the situation by
presenting a design space that clearly separates the concrete design choices.
This lets us identify several changes to both the sampling and training
processes, as well as preconditioning of the score networks. Together, our
improvements yield a new state-of-the-art FID of 1.79 for CIFAR-10 in a
class-conditional setting and 1.97 in an unconditional setting, with much
faster sampling (35 network evaluations per image) than prior designs. To
further demonstrate their modular nature, we show that our design changes
dramatically improve both the efficiency and quality obtainable with
pre-trained score networks from previous work, including improving the FID of
an existing ImageNet-64 model from 2.07 to near-SOTA 1.55.
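The sampling improvements quoted above are concrete enough to sketch. Below is a minimal NumPy sketch of two of the paper's design choices: the rho-spaced noise-level schedule and the deterministic second-order (Heun) sampler. With N = 18 steps, Heun's method costs 2(N-1)+1 = 35 denoiser evaluations, matching the figure in the abstract. The `denoise` argument stands in for a trained denoiser network D(x; sigma); everything else (function names, the toy denoiser used for testing) is our illustration, not code from the paper.

```python
import numpy as np

def edm_sigma_schedule(n_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Noise-level discretization with the paper's defaults:
    sigma_i = (sigma_max^(1/rho) + i/(N-1) * (sigma_min^(1/rho) - sigma_max^(1/rho)))^rho,
    followed by a terminal sigma_N = 0."""
    i = np.arange(n_steps)
    ramp = (sigma_max ** (1 / rho)
            + i / (n_steps - 1) * (sigma_min ** (1 / rho) - sigma_max ** (1 / rho)))
    return np.append(ramp ** rho, 0.0)

def heun_sampler(denoise, x, sigmas):
    """Deterministic 2nd-order (Heun) sampler for the probability-flow ODE
    dx/dsigma = (x - D(x; sigma)) / sigma."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        d = (x - denoise(x, sigma)) / sigma            # Euler slope at sigma
        x_euler = x + (sigma_next - sigma) * d
        if sigma_next > 0:                             # Heun correction, skipped on the final step
            d_next = (x_euler - denoise(x_euler, sigma_next)) / sigma_next
            x = x + (sigma_next - sigma) * 0.5 * (d + d_next)
        else:
            x = x_euler
    return x
```

Separating the discretization, the ODE, and the integrator like this is exactly the modularity the abstract refers to: each can be swapped independently, including under a pre-trained network.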
Related papers
- Masked Autoencoders Are Effective Tokenizers for Diffusion Models (arXiv, 2025-02-05)
  MAETok is an autoencoder that learns a semantically rich latent space while maintaining reconstruction fidelity.
  MAETok achieves significant practical improvements, enabling a gFID of 1.69 with 76x faster training and 31x higher inference throughput for 512x512 generation.
- Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models (arXiv, 2025-01-02)
  We propose aligning the latent space with pre-trained vision foundation models when training the visual tokenizers.
  Our proposed VA-VAE significantly expands the reconstruction-generation frontier of latent diffusion models.
  We build an enhanced DiT baseline with improved training strategies and architecture designs, termed LightningDiT.
- Stable Consistency Tuning: Understanding and Improving Consistency Models (arXiv, 2024-10-24)
  Diffusion models achieve superior generation quality but suffer from slow generation speed due to the iterative nature of denoising.
  Consistency models, a new generative family, achieve competitive performance with significantly faster sampling.
  We propose a novel framework for understanding consistency models by modeling the denoising process of the diffusion model as a Markov Decision Process (MDP) and framing consistency model training as value estimation through Temporal Difference (TD) learning.
- Rethinking Iterative Stereo Matching from Diffusion Bridge Model Perspective (arXiv, 2024-04-13)
  We propose a novel training approach that incorporates diffusion models into the iterative optimization process.
  Our model ranks first on the Scene Flow dataset, achieving over a 7% improvement over competing methods.
- Diffusion Model for Data-Driven Black-Box Optimization (arXiv, 2024-03-20)
  We focus on diffusion models, a powerful generative AI technology, and investigate their potential for black-box optimization.
  We study two practical types of labels: 1) noisy measurements of a real-valued reward function and 2) human preference based on pairwise comparisons.
  Our proposed method reformulates the design optimization problem into a conditional sampling problem, which allows us to leverage the power of diffusion models.
- Improving Diffusion-Based Generative Models via Approximated Optimal Transport (arXiv, 2024-03-08)
  We introduce Approximated Optimal Transport (AOT), a novel training scheme for diffusion-based generative models.
  Employing AOT in training yields superior image quality with fewer sampling steps.
- Systematic Architectural Design of Scale Transformed Attention Condenser DNNs via Multi-Scale Class Representational Response Similarity Analysis (arXiv, 2023-06-16)
  We propose a novel type of analysis called Multi-Scale Class Representational Response Similarity Analysis (ClassRepSim).
  We show that adding STAC modules to ResNet-style architectures can yield up to a 1.6% increase in top-1 accuracy.
  Results from ClassRepSim analysis can be used to select an effective parameterization of the STAC module, resulting in competitive performance.
- ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders (arXiv, 2023-01-02)
  We propose a fully convolutional masked autoencoder framework and a new Global Response Normalization layer.
  This co-design of self-supervised learning techniques and architectural improvements results in a new model family called ConvNeXt V2, which significantly improves the performance of pure ConvNets.
- Improved Consistency Regularization for GANs (arXiv, 2020-02-11)
  We propose several modifications to the consistency regularization procedure designed to improve its performance.
  For unconditional image synthesis on CIFAR-10 and CelebA, our modifications yield the best known FID scores across various GAN architectures.
  On ImageNet-2012, applying our technique to the original BigGAN model improves the FID from 6.66 to 5.38, the best score at that model size.
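The consistency-regularization idea in the last entry can be made concrete. Below is a minimal sketch, under our own naming, of a balanced consistency-regularization penalty in the spirit of that line of work: the discriminator is penalized for changing its output under a semantics-preserving augmentation, applied to both real and generated batches. `disc` and `augment` are hypothetical stand-ins for a trained discriminator and an augmentation pipeline, and the weights are illustrative, not the paper's settings.

```python
import numpy as np

def bcr_penalty(disc, x_real, x_fake, augment, lambda_real=10.0, lambda_fake=10.0):
    """Balanced consistency-regularization term added to the discriminator
    loss: penalize output changes under augmentation for both real and
    generated images."""
    cr_real = np.mean((disc(x_real) - disc(augment(x_real))) ** 2)
    cr_fake = np.mean((disc(x_fake) - disc(augment(x_fake))) ** 2)
    return lambda_real * cr_real + lambda_fake * cr_fake
```

A discriminator that is already invariant to the augmentation (e.g. one that ignores horizontal flips) incurs zero penalty, which is the behavior the regularizer encourages.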
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all information) and is not responsible for any consequences of its use.