Elucidating the Design Space of Diffusion-Based Generative Models
- URL: http://arxiv.org/abs/2206.00364v1
- Date: Wed, 1 Jun 2022 10:03:24 GMT
- Title: Elucidating the Design Space of Diffusion-Based Generative Models
- Authors: Tero Karras, Miika Aittala, Timo Aila, Samuli Laine
- Abstract summary: We present a design space that clearly separates the concrete design choices.
This lets us identify several changes to both the sampling and training processes, as well as preconditioning of the score networks.
Our improvements yield a new state-of-the-art FID of 1.79 for CIFAR-10 in a class-conditional setting and 1.97 in an unconditional setting.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We argue that the theory and practice of diffusion-based generative models
are currently unnecessarily convoluted and seek to remedy the situation by
presenting a design space that clearly separates the concrete design choices.
This lets us identify several changes to both the sampling and training
processes, as well as preconditioning of the score networks. Together, our
improvements yield a new state-of-the-art FID of 1.79 for CIFAR-10 in a
class-conditional setting and 1.97 in an unconditional setting, with much
faster sampling (35 network evaluations per image) than prior designs. To
further demonstrate their modular nature, we show that our design changes
dramatically improve both the efficiency and quality obtainable with
pre-trained score networks from previous work, including improving the FID of
an existing ImageNet-64 model from 2.07 to near-SOTA 1.55.
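The sampling improvements quoted above are concrete enough to sketch. Below is a minimal NumPy sketch of two of the paper's design choices: the rho-spaced noise-level schedule and the deterministic second-order (Heun) sampler. With N = 18 steps, Heun's method costs 2(N-1)+1 = 35 denoiser evaluations, matching the figure in the abstract. The `denoise` argument stands in for a trained denoiser network D(x; sigma); everything else (function names, the toy denoiser used for testing) is our illustration, not code from the paper.

```python
import numpy as np

def edm_sigma_schedule(n_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Noise-level discretization with the paper's defaults:
    sigma_i = (sigma_max^(1/rho) + i/(N-1) * (sigma_min^(1/rho) - sigma_max^(1/rho)))^rho,
    followed by a terminal sigma_N = 0."""
    i = np.arange(n_steps)
    ramp = (sigma_max ** (1 / rho)
            + i / (n_steps - 1) * (sigma_min ** (1 / rho) - sigma_max ** (1 / rho)))
    return np.append(ramp ** rho, 0.0)

def heun_sampler(denoise, x, sigmas):
    """Deterministic 2nd-order (Heun) sampler for the probability-flow ODE
    dx/dsigma = (x - D(x; sigma)) / sigma."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        d = (x - denoise(x, sigma)) / sigma            # Euler slope at sigma
        x_euler = x + (sigma_next - sigma) * d
        if sigma_next > 0:                             # Heun correction, skipped on the final step
            d_next = (x_euler - denoise(x_euler, sigma_next)) / sigma_next
            x = x + (sigma_next - sigma) * 0.5 * (d + d_next)
        else:
            x = x_euler
    return x
```

Separating the discretization, the ODE, and the integrator like this is exactly the modularity the abstract refers to: each can be swapped independently, including under a pre-trained network.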
Related papers
- Masked Autoencoders Are Effective Tokenizers for Diffusion Models (arXiv, 2025-02-05)
  MAETok is an autoencoder that learns a semantically rich latent space while maintaining reconstruction fidelity.
  MAETok achieves significant practical improvements, enabling a gFID of 1.69 with 76x faster training and 31x higher inference throughput for 512x512 generation.
- Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models (arXiv, 2025-01-02)
  We propose aligning the latent space with pre-trained vision foundation models when training the visual tokenizers.
  Our proposed VA-VAE significantly expands the reconstruction-generation frontier of latent diffusion models.
  We build an enhanced DiT baseline with improved training strategies and architecture designs, termed LightningDiT.
- Stable Consistency Tuning: Understanding and Improving Consistency Models (arXiv, 2024-10-24)
  Diffusion models achieve superior generation quality but suffer from slow generation speed due to the iterative nature of denoising.
  Consistency models, a new generative family, achieve competitive performance with significantly faster sampling.
  We propose a novel framework for understanding consistency models by modeling the denoising process of the diffusion model as a Markov Decision Process (MDP) and framing consistency model training as value estimation through Temporal Difference (TD) learning.
- Rethinking Iterative Stereo Matching from Diffusion Bridge Model Perspective (arXiv, 2024-04-13)
  We propose a novel training approach that incorporates diffusion models into the iterative optimization process.
  Our model ranks first on the Scene Flow dataset, achieving over a 7% improvement over competing methods.
- Diffusion Model for Data-Driven Black-Box Optimization (arXiv, 2024-03-20)
  We focus on diffusion models, a powerful generative AI technology, and investigate their potential for black-box optimization.
  We study two practical types of labels: 1) noisy measurements of a real-valued reward function and 2) human preference based on pairwise comparisons.
  Our proposed method reformulates the design optimization problem into a conditional sampling problem, which allows us to leverage the power of diffusion models.
- Improving Diffusion-Based Generative Models via Approximated Optimal Transport (arXiv, 2024-03-08)
  We introduce Approximated Optimal Transport (AOT), a novel training scheme for diffusion-based generative models.
  Employing AOT in training yields superior image quality with fewer sampling steps.
- Systematic Architectural Design of Scale Transformed Attention Condenser DNNs via Multi-Scale Class Representational Response Similarity Analysis (arXiv, 2023-06-16)
  We propose a novel type of analysis called Multi-Scale Class Representational Response Similarity Analysis (ClassRepSim).
  We show that adding STAC modules to ResNet-style architectures can yield up to a 1.6% increase in top-1 accuracy.
  Results from ClassRepSim analysis can be used to select an effective parameterization of the STAC module, resulting in competitive performance.
- ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders (arXiv, 2023-01-02)
  We propose a fully convolutional masked autoencoder framework and a new Global Response Normalization layer.
  This co-design of self-supervised learning techniques and architectural improvements results in a new model family called ConvNeXt V2, which significantly improves the performance of pure ConvNets.
- Improved Consistency Regularization for GANs (arXiv, 2020-02-11)
  We propose several modifications to the consistency regularization procedure designed to improve its performance.
  For unconditional image synthesis on CIFAR-10 and CelebA, our modifications yield the best known FID scores across various GAN architectures.
  On ImageNet-2012, applying our technique to the original BigGAN model improves the FID from 6.66 to 5.38, the best score at that model size.
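The consistency-regularization idea in the last entry can be made concrete. Below is a minimal sketch, under our own naming, of a balanced consistency-regularization penalty in the spirit of that line of work: the discriminator is penalized for changing its output under a semantics-preserving augmentation, applied to both real and generated batches. `disc` and `augment` are hypothetical stand-ins for a trained discriminator and an augmentation pipeline, and the weights are illustrative, not the paper's settings.

```python
import numpy as np

def bcr_penalty(disc, x_real, x_fake, augment, lambda_real=10.0, lambda_fake=10.0):
    """Balanced consistency-regularization term added to the discriminator
    loss: penalize output changes under augmentation for both real and
    generated images."""
    cr_real = np.mean((disc(x_real) - disc(augment(x_real))) ** 2)
    cr_fake = np.mean((disc(x_fake) - disc(augment(x_fake))) ** 2)
    return lambda_real * cr_real + lambda_fake * cr_fake
```

A discriminator that is already invariant to the augmentation (e.g. one that ignores horizontal flips) incurs zero penalty, which is the behavior the regularizer encourages.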
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all information) and is not responsible for any consequences of its use.