Improved Techniques for Training Score-Based Generative Models
- URL: http://arxiv.org/abs/2006.09011v2
- Date: Fri, 23 Oct 2020 19:37:51 GMT
- Title: Improved Techniques for Training Score-Based Generative Models
- Authors: Yang Song and Stefano Ermon
- Abstract summary: We provide a new theoretical analysis of learning and sampling from score models in high dimensional spaces.
We can effortlessly scale score-based generative models to images with unprecedented resolutions.
Our score-based models can generate high-fidelity samples that rival best-in-class GANs on various image datasets.
- Score: 104.20217659157701
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Score-based generative models can produce high quality image samples
comparable to GANs, without requiring adversarial optimization. However,
existing training procedures are limited to images of low resolution (typically
below 32x32), and can be unstable under some settings. We provide a new
theoretical analysis of learning and sampling from score models in high
dimensional spaces, explaining existing failure modes and motivating new
solutions that generalize across datasets. To enhance stability, we also
propose to maintain an exponential moving average of model weights. With these
improvements, we can effortlessly scale score-based generative models to images
with unprecedented resolutions ranging from 64x64 to 256x256. Our score-based
models can generate high-fidelity samples that rival best-in-class GANs on
various image datasets, including CelebA, FFHQ, and multiple LSUN categories.
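The abstract mentions maintaining an exponential moving average (EMA) of model weights to improve stability. A minimal sketch of that idea follows, assuming the weights are held in a plain dictionary of floats. The helper name `ema_update` and the default `decay` of 0.999 are illustrative assumptions, not details taken from the abstract.

```python
def ema_update(ema_params, params, decay=0.999):
    """Blend the current weights into the running average, in place.

    ema_params and params map parameter names to floats (hypothetical layout).
    A decay close to 1 makes the average change slowly.
    """
    for name, value in params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
    return ema_params


# Toy usage: one scalar "weight" that the optimizer moves to 1, 2, 3.
params = {"w": 0.0}
ema = dict(params)  # start the average at the initial weights
for step in range(1, 4):
    params["w"] = float(step)           # stand-in for an optimizer step
    ema_update(ema, params, decay=0.5)  # small decay so the effect is visible

print(ema["w"])  # 2.125, lagging behind the raw weight 3.0
```

In a setup like this, the averaged copy rather than the raw weights would be used for sampling, since it smooths out step-to-step fluctuations during training.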
Related papers
- Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis [62.06970466554273]
We present Meissonic, which elevates non-autoregressive masked image modeling (MIM) text-to-image generation to a level comparable with state-of-the-art diffusion models like SDXL.
We leverage high-quality training data, integrate micro-conditions informed by human preference scores, and employ feature compression layers to further enhance image fidelity and resolution.
Our model not only matches but often exceeds the performance of existing models like SDXL in generating high-quality, high-resolution images.
arXiv Detail & Related papers (2024-10-10T17:59:17Z)
- Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation [52.509092010267665]
We introduce LlamaGen, a new family of image generation models that apply the original "next-token prediction" paradigm of large language models to the visual generation domain.
It is an affirmative answer to whether vanilla autoregressive models, e.g., Llama, without inductive biases on visual signals can achieve state-of-the-art image generation performance if scaled properly.
arXiv Detail & Related papers (2024-06-10T17:59:52Z)
- WDM: 3D Wavelet Diffusion Models for High-Resolution Medical Image Synthesis [1.647759094903376]
This work presents WDM, a wavelet-based medical image synthesis framework that applies a diffusion model on wavelet images.
Experimental results on BraTS and LIDC-IDRI unconditional image generation at a resolution of $128 \times 128 \times 128$ demonstrate state-of-the-art image fidelity (FID) and sample diversity (MS-SSIM) scores.
Our proposed method is the only one capable of generating high-quality images at a resolution of $256 \times 256 \times 256$, outperforming all compared methods.
arXiv Detail & Related papers (2024-02-29T11:11:05Z)
- Large-scale Reinforcement Learning for Diffusion Models [30.164571425479824]
Text-to-image diffusion models are susceptible to implicit biases that arise from web-scale text-image training pairs.
We present an effective scalable algorithm to improve diffusion models using Reinforcement Learning (RL).
We show how our approach substantially outperforms existing methods for aligning diffusion models with human preferences.
arXiv Detail & Related papers (2024-01-20T08:10:43Z)
- Enhancing Diffusion Models with 3D Perspective Geometry Constraints [10.21800236402905]
We introduce a novel geometric constraint in the training process of generative models to enforce perspective accuracy.
We show that outputs of models trained with this constraint both appear more realistic and improve performance of downstream models trained on generated images.
arXiv Detail & Related papers (2023-12-01T21:56:43Z)
- Benchmark Generation Framework with Customizable Distortions for Image Classifier Robustness [4.339574774938128]
We present a novel framework for generating adversarial benchmarks to evaluate the robustness of image classification models.
Our framework allows users to customize the types of distortions to be optimally applied to images, which helps address the specific distortions relevant to their deployment.
arXiv Detail & Related papers (2023-10-28T07:40:42Z)
- ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models [126.35334860896373]
We investigate the capability of generating images from pre-trained diffusion models at much higher resolutions than the training image sizes.
Existing approaches to higher-resolution generation, such as attention-based and joint-diffusion methods, cannot adequately address these issues.
We propose a simple yet effective re-dilation that can dynamically adjust the convolutional perception field during inference.
arXiv Detail & Related papers (2023-10-11T17:52:39Z)
- Conditional Generation from Unconditional Diffusion Models using Denoiser Representations [94.04631421741986]
We propose adapting pre-trained unconditional diffusion models to new conditions using the learned internal representations of the denoiser network.
We show that augmenting the Tiny ImageNet training set with synthetic images generated by our approach improves the classification accuracy of ResNet baselines by up to 8%.
arXiv Detail & Related papers (2023-06-02T20:09:57Z)
- Consistency Models [89.68380014789861]
We propose a new family of models that generate high quality samples by directly mapping noise to data.
They support fast one-step generation by design, while still allowing multistep sampling to trade compute for sample quality.
They also support zero-shot data editing, such as image inpainting, colorization, and super-resolution, without requiring explicit training.
arXiv Detail & Related papers (2023-03-02T18:30:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.