Synthetic Data from Diffusion Models Improves ImageNet Classification
- URL: http://arxiv.org/abs/2304.08466v1
- Date: Mon, 17 Apr 2023 17:42:29 GMT
- Title: Synthetic Data from Diffusion Models Improves ImageNet Classification
- Authors: Shekoofeh Azizi, Simon Kornblith, Chitwan Saharia, Mohammad Norouzi,
David J. Fleet
- Abstract summary: Large-scale text-to-image diffusion models can be fine-tuned to produce class-conditional models.
Augmenting the ImageNet training set with samples from the resulting models yields significant improvements in ImageNet classification accuracy.
- Score: 47.999055841125156
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep generative models are becoming increasingly powerful, now generating
diverse high fidelity photo-realistic samples given text prompts. Have they
reached the point where models of natural images can be used for generative
data augmentation, helping to improve challenging discriminative tasks? We show
that large-scale text-to-image diffusion models can be fine-tuned to produce
class conditional models with SOTA FID (1.76 at 256x256 resolution) and
Inception Score (239 at 256x256). The model also yields a new SOTA in
Classification Accuracy Scores (64.96 for 256x256 generative samples, improving
to 69.24 for 1024x1024 samples). Augmenting the ImageNet training set with
samples from the resulting models yields significant improvements in ImageNet
classification accuracy over strong ResNet and Vision Transformer baselines.
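The augmentation step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not state how real and synthetic samples are mixed, so the function name `augment_with_synthetic`, the `(path, label)` example format, and the `synthetic_per_real` ratio are all assumptions made here for illustration.

```python
import random
from typing import List, Tuple

# Hypothetical example representation: (image path, class label).
Example = Tuple[str, int]


def augment_with_synthetic(
    real: List[Example],
    synthetic: List[Example],
    synthetic_per_real: float = 1.0,
    seed: int = 0,
) -> List[Example]:
    """Keep every real example and add a random subsample of synthetic ones.

    synthetic_per_real is an assumed knob: the number of synthetic examples
    to add per real example, capped by how many synthetic examples exist.
    """
    rng = random.Random(seed)
    n_synthetic = min(len(synthetic), int(round(synthetic_per_real * len(real))))
    chosen = rng.sample(synthetic, n_synthetic)
    mixed = list(real) + chosen
    rng.shuffle(mixed)  # interleave real and synthetic examples
    return mixed


# Toy file lists standing in for real ImageNet images and generated samples.
real = [(f"real/{i}.jpg", i % 3) for i in range(6)]
synthetic = [(f"syn/{i}.png", i % 3) for i in range(12)]

train = augment_with_synthetic(real, synthetic, synthetic_per_real=0.5)
print(len(train))  # 6 real + 3 synthetic = 9
```

The resulting list would then be fed to an ordinary classifier training loop. Varying the assumed ratio is one way to explore how much synthetic data helps before returns diminish.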
Related papers
- Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation [52.509092010267665]
We introduce LlamaGen, a new family of image generation models that apply the original "next-token prediction" paradigm of large language models to the visual generation domain.
It is an affirmative answer to whether vanilla autoregressive models, e.g., Llama, without inductive biases on visual signals can achieve state-of-the-art image generation performance if scaled properly.
arXiv Detail & Related papers (2024-06-10T17:59:52Z)
- ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object [78.58860252442045]
We introduce generative models as a data source for hard images that benchmark deep models' robustness.
We are able to generate images with more diversified backgrounds, textures, and materials than any prior work; we term this benchmark ImageNet-D.
Our work suggests that diffusion models can be an effective source to test vision models.
arXiv Detail & Related papers (2024-03-27T17:23:39Z)
- Consistency Models [89.68380014789861]
We propose a new family of models that generate high quality samples by directly mapping noise to data.
They support fast one-step generation by design, while still allowing multistep sampling to trade compute for sample quality.
They also support zero-shot data editing, such as image inpainting, colorization, and super-resolution, without requiring explicit training.
arXiv Detail & Related papers (2023-03-02T18:30:16Z)
- Fake it till you make it: Learning transferable representations from synthetic ImageNet clones [30.264601433216246]
We show that ImageNet clones can close a large part of the gap between models produced by synthetic images and models trained with real images.
We show that models trained on synthetic images exhibit strong generalization properties and perform on par with models trained on real data for transfer.
arXiv Detail & Related papers (2022-12-16T11:44:01Z)
- Lafite2: Few-shot Text-to-Image Generation [132.14211027057766]
We propose a novel method for pre-training a text-to-image generation model on image-only datasets.
It considers a retrieval-then-optimization procedure to synthesize pseudo text features.
It can be beneficial in a wide range of settings, including few-shot, semi-supervised, and fully-supervised learning.
arXiv Detail & Related papers (2022-10-25T16:22:23Z)
- Diffusion Models Beat GANs on Image Synthesis [4.919647298882951]
We show that diffusion models can achieve image sample quality superior to the current state-of-the-art generative models.
For conditional image synthesis, we further improve sample quality with classifier guidance.
We achieve an FID of 2.97 on ImageNet 128x128, 4.59 on ImageNet 256x256, and 7.72 on ImageNet 512x512, and we match BigGAN-deep even with as few as 25 forward passes per sample.
arXiv Detail & Related papers (2021-05-11T17:50:24Z)
- Improved Techniques for Training Score-Based Generative Models [104.20217659157701]
We provide a new theoretical analysis of learning and sampling from score models in high dimensional spaces.
We can effortlessly scale score-based generative models to images with unprecedented resolutions.
Our score-based models can generate high-fidelity samples that rival best-in-class GANs on various image datasets.
arXiv Detail & Related papers (2020-06-16T09:17:17Z)