Diversify, Don't Fine-Tune: Scaling Up Visual Recognition Training with
Synthetic Images
- URL: http://arxiv.org/abs/2312.02253v1
- Date: Mon, 4 Dec 2023 18:35:27 GMT
- Title: Diversify, Don't Fine-Tune: Scaling Up Visual Recognition Training with
Synthetic Images
- Authors: Zhuoran Yu, Chenchen Zhu, Sean Culatana, Raghuraman Krishnamoorthi,
Fanyi Xiao and Yong Jae Lee
- Abstract summary: We present a new framework leveraging off-the-shelf generative models to generate synthetic training images.
We address class name ambiguity, lack of diversity in naive prompts, and domain shifts.
Our framework consistently enhances recognition model performance with more synthetic data.
- Score: 37.29348016920314
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in generative deep learning have enabled the creation of
high-quality synthetic images in text-to-image generation. Prior work shows
that fine-tuning a pretrained diffusion model on ImageNet and generating
synthetic training images from the finetuned model can enhance an ImageNet
classifier's performance. However, performance degrades as synthetic images
outnumber real ones. In this paper, we explore whether generative fine-tuning
is essential for this improvement and whether it is possible to further scale
up training using more synthetic data. We present a new framework leveraging
off-the-shelf generative models to generate synthetic training images,
addressing multiple challenges: class name ambiguity, lack of diversity in
naive prompts, and domain shifts. Specifically, we leverage large language
models (LLMs) and CLIP to resolve class name ambiguity. To diversify images, we
propose contextualized diversification (CD) and stylized diversification (SD)
methods, also prompted by LLMs. Finally, to mitigate domain shifts, we leverage
domain adaptation techniques with auxiliary batch normalization for synthetic
images. Our framework consistently enhances recognition model performance with
more synthetic data, up to 6x the size of the original ImageNet, showcasing the
potential of synthetic data for improved recognition models and strong
out-of-domain generalization.
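The abstract names three concrete generation-side components: CLIP-assisted class-name disambiguation, LLM-prompted contextualized (CD) and stylized (SD) diversification, and generation with an off-the-shelf model. Below is a minimal sketch of that pipeline under stated assumptions: CLIP via Hugging Face transformers, Stable Diffusion via diffusers, and hard-coded candidate senses, contexts, and styles standing in for LLM outputs; none of this reproduces the paper's actual prompts or model choices.

```python
# Sketch: pick the intended sense of an ambiguous class name with CLIP,
# then build contextualized (CD) and stylized (SD) prompts for an
# off-the-shelf text-to-image model. Lists below stand in for LLM outputs.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def pick_class_sense(candidates, reference_images):
    """Score candidate senses of a class name against a few real
    reference images and keep the best-matching description."""
    inputs = proc(text=candidates, images=reference_images,
                  return_tensors="pt", padding=True)
    with torch.no_grad():
        out = clip(**inputs)
    # logits_per_text has shape (num_texts, num_images); average over images.
    scores = out.logits_per_text.mean(dim=1)
    return candidates[scores.argmax().item()]

# "crane" is ambiguous: the bird vs. the construction machine.
senses = ["a crane, a large wading bird", "a crane, a construction machine"]
refs = [Image.open(p) for p in ["crane_0.jpg", "crane_1.jpg"]]  # real examples
sense = pick_class_sense(senses, refs)

# CD and SD prompt variants; the paper asks an LLM for these lists.
contexts = ["standing in a wetland at dawn", "seen from a distance"]
styles = ["an oil painting of", "a pencil sketch of"]
prompts = [f"a photo of {sense}, {c}" for c in contexts] \
        + [f"{s} {sense}" for s in styles]

# Generate with an assumed off-the-shelf text-to-image model.
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
synthetic_images = [pipe(p).images[0] for p in prompts]
```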
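The abstract also mentions domain adaptation via auxiliary batch normalization for synthetic images, without giving the exact wiring. One common realization (in the spirit of AdvProp-style auxiliary BNs) keeps separate normalization statistics per domain while sharing all other weights; the module name and routing flag below are illustrative.

```python
# Sketch of auxiliary batch normalization: real images use one BN branch,
# synthetic images use a separate branch, all other weights are shared.
import torch
import torch.nn as nn

class AuxBatchNorm2d(nn.Module):
    def __init__(self, num_features):
        super().__init__()
        self.bn_real = nn.BatchNorm2d(num_features)  # real-image statistics
        self.bn_syn = nn.BatchNorm2d(num_features)   # synthetic-image statistics

    def forward(self, x, synthetic: bool = False):
        return self.bn_syn(x) if synthetic else self.bn_real(x)

# Usage: route each mini-batch through the branch matching its domain.
bn = AuxBatchNorm2d(64)
real_feats = torch.randn(8, 64, 56, 56)
syn_feats = torch.randn(8, 64, 56, 56)
out_real = bn(real_feats, synthetic=False)
out_syn = bn(syn_feats, synthetic=True)
```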
Related papers
- Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization [62.157627519792946]
We introduce a novel framework called bridged transfer, which initially employs synthetic images to fine-tune a pre-trained model and improve its transferability (a minimal two-stage sketch follows this related-papers list).
We propose a dataset style inversion strategy to improve the stylistic alignment between synthetic and real images.
Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements.
arXiv Detail & Related papers (2024-03-28T22:25:05Z)
- Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings [16.28853186016663]
We create synthetic image-text pairs for efficient and effective training of Visual-Language Models (VLMs).
Our method employs a pretrained text-to-image model to synthesize image embeddings from captions generated by an LLM.
Our VLM, finetuned on synthetic data, achieves performance comparable to that of models trained solely on human-annotated data.
arXiv Detail & Related papers (2024-03-12T15:36:42Z)
- Unlocking Pre-trained Image Backbones for Semantic Image Synthesis [29.688029979801577]
We propose a new class of GAN discriminators for semantic image synthesis that enables the generation of highly realistic images.
Our model, which we dub DP-SIMS, achieves state-of-the-art results in terms of image quality and consistency with the input label maps on ADE-20K, COCO-Stuff, and Cityscapes.
arXiv Detail & Related papers (2023-12-20T09:39:19Z)
- Diversified in-domain synthesis with efficient fine-tuning for few-shot classification [64.86872227580866]
Few-shot image classification aims to learn an image classifier using only a small set of labeled examples per class.
We propose DISEF, a novel approach which addresses the generalization challenge in few-shot learning using synthetic data.
We validate our method on ten different benchmarks, consistently outperforming baselines and establishing a new state of the art for few-shot classification.
arXiv Detail & Related papers (2023-12-05T17:18:09Z)
- Image Captions are Natural Prompts for Text-to-Image Models [70.30915140413383]
We analyze the relationship between the training effect of synthetic data and the synthetic data distribution induced by prompts.
We propose a simple yet effective method that prompts text-to-image generative models to synthesize more informative and diverse training data (see the captioning sketch after this related-papers list).
Our method significantly improves the performance of models trained on synthetic training data.
arXiv Detail & Related papers (2023-07-17T14:38:11Z)
- Is synthetic data from generative models ready for image recognition? [69.42645602062024]
We study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks.
We showcase the strengths and shortcomings of synthetic data from existing generative models, and propose strategies for better applying synthetic data to recognition tasks.
arXiv Detail & Related papers (2022-10-14T06:54:24Z)
- Semantic Image Synthesis via Diffusion Models [159.4285444680301]
Denoising Diffusion Probabilistic Models (DDPMs) have achieved remarkable success in various image generation tasks.
Recent work on semantic image synthesis mainly follows the de facto standard, Generative Adversarial Nets (GANs).
arXiv Detail & Related papers (2022-06-30T18:31:51Z)
- DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder [73.1010640692609]
We propose a VQ-VAE architecture model with a diffusion decoder (DiVAE) to work as the reconstructing component in image synthesis.
Our model achieves state-of-the-art results and, in particular, generates more photorealistic images.
arXiv Detail & Related papers (2022-06-01T10:39:12Z)
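For the bridged-transfer entry above, a minimal two-stage schedule illustrates the core idea of fine-tuning on synthetic images before the real target data. The datasets, epochs, and learning rates below are placeholders, not the paper's setup.

```python
# Sketch of a bridged-transfer-style schedule: fine-tune a pretrained
# model on synthetic images first, then on the real target set.
import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
criterion = torch.nn.CrossEntropyLoss()

def finetune(model, dataset, epochs, lr):
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loader = DataLoader(dataset, batch_size=32, shuffle=True)
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            opt.step()

# Random tensors stand in for synthetic and real labeled images.
synthetic_ds = TensorDataset(torch.randn(64, 3, 224, 224),
                             torch.randint(0, 1000, (64,)))
real_ds = TensorDataset(torch.randn(64, 3, 224, 224),
                        torch.randint(0, 1000, (64,)))

finetune(model, synthetic_ds, epochs=1, lr=1e-3)  # stage 1: synthetic bridge
finetune(model, real_ds, epochs=1, lr=1e-4)       # stage 2: real target data
```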
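For the "Image Captions are Natural Prompts" entry, one way to realize caption-driven prompting is to describe a real training image with an off-the-shelf captioner and reuse the caption, together with the class name, as the generation prompt. BLIP and Stable Diffusion are assumed stand-ins here, not necessarily the models used in that paper.

```python
# Sketch of caption-as-prompt synthesis: caption a real labeled image,
# then use the caption plus the class name as the text-to-image prompt.
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
from diffusers import StableDiffusionPipeline

cap_proc = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
captioner = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base")

def caption(image: Image.Image) -> str:
    inputs = cap_proc(images=image, return_tensors="pt")
    ids = captioner.generate(**inputs, max_new_tokens=30)
    return cap_proc.decode(ids[0], skip_special_tokens=True)

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

real = Image.open("train_sample.jpg")          # a real labeled image
prompt = f"{caption(real)}, a photo of a dog"  # caption + (assumed) class name
synthetic = pipe(prompt).images[0]
```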
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.