Image Captions are Natural Prompts for Text-to-Image Models
- URL: http://arxiv.org/abs/2307.08526v1
- Date: Mon, 17 Jul 2023 14:38:11 GMT
- Title: Image Captions are Natural Prompts for Text-to-Image Models
- Authors: Shiye Lei, Hao Chen, Sen Zhang, Bo Zhao and Dacheng Tao
- Abstract summary: We analyze the relationship between the training effect of synthetic data and the synthetic data distribution induced by prompts.
We propose a simple yet effective method that prompts text-to-image generative models to synthesize more informative and diverse training data.
Our method significantly improves the performance of models trained on synthetic training data.
- Score: 70.30915140413383
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rapid development of Artificial Intelligence Generated Content
(AIGC), it has become common practice in many learning tasks to train or
fine-tune large models on synthetic data because of data scarcity and
privacy-leakage concerns. Although synthetic data promises unlimited
generation, the massive and diverse information conveyed in real images makes
it challenging for text-to-image generative models to synthesize equally
informative training data from hand-crafted prompts, which usually leads to
inferior generalization performance in downstream models. In this paper, we
theoretically
analyze the relationship between the training effect of synthetic data and the
synthetic data distribution induced by prompts. Then we correspondingly propose
a simple yet effective method that prompts text-to-image generative models to
synthesize more informative and diverse training data. Specifically, we caption
each real image with an advanced captioning model to obtain informative and
faithful prompts that extract class-relevant information and clarify the
polysemy of class names. The image captions and class names are then concatenated to
prompt generative models for training image synthesis. Extensive experiments on
ImageNette, ImageNet-100, and ImageNet-1K verify that our method significantly
improves the performance of models trained on synthetic training data,
yielding a 10% improvement in classification accuracy on average.
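To make the recipe concrete, the following is a minimal sketch of the caption-as-prompt pipeline, assuming BLIP as the captioning model and Stable Diffusion as the generator; the abstract does not name specific checkpoints, so both choices (and the prompt format) are illustrative.
```python
# Minimal sketch of caption-as-prompt synthesis. BLIP and Stable Diffusion
# are illustrative stand-ins; the paper only specifies "an advanced
# captioning model" and a text-to-image generator.
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
captioner = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base"
).to(device)
generator = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5"
).to(device)

def synthesize(real_image: Image.Image, class_name: str) -> Image.Image:
    # 1. Caption the real image to extract class-relevant information.
    inputs = processor(real_image, return_tensors="pt").to(device)
    ids = captioner.generate(**inputs, max_new_tokens=30)
    caption = processor.decode(ids[0], skip_special_tokens=True)
    # 2. Concatenate the class name with the caption; the class name
    #    clarifies polysemous words in the caption.
    prompt = f"{class_name}, {caption}"
    # 3. Prompt the generator to produce one synthetic training image.
    return generator(prompt).images[0]
```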
Related papers
- Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization [62.157627519792946]
We introduce a novel framework called bridged transfer, which initially employs synthetic images for fine-tuning a pre-trained model to improve its transferability.
We propose dataset style inversion strategy to improve the stylistic alignment between synthetic and real images.
Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements.
arXiv Detail & Related papers (2024-03-28T22:25:05Z)
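As a rough illustration of the bridged-transfer recipe above, here is a generic two-stage fine-tuning loop; the data, model, and hyperparameters are placeholders, and the paper's dataset style inversion step is not reproduced.
```python
# Generic two-stage fine-tuning loop: bridge on synthetic images, then
# transfer to real data. Loaders, epochs, and learning rates below are
# placeholders, not the paper's setup.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
criterion = nn.CrossEntropyLoss()

def run_epochs(loader, epochs, lr):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            criterion(model(images), labels).backward()
            optimizer.step()

def toy_loader(n=8):
    # Random tensors stand in for the synthetic / real image datasets.
    data = TensorDataset(torch.randn(n, 3, 224, 224), torch.randint(0, 1000, (n,)))
    return DataLoader(data, batch_size=4)

run_epochs(toy_loader(), epochs=1, lr=1e-3)  # stage 1: synthetic bridging
run_epochs(toy_loader(), epochs=1, lr=1e-4)  # stage 2: real-data transfer
```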
- Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings [16.28853186016663]
We create synthetic image-text pairs for efficient and effective Visual-Language Model (VLM) training.
Our method employs a pretrained text-to-image model to synthesize image embeddings from captions generated by an LLM.
Our VLM, fine-tuned on synthetic data, achieves performance comparable to models trained solely on human-annotated data.
arXiv Detail & Related papers (2024-03-12T15:36:42Z)
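A heavily simplified stand-in for the Synth$^2$ pipeline above: GPT-2 is only a lightweight placeholder for the caption-writing LLM, and Stable Diffusion latents serve as a proxy for the paper's image embeddings.
```python
# Stand-in for the Synth^2 idea: an LLM writes a caption and a
# text-to-image model maps it to a latent instead of decoded pixels.
# GPT-2 and Stable Diffusion latents are placeholders, not the paper's
# actual components.
import torch
from transformers import pipeline
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Generate a caption with a (placeholder) language model.
writer = pipeline("text-generation", model="gpt2")
caption = writer("A photo of", max_new_tokens=12)[0]["generated_text"]

# 2. Map the caption to a latent "image embedding", skipping the
#    expensive decode-to-pixels step.
t2i = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5"
).to(device)
latent = t2i(caption, output_type="latent").images  # (1, 4, 64, 64) tensor

# 3. The (caption, latent) pair would then be used for VLM training.
pair = (caption, latent)
```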
- Scaling Laws of Synthetic Images for Model Training ... for Now [54.43596959598466]
We study the scaling laws of synthetic images generated by state-of-the-art text-to-image models.
We observe that synthetic images demonstrate a scaling trend similar to, but slightly less effective than, real images in CLIP training.
arXiv Detail & Related papers (2023-12-07T18:59:59Z)
- SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models [56.88192537044364]
We propose a simple yet effective parameter-efficient fine-tuning approach called the Semantic Understanding and Reasoning adapter (SUR-adapter) for pre-trained diffusion models.
Our approach makes text-to-image diffusion models easier to use and improves the user experience.
arXiv Detail & Related papers (2023-05-09T05:48:38Z)
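The exact SUR-adapter architecture is defined in the paper itself; as a generic illustration of the parameter-efficient adapter pattern it builds on, consider a residual bottleneck module.
```python
# Generic residual bottleneck adapter, shown only to illustrate the
# parameter-efficient fine-tuning pattern; not the SUR-adapter itself.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)  # project down
        self.up = nn.Linear(bottleneck, dim)    # project back up
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual form: only the small down/up projections are trained,
        # while the frozen backbone features pass through unchanged.
        return x + self.up(self.act(self.down(x)))

adapter = Adapter(dim=768)
features = torch.randn(2, 77, 768)  # e.g., frozen text-encoder states
out = adapter(features)             # same shape, lightly adapted
```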
- Is synthetic data from generative models ready for image recognition? [69.42645602062024]
We study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks.
We showcase the strengths and shortcomings of synthetic data from existing generative models, and propose strategies for better applying synthetic data to recognition tasks.
arXiv Detail & Related papers (2022-10-14T06:54:24Z)
- Synthetic-to-Real Domain Adaptation using Contrastive Unpaired Translation [28.19031441659854]
We propose a multi-step method to obtain training data without manual annotation effort.
From 3D object meshes, we generate images using a modern synthesis pipeline.
We utilize a state-of-the-art image-to-image translation method to adapt the synthetic images to the real domain.
arXiv Detail & Related papers (2022-03-17T17:13:23Z)
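A minimal sketch of the mesh-to-image rendering step in the pipeline above, assuming trimesh and pyrender as the synthesis tools; the paper's own pipeline and the CUT translation stage are not reproduced here.
```python
# Render a synthetic RGB image from a mesh; trimesh/pyrender are
# illustrative choices, and the icosphere stands in for a real object mesh.
import numpy as np
import trimesh
import pyrender

mesh = trimesh.creation.icosphere(subdivisions=3)
scene = pyrender.Scene()
scene.add(pyrender.Mesh.from_trimesh(mesh))

# Simple camera and light so the offscreen render is visible.
cam_pose = np.eye(4)
cam_pose[2, 3] = 2.0  # pull the camera back along +z
scene.add(pyrender.PerspectiveCamera(yfov=np.pi / 3.0), pose=cam_pose)
scene.add(pyrender.DirectionalLight(intensity=3.0), pose=cam_pose)

renderer = pyrender.OffscreenRenderer(viewport_width=640, viewport_height=480)
color, depth = renderer.render(scene)  # synthetic RGB image + depth map
# An image-to-image model (e.g., CUT) would then translate `color`
# toward the real domain; that stage is omitted here.
```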
- LAFITE: Towards Language-Free Training for Text-to-Image Generation [83.2935513540494]
We present the first method for training text-to-image generation models without any text data.
Our method leverages the well-aligned multi-modal semantic space of the powerful pre-trained CLIP model.
We obtain state-of-the-art results on standard text-to-image generation tasks.
arXiv Detail & Related papers (2021-11-27T01:54:45Z)
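LAFITE's language-free trick can be sketched with CLIP's shared embedding space: perturb a CLIP image feature to obtain a pseudo text feature. The noise level below is illustrative, not the paper's tuned value.
```python
# Sketch of LAFITE-style pseudo text features: perturb normalized CLIP
# image features inside the shared image-text embedding space.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.new("RGB", (224, 224))  # placeholder for a training image
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    img_feat = model.get_image_features(**inputs)
img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)

# Because CLIP aligns images and text, a small perturbation of the image
# feature can substitute for the missing text feature during training.
noise = torch.randn_like(img_feat)
pseudo_text = img_feat + 0.1 * noise / noise.norm(dim=-1, keepdim=True)
pseudo_text = pseudo_text / pseudo_text.norm(dim=-1, keepdim=True)
```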
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.