TTIDA: Controllable Generative Data Augmentation via Text-to-Text and
Text-to-Image Models
- URL: http://arxiv.org/abs/2304.08821v1
- Date: Tue, 18 Apr 2023 08:40:30 GMT
- Title: TTIDA: Controllable Generative Data Augmentation via Text-to-Text and
Text-to-Image Models
- Authors: Yuwei Yin, Jean Kaddour, Xiang Zhang, Yixin Nie, Zhenguang Liu,
Lingpeng Kong, Qi Liu
- Abstract summary: We propose TTIDA (Text-to-Text-to-Image Data Augmentation) to leverage the capabilities of large-scale pre-trained Text-to-Text and Text-to-Image generative models for data augmentation.
By conditioning the T2I model on detailed descriptions produced by T2T models, we are able to generate photo-realistic labeled images in a flexible and controllable manner.
- Score: 37.2392848181456
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data augmentation has been established as an efficacious approach to
supplement useful information for low-resource datasets. Traditional
augmentation techniques such as noise injection and image transformations have
been widely used. In addition, generative data augmentation (GDA) has been
shown to produce more diverse and flexible data. While generative adversarial
networks (GANs) have been frequently used for GDA, they lack diversity and
controllability compared to text-to-image diffusion models. In this paper, we
propose TTIDA (Text-to-Text-to-Image Data Augmentation) to leverage the
capabilities of large-scale pre-trained Text-to-Text (T2T) and Text-to-Image
(T2I) generative models for data augmentation. By conditioning the T2I model on
detailed descriptions produced by T2T models, we are able to generate
photo-realistic labeled images in a flexible and controllable manner.
Experiments on in-domain classification, cross-domain classification, and image
captioning tasks show consistent improvements over other data augmentation
baselines. Analytical studies in varied settings, including few-shot,
long-tail, and adversarial, further reinforce the effectiveness of TTIDA in
enhancing performance and increasing robustness.
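The abstract ships no code; as a rough illustration of the T2T-to-T2I recipe, here is a minimal sketch assuming Hugging Face transformers and diffusers, with GPT-2 and Stable Diffusion v1.5 as stand-ins for the paper's T2T and T2I models (model choices, prompt template, and sampling settings are assumptions, not the paper's configuration):

```python
# Minimal sketch of a T2T-to-T2I augmentation loop (not the authors' code).
# Assumes transformers and diffusers are installed, a CUDA device is available,
# and GPT-2 / Stable Diffusion v1.5 stand in for the paper's T2T / T2I models.
import torch
from transformers import pipeline
from diffusers import StableDiffusionPipeline

# T2T stage: expand a bare class label into a detailed description.
t2t = pipeline("text-generation", model="gpt2")
label = "golden retriever"
prompt_seed = f"A photo of a {label},"
description = t2t(prompt_seed, max_new_tokens=30, do_sample=True)[0]["generated_text"]

# T2I stage: condition the diffusion model on the generated description.
t2i = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
image = t2i(description).images[0]

# The synthetic image inherits the label that seeded the description.
image.save(f"{label.replace(' ', '_')}_synthetic.png")
```

Because the class label seeds the description, every synthetic image carries its label for free, which is what makes the generated data directly usable as labeled training examples.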
Related papers
- Image Regeneration: Evaluating Text-to-Image Model via Generating Identical Image with Multimodal Large Language Models [54.052963634384945]
We introduce the Image Regeneration task to assess text-to-image models.
We use GPT4V to bridge the gap between the reference image and the text input for the T2I model.
We also present the ImageRepainter framework to enhance the quality of generated images.
arXiv Detail & Related papers (2024-11-14T13:52:43Z)
- Erase, then Redraw: A Novel Data Augmentation Approach for Free Space Detection Using Diffusion Model [5.57325257338134]
Traditional data augmentation methods cannot alter high-level semantic attributes.
We propose using a text-to-image diffusion model to parameterize image-to-image transformations.
We achieve this by erasing instances of real objects from the original dataset and generating new instances with similar semantics in the erased regions; a minimal inpainting sketch follows this entry.
arXiv Detail & Related papers (2024-09-30T10:21:54Z)
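The erase-then-redraw recipe maps naturally onto an off-the-shelf inpainting pipeline. A minimal sketch, assuming diffusers' Stable Diffusion inpainting checkpoint and a precomputed binary mask of the object to erase (file names and prompt are illustrative, not from the paper):

```python
# Sketch of erase-then-redraw augmentation via diffusion inpainting
# (an approximation of the idea, not the paper's implementation).
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

# Placeholder inputs: white mask pixels mark the erased region to repaint.
image = Image.open("road_scene.png").convert("RGB").resize((512, 512))
mask = Image.open("car_mask.png").convert("L").resize((512, 512))

# Repaint the erased region with a semantically similar instance.
augmented = pipe(
    prompt="a parked car on an asphalt road",
    image=image,
    mask_image=mask,
).images[0]
augmented.save("road_scene_redrawn.png")
```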
- FBSDiff: Plug-and-Play Frequency Band Substitution of Diffusion Features for Highly Controllable Text-Driven Image Translation [19.65838242227773]
This paper contributes a novel, concise, and efficient approach that adapts a pre-trained large-scale text-to-image (T2I) diffusion model to the image-to-image (I2I) paradigm in a plug-and-play manner.
Our method allows flexible control over both the guiding factor and the guiding intensity of the reference image, simply by tuning the type and bandwidth of the substituted frequency band.
arXiv Detail & Related papers (2024-08-02T04:13:38Z)
- Improving Text Generation on Images with Synthetic Captions [2.1175632266708733]
Latent diffusion models such as SDXL and SD 1.5 have shown significant capability in generating realistic images.
We propose a low-cost approach by leveraging SDXL without any time-consuming training on large-scale datasets.
Our results demonstrate how our small-scale fine-tuning approach can improve the accuracy of text generation in different scenarios.
arXiv Detail & Related papers (2024-06-01T17:27:34Z)
- Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy for bolstering image classification performance is to augment the training set with synthetic images generated by T2I models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an inter-class data augmentation method, Diff-Mix, which enriches the dataset by performing image translations between classes; a rough img2img sketch follows this entry.
arXiv Detail & Related papers (2024-03-28T17:23:45Z)
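Inter-class translation in the Diff-Mix spirit can be approximated with an off-the-shelf img2img pipeline: start from a source-class image and denoise toward a target-class prompt. A rough sketch, assuming diffusers and Stable Diffusion v1.5 (the actual method fine-tunes the model on the dataset's classes; file names, prompts, and strength are illustrative assumptions):

```python
# Rough sketch of inter-class image translation for augmentation
# (Diff-Mix itself fine-tunes the diffusion model; this uses it off the shelf).
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

source = Image.open("sparrow.png").convert("RGB").resize((512, 512))  # placeholder

# strength trades off how much of the source survives translation:
# low values keep the source layout, high values follow the target-class prompt.
mixed = pipe(
    prompt="a photo of a goldfinch",
    image=source,
    strength=0.6,
).images[0]
mixed.save("sparrow_to_goldfinch.png")
```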
- SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data [73.23388142296535]
SELMA improves the faithfulness of T2I models by fine-tuning models on automatically generated, multi-skill image-text datasets.
We show that SELMA significantly improves the semantic alignment and text faithfulness of state-of-the-art T2I diffusion models on multiple benchmarks.
We also show that fine-tuning with image-text pairs auto-collected via SELMA shows comparable performance to fine-tuning with ground truth data.
arXiv Detail & Related papers (2024-03-11T17:35:33Z)
- OT-Attack: Enhancing Adversarial Transferability of Vision-Language Models via Optimal Transport Optimization [65.57380193070574]
Vision-language pre-training models are vulnerable to multi-modal adversarial examples.
Recent works have indicated that leveraging data augmentation and image-text modal interactions can enhance the transferability of adversarial examples.
We propose an Optimal Transport-based Adversarial Attack, dubbed OT-Attack.
arXiv Detail & Related papers (2023-12-07T16:16:50Z)
- RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model [93.8067369210696]
Text-to-image generation (TTI) refers to the use of models that process text input and generate high-fidelity images from text descriptions.
Diffusion models are one prominent type of generative model; they synthesize images by learning to reverse a process that adds noise over repeated steps.
In the era of large models, scaling up model size and integrating with large language models have further improved the performance of TTI models.
arXiv Detail & Related papers (2023-09-02T03:27:20Z)
- Not Just Pretty Pictures: Toward Interventional Data Augmentation Using Text-to-Image Generators [12.053125079460234]
We show how modern T2I generators can be used to simulate arbitrary interventions over environmental factors.
Our empirical findings demonstrate that modern T2I generators like Stable Diffusion can indeed serve as a powerful interventional data augmentation mechanism; a prompt-level sketch follows this entry.
arXiv Detail & Related papers (2022-12-21T18:07:39Z)
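Under the interventional reading, augmentation amounts to sampling the same class under different environmental conditions by editing only the prompt. A small sketch, assuming Stable Diffusion via diffusers (the condition list and prompt template are illustrative assumptions, not the paper's):

```python
# Sketch of prompt-level "interventions" over environmental factors
# (conditions and template are assumptions for illustration).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

label = "delivery truck"
conditions = ["at night", "in heavy rain", "in dense fog", "at golden hour"]

# Hold the class fixed and intervene only on the environment description,
# yielding labeled images that vary one nuisance factor at a time.
for i, cond in enumerate(conditions):
    img = pipe(f"a photo of a {label} {cond}").images[0]
    img.save(f"{label.replace(' ', '_')}_{i}.png")
```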
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.