TTIDA: Controllable Generative Data Augmentation via Text-to-Text and
Text-to-Image Models
- URL: http://arxiv.org/abs/2304.08821v1
- Date: Tue, 18 Apr 2023 08:40:30 GMT
- Title: TTIDA: Controllable Generative Data Augmentation via Text-to-Text and
Text-to-Image Models
- Authors: Yuwei Yin, Jean Kaddour, Xiang Zhang, Yixin Nie, Zhenguang Liu,
Lingpeng Kong, Qi Liu
- Abstract summary: We propose TTIDA (Text-to-Text-to-Image Data Augmentation) to leverage the capabilities of large-scale pre-trained Text-to-Text and Text-to-Image generative models for data augmentation.
By conditioning the T2I model on detailed descriptions produced by T2T models, we are able to generate photo-realistic labeled images in a flexible and controllable manner.
- Score: 37.2392848181456
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data augmentation has been established as an efficacious approach to
supplement useful information for low-resource datasets. Traditional
augmentation techniques such as noise injection and image transformations have
been widely used. In addition, generative data augmentation (GDA) has been
shown to produce more diverse and flexible data. While generative adversarial
networks (GANs) have been frequently used for GDA, they lack diversity and
controllability compared to text-to-image diffusion models. In this paper, we
propose TTIDA (Text-to-Text-to-Image Data Augmentation) to leverage the
capabilities of large-scale pre-trained Text-to-Text (T2T) and Text-to-Image
(T2I) generative models for data augmentation. By conditioning the T2I model on
detailed descriptions produced by T2T models, we are able to generate
photo-realistic labeled images in a flexible and controllable manner.
Experiments on in-domain classification, cross-domain classification, and image
captioning tasks show consistent improvements over other data augmentation
baselines. Analytical studies in varied settings, including few-shot,
long-tail, and adversarial, further reinforce the effectiveness of TTIDA in
enhancing performance and increasing robustness.
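The T2T-to-T2I pipeline described above can be sketched in a few lines. The following is a minimal illustration rather than the authors' released code: the model choices (FLAN-T5 and Stable Diffusion via Hugging Face transformers and diffusers) and the prompt template are assumptions.

```python
# Minimal sketch of the TTIDA idea (not the authors' code): a T2T model
# expands a bare class label into a detailed caption, and a T2I diffusion
# model renders that caption into a synthetic labeled training image.
# Model names and the prompt template are illustrative assumptions.
import torch
from transformers import pipeline
from diffusers import StableDiffusionPipeline

t2t = pipeline("text2text-generation", model="google/flan-t5-base")
t2i = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def augment(label: str, n: int = 4):
    """Generate n synthetic (image, label) pairs for a class via T2T -> T2I."""
    pairs = []
    for _ in range(n):
        # T2T step: turn the bare label into a detailed, varied description.
        caption = t2t(
            f"Describe a photo of a {label} in one detailed sentence.",
            do_sample=True, max_new_tokens=40,
        )[0]["generated_text"]
        # T2I step: condition the diffusion model on that description.
        pairs.append((t2i(caption).images[0], label))
    return pairs  # append these to the low-resource training set

synthetic = augment("goldfinch", n=8)
```

Sampling the T2T model (do_sample=True) is what buys diversity here: each call yields a different description, so the T2I model is steered toward varied but correctly labeled images.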
Related papers
- Efficient Scaling of Diffusion Transformers for Text-to-Image Generation [105.7324182618969]
We study the scaling properties of various Diffusion Transformers (DiTs) for text-to-image generation by performing extensive and rigorous ablations.
We find that U-ViT, a pure self-attention-based DiT model, provides a simpler design and scales more effectively than cross-attention-based DiT variants.
arXiv Detail & Related papers (2024-12-16T22:59:26Z)
- Diffusion-Enhanced Test-time Adaptation with Text and Image Augmentation [67.37146712877794]
IT3A is a novel test-time adaptation method that utilizes a pre-trained generative model for multi-modal augmentation of each test sample from unknown new domains.
By combining augmented data from pre-trained vision and language models, we enhance the ability of the model to adapt to unknown new test data.
In a zero-shot setting, IT3A outperforms state-of-the-art test-time prompt tuning methods with a 5.50% increase in accuracy.
arXiv Detail & Related papers (2024-12-12T20:01:24Z)
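A minimal sketch of the test-time idea in the IT3A entry above, assuming a classifier and a set of generatively augmented views are supplied by the surrounding pipeline; this is not the paper's implementation.

```python
# Hedged sketch of prediction averaging over augmented test views (the
# general test-time-augmentation idea; not the IT3A implementation).
# `classifier` maps a [1, C, H, W] tensor to class logits; `augmented_views`
# are generatively produced variants of the test image (both assumed given).
import torch

@torch.no_grad()
def predict_with_augmentation(classifier, image, augmented_views):
    views = [image, *augmented_views]           # original + generated variants
    logits = torch.stack([classifier(v.unsqueeze(0)) for v in views])
    probs = logits.softmax(dim=-1).mean(dim=0)  # average over all views
    return probs.argmax(dim=-1)                 # final prediction
```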
- Context Canvas: Enhancing Text-to-Image Diffusion Models with Knowledge Graph-Based RAG [6.701537544179892]
We introduce a novel approach to enhance the capabilities of text-to-image models by incorporating graph-based retrieval-augmented generation (RAG).
Our system dynamically retrieves detailed character information and relational data from the knowledge graph, enabling the generation of visually accurate and contextually rich images.
arXiv Detail & Related papers (2024-12-12T18:59:41Z)
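The retrieval step described in the Context Canvas entry above can be illustrated with a toy knowledge graph; the graph contents and prompt format below are invented for illustration and are not the paper's system.

```python
# Toy sketch of graph-based RAG for T2I prompting: look up facts about each
# entity mentioned in the request and prepend them as conditioning text.
knowledge_graph = {
    "Anansi": {"appearance": "a spider trickster in woven robes",
               "relations": ["companion of Ntikuma"]},
}

def enrich_prompt(prompt: str) -> str:
    facts = []
    for entity, info in knowledge_graph.items():
        if entity in prompt:
            facts.append(f"{entity} is {info['appearance']}; "
                         + "; ".join(info["relations"]))
    # The T2I model then receives retrieved context plus the original request.
    return (" ".join(facts) + " " + prompt).strip()

print(enrich_prompt("Anansi telling stories under a baobab tree"))
```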
- FBSDiff: Plug-and-Play Frequency Band Substitution of Diffusion Features for Highly Controllable Text-Driven Image Translation [19.65838242227773]
This paper contributes a novel, concise, and efficient approach that adapts a pre-trained large-scale text-to-image (T2I) diffusion model to the image-to-image (I2I) paradigm in a plug-and-play manner.
Our method allows flexible control over both the guiding factor and the guiding intensity of the reference image simply by tuning the type and bandwidth of the substituted frequency band.
arXiv Detail & Related papers (2024-08-02T04:13:38Z)
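A rough sketch of the frequency-band substitution idea from the FBSDiff entry above, assuming 2-D feature maps and a DCT-domain band mask; the paper operates on diffusion features during sampling and may differ in detail.

```python
# Sketch of frequency-band substitution (in the spirit of FBSDiff, not its
# code): replace one DCT band of a generated feature map with the matching
# band from the reference image. The band type and bandwidth are the two
# control knobs the summary mentions.
import numpy as np
from scipy.fft import dctn, idctn

def substitute_band(gen_feat, ref_feat, band="low", bandwidth=0.2):
    G, R = dctn(gen_feat, norm="ortho"), dctn(ref_feat, norm="ortho")
    h, w = gen_feat.shape
    yy, xx = np.mgrid[0:h, 0:w]
    freq = np.sqrt((yy / h) ** 2 + (xx / w) ** 2) / np.sqrt(2)  # in [0, 1]
    mask = freq <= bandwidth if band == "low" else freq >= 1.0 - bandwidth
    G[mask] = R[mask]   # guiding factor = band type; intensity = bandwidth
    return idctn(G, norm="ortho")
```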
- Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy for bolstering image classification performance is to augment the training set with synthetic images generated by T2I models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
arXiv Detail & Related papers (2024-03-28T17:23:45Z)
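The inter-class translation in the Diff-Mix entry above resembles an img2img diffusion pass; the sketch below uses the diffusers img2img pipeline as a stand-in, with the model choice and strength value as assumptions rather than the authors' setup.

```python
# Sketch of inter-class image translation for augmentation: re-render a
# source-class image toward a target class, with `strength` controlling how
# far the translation moves from the original. Not the Diff-Mix code.
import torch
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def translate(image, target_class: str, strength: float = 0.6):
    # Keep the source image's layout but push its content toward target_class.
    return pipe(prompt=f"a photo of a {target_class}",
                image=image, strength=strength).images[0]
```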
- SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data [73.23388142296535]
SELMA improves the faithfulness of T2I models by fine-tuning models on automatically generated, multi-skill image-text datasets.
We show that SELMA significantly improves the semantic alignment and text faithfulness of state-of-the-art T2I diffusion models on multiple benchmarks.
We also show that fine-tuning with image-text pairs auto-collected via SELMA shows comparable performance to fine-tuning with ground truth data.
arXiv Detail & Related papers (2024-03-11T17:35:33Z)
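One generic way to realize the merging step in the SELMA entry above is uniform parameter averaging across skill-specific experts; the sketch below shows that simple strategy and is not necessarily SELMA's exact procedure.

```python
# Hedged sketch: merge skill-specific experts (each a state dict fine-tuned
# on one auto-generated skill dataset) by uniform parameter averaging.
import torch

def merge_experts(expert_state_dicts):
    merged = {}
    for key in expert_state_dicts[0]:
        merged[key] = torch.stack(
            [sd[key].float() for sd in expert_state_dicts]
        ).mean(dim=0)  # uniform average across experts
    return merged
```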
- RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model [93.8067369210696]
Text-to-image generation (TTI) refers to the use of models that process text input and generate high-fidelity images based on text descriptions.
Diffusion models are one prominent type of generative model; they produce images by systematically introducing noise over repeated steps and learning to reverse that corruption.
In the era of large models, scaling up model size and integration with large language models have further improved the performance of TTI models.
arXiv Detail & Related papers (2023-09-02T03:27:20Z)
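The noising procedure the survey entry above alludes to is the standard forward diffusion process; a textbook sketch of its closed form follows (schedule values are conventional DDPM defaults).

```python
# Forward diffusion in closed form: x_t = sqrt(abar_t) * x_0
# + sqrt(1 - abar_t) * eps, with eps ~ N(0, I). Standard DDPM setup.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # linear noise schedule
alpha_bars = torch.cumprod(1.0 - betas, dim=0)  # cumulative signal fraction

def noise_image(x0, t):
    eps = torch.randn_like(x0)
    ab = alpha_bars[t]
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps  # noisier as t grows
```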
- Not Just Pretty Pictures: Toward Interventional Data Augmentation Using Text-to-Image Generators [12.053125079460234]
We show how modern T2I generators can be used to simulate arbitrary interventions over such environmental factors.
Our empirical findings demonstrate that modern T2I generators like Stable Diffusion can indeed be used as a powerful interventional data augmentation mechanism.
arXiv Detail & Related papers (2022-12-21T18:07:39Z)
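The interventions over environmental factors described in the last entry can be approximated at the prompt level; the factor lists below are illustrative assumptions, not the paper's protocol.

```python
# Sketch of prompt-level "interventions": hold the class label fixed while
# varying nuisance factors such as background and lighting.
from itertools import product

backgrounds = ["in a forest", "on a city street", "on a sandy beach"]
lighting = ["at noon", "at dusk", "under studio lights"]

def intervention_prompts(label: str):
    # Each prompt fixes the label and intervenes on one environment setting.
    return [f"a photo of a {label} {bg}, {light}"
            for bg, light in product(backgrounds, lighting)]

prompts = intervention_prompts("border collie")  # feed these to a T2I model
```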
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.