Fill-Up: Balancing Long-Tailed Data with Generative Models
- URL: http://arxiv.org/abs/2306.07200v1
- Date: Mon, 12 Jun 2023 16:01:20 GMT
- Title: Fill-Up: Balancing Long-Tailed Data with Generative Models
- Authors: Joonghyuk Shin, Minguk Kang, Jaesik Park
- Abstract summary: This paper proposes a new image synthesis pipeline for long-tailed situations using Textual Inversion.
We show that images generated from textual-inverted text tokens effectively align with the real domain.
We also show that real-world data imbalance scenarios can be successfully mitigated by filling up the imbalanced data with synthetic images.
- Score: 11.91669614267993
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern text-to-image synthesis models have achieved an exceptional level of
photorealism, generating high-quality images from arbitrary text descriptions.
In light of the impressive synthesis ability, several studies have exhibited
promising results in exploiting generated data for image recognition. However,
directly supplementing data-hungry situations in the real-world (e.g. few-shot
or long-tailed scenarios) with existing approaches yields only marginal
performance gains, as they fail to faithfully reflect the distribution of the
real data. Through extensive experiments, this paper proposes a new image
synthesis pipeline for long-tailed situations using Textual Inversion. The
study demonstrates that generated images from textual-inverted text tokens
effectively align with the real domain, significantly enhancing the
recognition ability of a standard ResNet50 backbone. We also show that
real-world data imbalance scenarios can be successfully mitigated by filling up
the imbalanced data with synthetic images. In conjunction with techniques in
the area of long-tailed recognition, our method achieves state-of-the-art
results on standard long-tailed benchmarks when trained from scratch.
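The core "fill-up" idea can be sketched as a simple balancing step: count the real images per class, then top up every tail class with synthetic images until it matches the head class. The sketch below is a minimal illustration of that bookkeeping, not the authors' code; `fill_up_plan` is a hypothetical helper, and the actual pipeline would generate the missing images with a text-to-image model conditioned on textual-inverted tokens.

```python
from collections import Counter

def fill_up_plan(labels):
    """Return how many synthetic images each class needs so that
    every class reaches the head (largest) class count."""
    counts = Counter(labels)
    head = max(counts.values())
    return {cls: head - n for cls, n in counts.items()}

# Toy long-tailed label list: one head class, two tail classes.
labels = ["cat"] * 50 + ["dog"] * 10 + ["fox"] * 2
plan = fill_up_plan(labels)
# plan == {"cat": 0, "dog": 40, "fox": 48}
```

After this step, each class would be padded with `plan[cls]` generated images, so the training set a standard ResNet50 sees is class-balanced.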
Related papers
- RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm [34.02250139766494]
A substantial volume of non-paired data, such as multimodal interleaved documents, remains underutilized for vision-language representation learning.
We establish a Real-World Data Extraction pipeline to extract high-quality images and texts.
Then we design a hierarchical retrieval method to efficiently associate each image with multiple semantically relevant realistic texts.
We construct RealSyn, a dataset combining realistic and synthetic texts, available in three scales.
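The retrieval step described above, associating each image with semantically relevant texts, can be illustrated with a plain nearest-neighbor lookup over embeddings. This is a minimal sketch under assumed inputs (precomputed image and text embeddings), not RealSyn's actual hierarchical method; `retrieve_texts` is a hypothetical name.

```python
import numpy as np

def retrieve_texts(image_emb, text_embs, k=3):
    """Return indices of the k texts whose embeddings have the highest
    cosine similarity to the image embedding."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img  # cosine similarity of each text to the image
    return np.argsort(-sims)[:k]

# Toy 2-D embeddings: texts 0 and 2 point roughly the same way as the image.
image_emb = np.array([1.0, 0.0])
text_embs = np.array([[0.9, 0.1], [0.0, 1.0], [0.7, 0.7]])
top = retrieve_texts(image_emb, text_embs, k=2)
# top == [0, 2]
```

A hierarchical variant would first narrow the candidate pool (e.g. by cluster) before this fine-grained ranking, which is what makes the association efficient at scale.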
arXiv Detail & Related papers (2025-02-18T03:58:38Z)
- Augmented Conditioning Is Enough For Effective Training Image Generation [11.60839452103417]
We find that conditioning the generation process on an augmented real image and text prompt produces generations that serve as effective synthetic datasets for downstream training.
We validate augmentation-conditioning on a total of five established long-tail and few-shot image classification benchmarks.
arXiv Detail & Related papers (2025-02-06T19:57:33Z) - Improving Text Generation on Images with Synthetic Captions [2.1175632266708733]
latent diffusion models such as SDXL and SD 1.5 have shown significant capability in generating realistic images.
We propose a low-cost approach by leveraging SDXL without any time-consuming training on large-scale datasets.
Our results demonstrate how our small scale fine-tuning approach can improve the accuracy of text generation in different scenarios.
arXiv Detail & Related papers (2024-06-01T17:27:34Z)
- Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization [62.157627519792946]
We introduce a novel framework called bridged transfer, which initially employs synthetic images for fine-tuning a pre-trained model to improve its transferability.
We propose a dataset style inversion strategy to improve the stylistic alignment between synthetic and real images.
Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements.
arXiv Detail & Related papers (2024-03-28T22:25:05Z)
- Learning from Synthetic Data for Visual Grounding [55.21937116752679]
We show that SynGround can improve the localization capabilities of off-the-shelf vision-and-language models.
Data generated with SynGround improves the pointing-game accuracy of pretrained ALBEF and BLIP models by 4.81% and 17.11% absolute percentage points, respectively.
arXiv Detail & Related papers (2024-03-20T17:59:43Z)
- Improving the Effectiveness of Deep Generative Data [5.856292656853396]
Training a model on purely synthetic images for downstream image processing tasks results in an undesired performance drop compared to training on real data.
We propose a new taxonomy to describe factors contributing to this commonly observed phenomenon and investigate it on the popular CIFAR-10 dataset.
Our method outperforms baselines on downstream classification tasks both in case of training on synthetic only (Synthetic-to-Real) and training on a mix of real and synthetic data.
arXiv Detail & Related papers (2023-11-07T12:57:58Z)
- Image Captions are Natural Prompts for Text-to-Image Models [70.30915140413383]
We analyze the relationship between the training effect of synthetic data and the synthetic data distribution induced by prompts.
We propose a simple yet effective method that prompts text-to-image generative models to synthesize more informative and diverse training data.
Our method significantly improves the performance of models trained on synthetic training data.
arXiv Detail & Related papers (2023-07-17T14:38:11Z)
- Is synthetic data from generative models ready for image recognition? [69.42645602062024]
We study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks.
We showcase the strengths and shortcomings of synthetic data from existing generative models, and propose strategies for applying synthetic data more effectively to recognition tasks.
arXiv Detail & Related papers (2022-10-14T06:54:24Z)
- Re-Imagen: Retrieval-Augmented Text-to-Image Generator [58.60472701831404]
Retrieval-Augmented Text-to-Image Generator (Re-Imagen)
arXiv Detail & Related papers (2022-09-29T00:57:28Z)
- Syn2Real Transfer Learning for Image Deraining using Gaussian Processes [92.15895515035795]
CNN-based methods for image deraining have achieved excellent performance in terms of reconstruction error as well as visual quality.
Due to challenges in obtaining real-world, fully-labeled image deraining datasets, existing methods are trained only on synthetically generated data.
We propose a Gaussian Process-based semi-supervised learning framework that enables the network to learn to derain using a synthetic dataset.
arXiv Detail & Related papers (2020-06-10T00:33:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.