Fill-Up: Balancing Long-Tailed Data with Generative Models
- URL: http://arxiv.org/abs/2306.07200v1
- Date: Mon, 12 Jun 2023 16:01:20 GMT
- Title: Fill-Up: Balancing Long-Tailed Data with Generative Models
- Authors: Joonghyuk Shin, Minguk Kang, Jaesik Park
- Abstract summary: This paper proposes a new image synthesis pipeline for long-tailed situations using Textual Inversion.
We show that images generated from textual-inverted text tokens effectively align with the real domain.
We also show that real-world data imbalance scenarios can be successfully mitigated by filling up the imbalanced data with synthetic images.
- Score: 11.91669614267993
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern text-to-image synthesis models have achieved an exceptional level of
photorealism, generating high-quality images from arbitrary text descriptions.
In light of the impressive synthesis ability, several studies have exhibited
promising results in exploiting generated data for image recognition. However,
directly supplementing data-hungry situations in the real-world (e.g. few-shot
or long-tailed scenarios) with existing approaches yields only marginal
performance gains, as they fail to faithfully reflect the distribution of the
real data. Through extensive experiments, this paper proposes a new image
synthesis pipeline for long-tailed situations using Textual Inversion. The
study demonstrates that generated images from textual-inverted text tokens
effectively align with the real domain, significantly enhancing the
recognition ability of a standard ResNet50 backbone. We also show that
real-world data imbalance scenarios can be successfully mitigated by filling up
the imbalanced data with synthetic images. In conjunction with techniques in
the area of long-tailed recognition, our method achieves state-of-the-art
results on standard long-tailed benchmarks when trained from scratch.
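The core "fill-up" idea can be sketched as a simple balancing step: count the real images per class, then top up every tail class with synthetic images until it matches the head class. The sketch below is a minimal illustration of that bookkeeping, not the authors' code; `fill_up_plan` is a hypothetical helper, and the actual pipeline would generate the missing images with a text-to-image model conditioned on textual-inverted tokens.

```python
from collections import Counter

def fill_up_plan(labels):
    """Return how many synthetic images each class needs so that
    every class reaches the head (largest) class count."""
    counts = Counter(labels)
    head = max(counts.values())
    return {cls: head - n for cls, n in counts.items()}

# Toy long-tailed label list: one head class, two tail classes.
labels = ["cat"] * 50 + ["dog"] * 10 + ["fox"] * 2
plan = fill_up_plan(labels)
# plan == {"cat": 0, "dog": 40, "fox": 48}
```

After this step, each class would be padded with `plan[cls]` generated images, so the training set a standard ResNet50 sees is class-balanced.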
Related papers
- RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm [34.02250139766494]
A substantial volume of non-paired data, such as multimodal interleaved documents, remains underutilized for vision-language representation learning.
We establish a Real-World Data Extraction pipeline to extract high-quality images and texts.
Then we design a hierarchical retrieval method to efficiently associate each image with multiple semantically relevant realistic texts.
We construct RealSyn, a dataset combining realistic and synthetic texts, available in three scales.
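The retrieval step described above, associating each image with semantically relevant texts, can be illustrated with a plain nearest-neighbor lookup over embeddings. This is a minimal sketch under assumed inputs (precomputed image and text embeddings), not RealSyn's actual hierarchical method; `retrieve_texts` is a hypothetical name.

```python
import numpy as np

def retrieve_texts(image_emb, text_embs, k=3):
    """Return indices of the k texts whose embeddings have the highest
    cosine similarity to the image embedding."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img  # cosine similarity of each text to the image
    return np.argsort(-sims)[:k]

# Toy 2-D embeddings: texts 0 and 2 point roughly the same way as the image.
image_emb = np.array([1.0, 0.0])
text_embs = np.array([[0.9, 0.1], [0.0, 1.0], [0.7, 0.7]])
top = retrieve_texts(image_emb, text_embs, k=2)
# top == [0, 2]
```

A hierarchical variant would first narrow the candidate pool (e.g. by cluster) before this fine-grained ranking, which is what makes the association efficient at scale.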
arXiv Detail & Related papers (2025-02-18T03:58:38Z)
- Augmented Conditioning Is Enough For Effective Training Image Generation [11.60839452103417]
We find that conditioning the generation process on an augmented real image and text prompt produces generations that serve as effective synthetic datasets for downstream training.
We validate augmentation-conditioning on a total of five established long-tail and few-shot image classification benchmarks.
arXiv Detail & Related papers (2025-02-06T19:57:33Z) - Improving Text Generation on Images with Synthetic Captions [2.1175632266708733]
latent diffusion models such as SDXL and SD 1.5 have shown significant capability in generating realistic images.
We propose a low-cost approach by leveraging SDXL without any time-consuming training on large-scale datasets.
Our results demonstrate how our small scale fine-tuning approach can improve the accuracy of text generation in different scenarios.
arXiv Detail & Related papers (2024-06-01T17:27:34Z)
- Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization [62.157627519792946]
We introduce a novel framework called bridged transfer, which initially employs synthetic images for fine-tuning a pre-trained model to improve its transferability.
We propose a dataset style inversion strategy to improve the stylistic alignment between synthetic and real images.
Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements.
arXiv Detail & Related papers (2024-03-28T22:25:05Z)
- Learning from Synthetic Data for Visual Grounding [55.21937116752679]
We show that SynGround can improve the localization capabilities of off-the-shelf vision-and-language models.
Data generated with SynGround improves the pointing-game accuracy of pretrained ALBEF and BLIP models by 4.81% and 17.11% absolute percentage points, respectively.
arXiv Detail & Related papers (2024-03-20T17:59:43Z)
- Improving the Effectiveness of Deep Generative Data [5.856292656853396]
Training a model on purely synthetic images for downstream image processing tasks results in an undesired performance drop compared to training on real data.
We propose a new taxonomy to describe factors contributing to this commonly observed phenomenon and investigate it on the popular CIFAR-10 dataset.
Our method outperforms baselines on downstream classification tasks both in case of training on synthetic only (Synthetic-to-Real) and training on a mix of real and synthetic data.
arXiv Detail & Related papers (2023-11-07T12:57:58Z)
- Image Captions are Natural Prompts for Text-to-Image Models [70.30915140413383]
We analyze the relationship between the training effect of synthetic data and the synthetic data distribution induced by prompts.
We propose a simple yet effective method that prompts text-to-image generative models to synthesize more informative and diverse training data.
Our method significantly improves the performance of models trained on synthetic training data.
arXiv Detail & Related papers (2023-07-17T14:38:11Z)
- Is synthetic data from generative models ready for image recognition? [69.42645602062024]
We study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks.
We showcase the strengths and shortcomings of synthetic data from existing generative models, and propose strategies for applying synthetic data more effectively to recognition tasks.
arXiv Detail & Related papers (2022-10-14T06:54:24Z)
- Re-Imagen: Retrieval-Augmented Text-to-Image Generator [58.60472701831404]
Retrieval-Augmented Text-to-Image Generator (Re-Imagen)
arXiv Detail & Related papers (2022-09-29T00:57:28Z)
- Syn2Real Transfer Learning for Image Deraining using Gaussian Processes [92.15895515035795]
CNN-based methods for image deraining have achieved excellent performance in terms of reconstruction error as well as visual quality.
Due to challenges in obtaining real-world, fully-labeled image deraining datasets, existing methods are trained only on synthetically generated data.
We propose a Gaussian Process-based semi-supervised learning framework that enables the network to learn to derain using a synthetic dataset.
arXiv Detail & Related papers (2020-06-10T00:33:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.