Interactive Data Synthesis for Systematic Vision Adaptation via
LLMs-AIGCs Collaboration
- URL: http://arxiv.org/abs/2305.12799v1
- Date: Mon, 22 May 2023 07:53:36 GMT
- Title: Interactive Data Synthesis for Systematic Vision Adaptation via
LLMs-AIGCs Collaboration
- Authors: Qifan Yu, Juncheng Li, Wentao Ye, Siliang Tang, Yueting Zhuang
- Abstract summary: We propose a new paradigm of annotated data expansion named ChatGenImage.
The core idea behind it is to leverage the complementary strengths of diverse models to establish a highly effective and user-friendly pipeline for interactive data augmentation.
We present results from our ChatGenImage framework and demonstrate the potential of our synthetic data for systematic vision adaptation.
- Score: 48.54002313329872
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent text-to-image generation models have shown promising results in
generating high-fidelity photo-realistic images. In parallel, the problem of
data scarcity has brought a growing interest in employing AIGC technology for
high-quality data expansion. However, this paradigm requires well-designed
prompt engineering, and low-cost data expansion and labeling remain
under-explored. Inspired by LLMs' powerful capability in task guidance, we
propose a new paradigm of annotated data expansion named ChatGenImage. The
core idea behind it is to leverage the complementary strengths of diverse
models to establish a highly effective and user-friendly pipeline for
interactive data augmentation. In this work, we extensively study how LLMs
communicate with AIGC models to achieve more controllable image generation,
and we make the first attempt to combine them for automatic data augmentation
across a variety of downstream tasks. Finally, we present results obtained
from our ChatGenImage framework and demonstrate the potential of our
synthetic data for systematic vision adaptation. Our code is available at
https://github.com/Yuqifan1117/Labal-Anything-Pipeline.
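The abstract describes a loop in which an LLM refines prompts for an AIGC model to produce labeled images. Below is a minimal sketch of that idea, not the released code linked above: `query_llm` is a hypothetical stand-in for any chat-LLM client, and the generation call uses the Hugging Face diffusers API.
```python
# Minimal sketch of an LLM-guided data-expansion loop (not the paper's code).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def query_llm(instruction: str) -> str:
    """Hypothetical LLM call that returns a refined image prompt."""
    raise NotImplementedError("plug in your preferred chat-LLM client")

def expand_dataset(label: str, n_images: int = 4):
    """Ask the LLM to elaborate `label` into a scene prompt, then synthesize images."""
    prompt = query_llm(f"Write a detailed, photo-realistic prompt depicting a {label}.")
    images = pipe(prompt, num_images_per_prompt=n_images).images
    # Each image inherits `label` as its annotation, yielding labeled synthetic data.
    return [(img, label) for img in images]
```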
Related papers
- DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling [6.7206291284535125]
We present an effective data augmentation framework leveraging a Large Language Model (LLM) and a Diffusion Model (DM).
Our approach addresses the challenge of increasing the diversity of synthetic images.
Our method produces synthetic images with enhanced diversity while maintaining adherence to the target distribution.
arXiv Detail & Related papers (2024-09-25T14:02:43Z)
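As a rough illustration of the adaptive guidance idea in the DALDA entry above, the sketch below regenerates a sample with stronger guidance when its CLIP similarity to the prompt is low. The threshold and scales are invented for illustration, CLIP similarity is only a stand-in for the paper's actual adaptation signal, and `pipe` is a diffusers text-to-image pipeline like the one in the first sketch.
```python
# Illustrative sketch of adaptive guidance scaling (values are invented).
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(image, text: str) -> float:
    """Image-text similarity used here as the adaptation signal."""
    inputs = proc(text=[text], images=image, return_tensors="pt", padding=True)
    return clip(**inputs).logits_per_image.item()

def generate_adaptive(pipe, prompt: str, low=5.0, high=9.0, threshold=25.0):
    # Start with weak guidance for diversity; if the sample drifts from the
    # prompt, regenerate with stronger guidance to restore adherence.
    image = pipe(prompt, guidance_scale=low).images[0]
    if clip_score(image, prompt) < threshold:
        image = pipe(prompt, guidance_scale=high).images[0]
    return image
```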
- A Simple Background Augmentation Method for Object Detection with Diffusion Model [53.32935683257045]
In computer vision, it is well known that a lack of data diversity impairs model performance.
We propose a simple yet effective data augmentation approach by leveraging advancements in generative models.
Background augmentation, in particular, significantly improves the models' robustness and generalization capabilities.
arXiv Detail & Related papers (2024-08-01T07:40:00Z)
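A minimal sketch of the background-augmentation idea above, assuming an object mask is available (e.g., from segmentation annotations): only the background is repainted by an inpainting model, so the object and its labels stay untouched. The model choice is illustrative.
```python
# Sketch of background augmentation for detection data. Mask convention in
# diffusers: white pixels are regenerated, so the object mask is inverted to
# repaint only the background.
import torch
from PIL import Image, ImageOps
from diffusers import StableDiffusionInpaintPipeline

inpaint = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

def augment_background(image: Image.Image, object_mask: Image.Image, scene: str):
    # The object region stays fixed, so bounding boxes remain valid.
    background_mask = ImageOps.invert(object_mask.convert("L"))
    return inpaint(prompt=scene, image=image, mask_image=background_mask).images[0]
```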
- SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models [39.55942000935765]
We introduce SynthVLM, a novel data synthesis pipeline for Vision Large Language Models (VLLMs).
Unlike existing methods that generate captions from images, SynthVLM employs advanced diffusion models and high-quality captions to automatically generate and select high-resolution images from captions.
We achieve state-of-the-art (SoTA) performance on various vision question answering tasks, maintaining high alignment quality and preserving advanced language abilities.
arXiv Detail & Related papers (2024-07-30T11:57:40Z)
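The caption-first recipe above can be sketched as generate-then-select: render several candidates per caption and keep the best-aligned image. `pipe` and `clip_score` are the text-to-image pipeline and CLIP scorer from the earlier sketches; the paper's own selection criterion may differ.
```python
# Sketch of caption-first synthesis with alignment-based selection.
def synthesize_pair(pipe, clip_score, caption: str, n_candidates: int = 4):
    candidates = pipe(caption, num_images_per_prompt=n_candidates).images
    best = max(candidates, key=lambda im: clip_score(im, caption))
    return best, caption  # an (image, caption) training pair for a VLLM
```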
- Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization [62.157627519792946]
We introduce a novel framework called bridged transfer, which initially employs synthetic images to fine-tune a pre-trained model and improve its transferability.
We propose a dataset style inversion strategy to improve the stylistic alignment between synthetic and real images.
Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements.
arXiv Detail & Related papers (2024-03-28T22:25:05Z)
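A minimal sketch of the bridged-transfer schedule described above: fine-tune first on synthetic images, then on real ones, rather than mixing both from the start. Epoch counts and the learning rate are illustrative, not the paper's settings.
```python
# Sketch of two-stage "bridged transfer" fine-tuning.
import torch

def bridged_transfer(model, synthetic_loader, real_loader, device="cuda",
                     epochs=(3, 10), lr=1e-4):
    model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    stages = ((synthetic_loader, epochs[0]), (real_loader, epochs[1]))
    for loader, n_epochs in stages:  # stage 1: synthetic, stage 2: real
        for _ in range(n_epochs):
            for images, labels in loader:
                optimizer.zero_grad()
                loss = loss_fn(model(images.to(device)), labels.to(device))
                loss.backward()
                optimizer.step()
    return model
```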
- A survey of synthetic data augmentation methods in computer vision [0.0]
This paper presents an extensive review of synthetic data augmentation techniques.
We focus on important data generation and augmentation techniques, their general scope of application, and specific use cases.
We provide a summary of common synthetic datasets for training computer vision models.
arXiv Detail & Related papers (2024-03-15T07:34:08Z)
- Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings [16.28853186016663]
We create synthetic image-text pairs for efficient and effective training of Visual-Language Models (VLMs).
Our method employs a pretrained text-to-image model to synthesize image embeddings from captions generated by an LLM.
Our VLM, fine-tuned on synthetic data, achieves performance comparable to models trained solely on human-annotated data.
arXiv Detail & Related papers (2024-03-12T15:36:42Z)
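Schematically, the pipeline above produces (image-embedding, caption) pairs without rendering pixels. Both callables below are hypothetical stand-ins for components the abstract names but does not specify: an LLM caption writer and a text-to-image model truncated at its image-embedding stage.
```python
# Schematic of embedding-level pair synthesis (all components are stand-ins).
def make_synthetic_batch(caption_llm, text_to_embedding, topic: str, n: int = 8):
    captions = [caption_llm(f"Write one caption for an image about {topic}.")
                for _ in range(n)]
    # Skipping pixel-space rendering is what makes this pipeline efficient.
    embeddings = [text_to_embedding(c) for c in captions]
    return list(zip(embeddings, captions))  # (image-embedding, caption) pairs
```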
- Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model trained on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks while significantly reducing its parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z)
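The sequential-autoregression objective above amounts to language-model-style next-token prediction over discretized visual tokens. The sketch below assumes a visual tokenizer (e.g., a VQ encoder) and a model that maps token prefixes to logits; neither is specified by the summary.
```python
# Minimal sketch of an autoregressive objective over visual tokens.
import torch
import torch.nn.functional as F

def autoregressive_loss(model: torch.nn.Module, tokens: torch.LongTensor):
    # tokens: (batch, sequence) of discrete visual-token ids.
    logits = model(tokens[:, :-1])  # predict token t+1 from tokens <= t
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1)
    )
```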
- Improving the Effectiveness of Deep Generative Data [5.856292656853396]
Training a model on purely synthetic images for downstream image processing tasks results in an undesired performance drop compared to training on real data.
We propose a new taxonomy to describe factors contributing to this commonly observed phenomenon and investigate it on the popular CIFAR-10 dataset.
Our method outperforms baselines on downstream classification tasks both when training on synthetic data only (Synthetic-to-Real) and when training on a mix of real and synthetic data.
arXiv Detail & Related papers (2023-11-07T12:57:58Z)
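The two regimes compared above, synthetic-only and real-plus-synthetic training, can be sketched with a simple dataset switch. The mixing here is a plain concatenation; the paper's own weighting may differ.
```python
# Sketch of synthetic-only versus mixed real/synthetic training data.
from torch.utils.data import ConcatDataset, DataLoader

def make_loader(real_ds, synthetic_ds, mode="mixed", batch_size=128):
    dataset = (synthetic_ds if mode == "synthetic_only"
               else ConcatDataset([real_ds, synthetic_ds]))
    return DataLoader(dataset, batch_size=batch_size, shuffle=True)
```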
- StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z)
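A minimal sketch of synchronized image-dialogue synthesis as described above: one LLM call yields both an image prompt and a matching dialogue, and a text-to-image model renders the image. `query_llm` and `pipe` are the hypothetical client and diffusers pipeline from the first sketch, and the JSON contract is an assumption.
```python
# Sketch of synchronized image-dialogue synthesis for instruction tuning.
import json

def make_instruction_sample(query_llm, pipe, topic: str):
    spec = json.loads(query_llm(
        f"Return JSON with keys 'image_prompt' and 'dialogue' for a scene about {topic}."
    ))
    image = pipe(spec["image_prompt"]).images[0]
    return image, spec["dialogue"]  # an aligned visual-instruction sample
```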
- Is synthetic data from generative models ready for image recognition? [69.42645602062024]
We study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks.
We showcase the strengths and shortcomings of synthetic data from existing generative models and propose strategies for better applying synthetic data to recognition tasks.
arXiv Detail & Related papers (2022-10-14T06:54:24Z)
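One generic strategy for applying synthetic data to recognition, in the spirit of the entry above, is to filter generations with a pretrained classifier and keep only confident matches. The threshold is illustrative, and this is not necessarily the paper's specific proposal.
```python
# Sketch of confidence-based filtering of synthetic training images.
import torch

@torch.no_grad()
def keep_confident(classifier, images: torch.Tensor, label_idx: int,
                   threshold: float = 0.5) -> torch.Tensor:
    probs = classifier(images).softmax(dim=-1)
    return images[probs[:, label_idx] > threshold]
```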
This list is automatically generated from the titles and abstracts of the papers on this site.