Interactive Data Synthesis for Systematic Vision Adaptation via
LLMs-AIGCs Collaboration
- URL: http://arxiv.org/abs/2305.12799v1
- Date: Mon, 22 May 2023 07:53:36 GMT
- Title: Interactive Data Synthesis for Systematic Vision Adaptation via
LLMs-AIGCs Collaboration
- Authors: Qifan Yu, Juncheng Li, Wentao Ye, Siliang Tang, Yueting Zhuang
- Abstract summary: We propose a new paradigm of annotated data expansion named as ChatGenImage.
The core idea behind it is to leverage the complementary strengths of diverse models to establish a highly effective and user-friendly pipeline for interactive data augmentation.
We present fascinating results obtained from our ChatGenImage framework and demonstrate the powerful potential of our synthetic data for systematic vision adaptation.
- Score: 48.54002313329872
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent text-to-image generation models have shown promising results in
generating high-fidelity photo-realistic images. In parallel, the problem of
data scarcity has brought a growing interest in employing AIGC technology for
high-quality data expansion. However, this paradigm requires well-designed
prompt engineering that cost-less data expansion and labeling remain
under-explored. Inspired by LLM's powerful capability in task guidance, we
propose a new paradigm of annotated data expansion named as ChatGenImage. The
core idea behind it is to leverage the complementary strengths of diverse
models to establish a highly effective and user-friendly pipeline for
interactive data augmentation. In this work, we extensively study how LLMs
communicate with AIGC model to achieve more controllable image generation and
make the first attempt to collaborate them for automatic data augmentation for
a variety of downstream tasks. Finally, we present fascinating results obtained
from our ChatGenImage framework and demonstrate the powerful potential of our
synthetic data for systematic vision adaptation. Our codes are available at
https://github.com/Yuqifan1117/Labal-Anything-Pipeline.
Related papers
- Image Augmentation Agent for Weakly Supervised Semantic Segmentation [19.654959889052638]
Weakly-supervised semantic segmentation (WSSS) has achieved remarkable progress using only image-level labels.
We introduce a novel approach called Image Augmentation Agent (IAA) which shows that it is possible to enhance WSSS from data generation perspective.
IAA mainly design an augmentation agent that leverages large language models (LLMs) and diffusion models to automatically generate additional images for WSSS.
arXiv Detail & Related papers (2024-12-29T11:32:55Z) - SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding [66.74446220401296]
We propose SynerGen-VL, a simple yet powerful encoder-free MLLM capable of both image understanding and generation.
We introduce the token folding mechanism and the vision-expert-based progressive alignment pretraining strategy, which effectively support high-resolution image understanding.
Our code and models shall be released.
arXiv Detail & Related papers (2024-12-12T18:59:26Z) - A Simple Background Augmentation Method for Object Detection with Diffusion Model [53.32935683257045]
In computer vision, it is well-known that a lack of data diversity will impair model performance.
We propose a simple yet effective data augmentation approach by leveraging advancements in generative models.
Background augmentation, in particular, significantly improves the models' robustness and generalization capabilities.
arXiv Detail & Related papers (2024-08-01T07:40:00Z) - Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization [62.157627519792946]
We introduce a novel framework called bridged transfer, which initially employs synthetic images for fine-tuning a pre-trained model to improve its transferability.
We propose dataset style inversion strategy to improve the stylistic alignment between synthetic and real images.
Our proposed methods are evaluated across 10 different datasets and 5 distinct models, demonstrating consistent improvements.
arXiv Detail & Related papers (2024-03-28T22:25:05Z) - A survey of synthetic data augmentation methods in computer vision [0.0]
This paper presents an extensive review of synthetic data augmentation techniques.
We focus on the important data generation and augmentation techniques, general scope of application and specific use-cases.
We provide a summary of common synthetic datasets for training computer vision models.
arXiv Detail & Related papers (2024-03-15T07:34:08Z) - Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings [16.28853186016663]
We create synthetic image-text pairs for efficient and effective Visual-Language Models (VLMs) training.
Our method employs a pretrained text-to-image model to synthesize image embeddings from captions generated by an LLM.
Our VLM, finetuned on synthetic data achieves comparable performance to models trained solely on human-annotated data.
arXiv Detail & Related papers (2024-03-12T15:36:42Z) - Can Generative Models Improve Self-Supervised Representation Learning? [0.7999703756441756]
We introduce a framework that enriches the self-supervised learning (SSL) paradigm by utilizing generative models to produce semantically consistent image augmentations.
Our results show that our framework significantly enhances the quality of learned visual representations by up to 10% Top-1 accuracy in downstream tasks.
arXiv Detail & Related papers (2024-03-09T17:17:07Z) - Improving the Effectiveness of Deep Generative Data [5.856292656853396]
Training a model on purely synthetic images for downstream image processing tasks results in an undesired performance drop compared to training on real data.
We propose a new taxonomy to describe factors contributing to this commonly observed phenomenon and investigate it on the popular CIFAR-10 dataset.
Our method outperforms baselines on downstream classification tasks both in case of training on synthetic only (Synthetic-to-Real) and training on a mix of real and synthetic data.
arXiv Detail & Related papers (2023-11-07T12:57:58Z) - StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized
Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z) - Is synthetic data from generative models ready for image recognition? [69.42645602062024]
We study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks.
We showcase the powerfulness and shortcomings of synthetic data from existing generative models, and propose strategies for better applying synthetic data for recognition tasks.
arXiv Detail & Related papers (2022-10-14T06:54:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.