SGIA: Enhancing Fine-Grained Visual Classification with Sequence Generative Image Augmentation
- URL: http://arxiv.org/abs/2412.06138v1
- Date: Mon, 09 Dec 2024 01:39:46 GMT
- Title: SGIA: Enhancing Fine-Grained Visual Classification with Sequence Generative Image Augmentation
- Authors: Qiyu Liao, Xin Yuan, Min Xu, Dadong Wang
- Abstract summary: We introduce Sequence Generative Image Augmentation (SGIA) for augmenting Fine-Grained Visual Classification (FGVC) datasets.
Our method features a unique Bridging Transfer Learning process, designed to minimize the domain gap between real and synthetically augmented data.
Our work sets a new benchmark and outperforms the previous state-of-the-art models in classification accuracy by 0.5% for the CUB-200-2011 dataset.
- Abstract: In Fine-Grained Visual Classification (FGVC), distinguishing highly similar subcategories remains a formidable challenge, often necessitating datasets with extensive variability. The acquisition and annotation of such FGVC datasets are notably difficult and costly, demanding specialized knowledge to identify subtle distinctions among closely related categories. Our study introduces a novel approach employing the Sequence Latent Diffusion Model (SLDM) for augmenting FGVC datasets, called Sequence Generative Image Augmentation (SGIA). Our method features a unique Bridging Transfer Learning (BTL) process, designed to minimize the domain gap between real and synthetically augmented data. This approach notably surpasses existing methods in generating more realistic image samples, providing a diverse range of pose transformations that extend beyond the traditional rigid transformations and style changes in generative augmentation. We demonstrate the effectiveness of our augmented dataset with substantial improvements in FGVC tasks on various datasets, models, and training strategies, especially in few-shot learning scenarios. Our method outperforms conventional image augmentation techniques in benchmark tests on three FGVC datasets, showcasing superior realism, variability, and representational quality. Our work sets a new benchmark and outperforms the previous state-of-the-art models in classification accuracy by 0.5% for the CUB-200-2011 dataset and advances the application of generative models in FGVC data augmentation.
Related papers
- A Simple Background Augmentation Method for Object Detection with Diffusion Model [53.32935683257045]
In computer vision, it is well-known that a lack of data diversity will impair model performance.
We propose a simple yet effective data augmentation approach by leveraging advancements in generative models.
Background augmentation, in particular, significantly improves the models' robustness and generalization capabilities.
arXiv Detail & Related papers (2024-08-01T07:40:00Z)
- Advancing Fine-Grained Classification by Structure and Subject Preserving Augmentation [8.777277201807351]
We present SaSPA: Structure and Subject Preserving Augmentation.
Our method does not use real images as guidance, thereby increasing generation flexibility and promoting greater diversity.
We conduct extensive experiments and benchmark SaSPA against both traditional and recent generative data augmentation methods.
arXiv Detail & Related papers (2024-06-20T17:58:30Z)
- Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy to bolster image classification performance is through augmenting the training set with synthetic images generated by T2I models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
arXiv Detail & Related papers (2024-03-28T17:23:45Z)
- Additional Look into GAN-based Augmentation for Deep Learning COVID-19 Image Classification [57.1795052451257]
We study the dependence of the GAN-based augmentation performance on dataset size with a focus on small samples.
We train StyleGAN2-ADA with both sets and then, after validating the quality of the generated images, use the trained GANs as one of the augmentation approaches in multi-class classification problems.
The GAN-based augmentation approach is found to be comparable with classical augmentation in the case of medium and large datasets but underperforms in the case of smaller datasets.
arXiv Detail & Related papers (2024-01-26T08:28:13Z)
- StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z)
- Unified Framework for Histopathology Image Augmentation and Classification via Generative Models [6.404713841079193]
We propose an innovative unified framework that integrates the data generation and model training stages into a unified process.
Our approach utilizes a pure Vision Transformer (ViT)-based conditional Generative Adversarial Network (cGAN) model to simultaneously handle both image synthesis and classification.
Our experiments show that our unified synthetic augmentation framework consistently enhances the performance of histopathology image classification models.
arXiv Detail & Related papers (2022-12-20T03:40:44Z)
- Guiding Generative Language Models for Data Augmentation in Few-Shot Text Classification [59.698811329287174]
We leverage GPT-2 for generating artificial training instances in order to improve classification performance.
Our results show that fine-tuning GPT-2 on a handful of labeled instances leads to consistent classification improvements.
arXiv Detail & Related papers (2021-11-17T12:10:03Z)
- Learning Representational Invariances for Data-Efficient Action Recognition [52.23716087656834]
We show that our data augmentation strategy leads to promising performance on the Kinetics-100, UCF-101, and HMDB-51 datasets.
We also validate our data augmentation strategy in the fully supervised setting and demonstrate improved performance.
arXiv Detail & Related papers (2021-03-30T17:59:49Z)
- Domain Adaptive Transfer Learning on Visual Attention Aware Data Augmentation for Fine-grained Visual Categorization [3.5788754401889014]
We perform domain adaptive knowledge transfer via fine-tuning on our base network model.
We show competitive improvements in accuracy by using attention-aware data augmentation techniques.
Our method achieves state-of-the-art results in multiple fine-grained classification datasets.
arXiv Detail & Related papers (2020-10-06T22:47:57Z)