Semantic-Guided Generative Image Augmentation Method with Diffusion
Models for Image Classification
- URL: http://arxiv.org/abs/2302.02070v3
- Date: Thu, 18 Jan 2024 14:03:28 GMT
- Title: Semantic-Guided Generative Image Augmentation Method with Diffusion
Models for Image Classification
- Authors: Bohan Li, Xiao Xu, Xinghao Wang, Yutai Hou, Yunlong Feng, Feng Wang,
Xuanliang Zhang, Qingfu Zhu, Wanxiang Che
- Abstract summary: We propose SGID, a Semantic-guided Generative Image augmentation method with Diffusion models for image classification.
Specifically, SGID employs diffusion models to generate augmented images with good image diversity. More importantly, SGID takes image labels and captions as guidance to maintain semantic consistency between the augmented and original images.
- Score: 48.640470032205265
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing image augmentation methods fall into two categories:
perturbation-based methods and generative methods. Perturbation-based methods
apply pre-defined perturbations to an original image, but they only vary the
image locally and thus lack diversity. In contrast, generative methods bring
more diversity to the augmented images but may not preserve semantic
consistency, incorrectly altering the essential semantics of the original
image. To balance image diversity and semantic consistency in
augmented images, we propose SGID, a Semantic-guided Generative Image
augmentation method with Diffusion models for image classification.
Specifically, SGID employs diffusion models to generate augmented images with
good image diversity. More importantly, SGID takes image labels and captions as
guidance to maintain semantic consistency between the augmented and original
images. Experimental results show that SGID outperforms the best augmentation
baseline by 1.72% on ResNet-50 (from scratch), 0.33% on ViT (ImageNet-21k), and
0.14% on CLIP-ViT (LAION-2B). Moreover, SGID can be combined with other image
augmentation baselines and further improves the overall performance. We
demonstrate the semantic consistency and image diversity of SGID through
quantitative human and automated evaluations, as well as qualitative case
studies.
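The augmentation procedure described in the abstract can be pictured with an off-the-shelf captioner and an image-to-image diffusion pipeline. The following is a minimal illustrative sketch, not the authors' released SGID code: the model checkpoints, the prompt template, and the strength/guidance values are assumptions chosen for the example.

```python
# Sketch of label- and caption-guided image augmentation (illustrative only).
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
from diffusers import StableDiffusionImg2ImgPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

# Captioning model: describes the instance-specific content of the original image.
blip_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base").to(device)

# Image-to-image diffusion model: produces the augmented image.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)

def augment(image: Image.Image, label: str, strength: float = 0.5) -> Image.Image:
    """Generate one augmented view intended to keep the label semantics."""
    # 1. Caption the original image.
    inputs = blip_processor(image, return_tensors="pt").to(device)
    caption = blip_processor.decode(blip.generate(**inputs)[0], skip_special_tokens=True)

    # 2. Combine the class label with the caption as text guidance
    #    (the template below is an assumption, not the paper's exact prompt).
    prompt = f"a photo of a {label}, {caption}"

    # 3. Run img2img: the original image anchors layout and appearance, the
    #    prompt anchors semantics; `strength` trades diversity against fidelity.
    return pipe(prompt=prompt, image=image, strength=strength, guidance_scale=7.5).images[0]

# augmented = augment(Image.open("dog.jpg").convert("RGB"), label="golden retriever")
```

A lower `strength` keeps the augmented image closer to the original (more semantic consistency), while a higher value yields more diverse but potentially label-breaking edits.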
Related papers
- Gen-SIS: Generative Self-augmentation Improves Self-supervised Learning [52.170253590364545]
Gen-SIS is a diffusion-based augmentation technique trained exclusively on unlabeled image data.
We show that these 'self-augmentations', i.e., generative augmentations based on the vanilla SSL encoder embeddings, facilitate the training of a stronger SSL encoder.
arXiv Detail & Related papers (2024-12-02T16:20:59Z)
- Diffusion Autoencoders for Few-shot Image Generation in Hyperbolic Space [35.516111930052105]
Few-shot image generation aims to generate diverse and high-quality images for an unseen class given only a few examples in that class.
We propose Hyperbolic Diffusion Autoencoders (HypDAE), a novel approach that operates in hyperbolic space to capture hierarchical relationships among images and texts from seen categories.
arXiv Detail & Related papers (2024-11-27T00:45:51Z)
- Decoupled Data Augmentation for Improving Image Classification [37.50690945158849]
We introduce Decoupled Data Augmentation (De-DA), which resolves the fidelity-diversity dilemma by decoupling images into class-dependent parts (CDPs) and class-independent parts (CIPs).
We use generative models to modify real CDPs under controlled conditions, preserving semantic consistency.
We also replace the image's CIP with inter-class variants, creating diverse CDP-CIP combinations.
arXiv Detail & Related papers (2024-10-29T06:27:09Z)
- Fusion is all you need: Face Fusion for Customized Identity-Preserving Image Synthesis [7.099258248662009]
Text-to-image (T2I) models have significantly advanced the development of artificial intelligence.
However, existing T2I-based methods often struggle to accurately reproduce the appearance of individuals from a reference image.
We leverage the pre-trained UNet from Stable Diffusion to incorporate the target face image directly into the generation process.
arXiv Detail & Related papers (2024-09-27T19:31:04Z)
- DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models [18.44432223381586]
Recently, a number of image-mixing-based augmentation techniques have been introduced to improve the generalization of deep neural networks.
In these techniques, two or more randomly selected natural images are mixed together to generate an augmented image.
We propose DiffuseMix, a novel data augmentation technique that leverages a diffusion model to reshape training images.
arXiv Detail & Related papers (2024-04-05T05:31:02Z)
- Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy to bolster image classification performance is through augmenting the training set with synthetic images generated by T2I models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes (a rough sketch of this idea appears after the related-papers list).
arXiv Detail & Related papers (2024-03-28T17:23:45Z)
- Unlocking Pre-trained Image Backbones for Semantic Image Synthesis [29.688029979801577]
We propose a new class of GAN discriminators for semantic image synthesis that enables the generation of highly realistic images.
Our model, which we dub DP-SIMS, achieves state-of-the-art results in terms of image quality and consistency with the input label maps on ADE-20K, COCO-Stuff, and Cityscapes.
arXiv Detail & Related papers (2023-12-20T09:39:19Z)
- Exploring the Robustness of Human Parsers Towards Common Corruptions [99.89886010550836]
We construct three corruption robustness benchmarks, termed LIP-C, ATR-C, and Pascal-Person-Part-C, to assist us in evaluating the risk tolerance of human parsing models.
Inspired by the data augmentation strategy, we propose a novel heterogeneous augmentation-enhanced mechanism to bolster robustness under commonly corrupted conditions.
arXiv Detail & Related papers (2023-09-02T13:32:14Z)
- Effective Data Augmentation With Diffusion Models [65.09758931804478]
We address the lack of diversity in data augmentation with image-to-image transformations parameterized by pre-trained text-to-image diffusion models.
Our method edits images to change their semantics using an off-the-shelf diffusion model, and generalizes to novel visual concepts from a few labelled examples.
We evaluate our approach on few-shot image classification tasks, and on a real-world weed recognition task, and observe an improvement in accuracy in tested domains.
arXiv Detail & Related papers (2023-02-07T20:42:28Z)
- Siamese Image Modeling for Self-Supervised Vision Representation Learning [73.78790119050056]
Self-supervised learning (SSL) has delivered superior performance on a variety of downstream vision tasks.
Two mainstream SSL frameworks have been proposed, i.e., Instance Discrimination (ID) and Masked Image Modeling (MIM).
We propose Siamese Image Modeling (SIM), which predicts the dense representations of an augmented view.
arXiv Detail & Related papers (2022-06-02T17:59:58Z)
- DVG-Face: Dual Variational Generation for Heterogeneous Face Recognition [85.94331736287765]
We formulate HFR as a dual generation problem, and tackle it via a novel Dual Variational Generation (DVG-Face) framework.
We integrate abundant identity information of large-scale visible data into the joint distribution.
Large numbers of new, diverse paired heterogeneous images with the same identity can be generated from noise.
arXiv Detail & Related papers (2020-09-20T09:48:24Z)
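For contrast with the label-preserving sketch above, the inter-class translation idea in the Diff-Mix entry can be pictured as follows. This is a rough sketch using a generic off-the-shelf img2img pipeline rather than the paper's fine-tuned models, and the soft-label rule is purely an assumption for illustration.

```python
# Sketch of inter-class image translation for augmentation (illustrative only).
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionImg2ImgPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)

def translate_between_classes(image: Image.Image, source_class: str, target_class: str,
                              strength: float = 0.7):
    """Translate an image of `source_class` toward `target_class`."""
    # Unlike label-preserving augmentation, the prompt names the *target* class,
    # so the semantics are intentionally shifted across class boundaries.
    prompt = f"a photo of a {target_class}"
    out = pipe(prompt=prompt, image=image, strength=strength, guidance_scale=7.5)
    # Hypothetical labeling rule: weight the target class by the edit strength
    # and keep the remainder on the source class.
    soft_label = {target_class: strength, source_class: 1.0 - strength}
    return out.images[0], soft_label
```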
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.