Related papers: Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model

Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model

URL: http://arxiv.org/abs/2403.19600v1
Date: Thu, 28 Mar 2024 17:23:45 GMT
Title: Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model
Authors: Zhicai Wang, Longhui Wei, Tan Wang, Heyu Chen, Yanbin Hao, Xiang Wang, Xiangnan He, Qi Tian,
Abstract summary: A prevalent strategy to bolster image classification performance is through augmenting the training set with synthetic images generated by T2I models. In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques. We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
Score: 80.61157097223058
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Text-to-image (T2I) generative models have recently emerged as a powerful tool, enabling the creation of photo-realistic images and giving rise to a multitude of applications. However, the effective integration of T2I models into fundamental image classification tasks remains an open question. A prevalent strategy to bolster image classification performance is through augmenting the training set with synthetic images generated by T2I models. In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques. Our analysis reveals that these methods struggle to produce images that are both faithful (in terms of foreground objects) and diverse (in terms of background contexts) for domain-specific concepts. To tackle this challenge, we introduce an innovative inter-class data augmentation method known as Diff-Mix (https://github.com/Zhicaiwww/Diff-Mix), which enriches the dataset by performing image translations between classes. Our empirical results demonstrate that Diff-Mix achieves a better balance between faithfulness and diversity, leading to a marked improvement in performance across diverse image classification scenarios, including few-shot, conventional, and long-tail classifications for domain-specific datasets.

Related papers

Dataset Augmentation by Mixing Visual Concepts [3.5420134832331334]
This paper proposes a dataset augmentation method by fine-tuning pre-trained diffusion models. We adapt the diffusion model by conditioning it with real images and novel text embeddings. Our approach outperforms state-of-the-art augmentation techniques on benchmark classification tasks.
arXiv Detail & Related papers (2024-12-19T19:42:22Z)
GenMix: Effective Data Augmentation with Generative Diffusion Model Image Editing [37.489576508876056]
This paper introduces GenMix, a generalizable prompt-guided generative data augmentation approach. Our technique leverages image editing to generate augmented images based on custom conditional prompts. Our approach mitigates unrealistic images and label ambiguity, improving the performance and adversarial robustness of the resulting models.
arXiv Detail & Related papers (2024-12-03T10:45:34Z)
Image Regeneration: Evaluating Text-to-Image Model via Generating Identical Image with Multimodal Large Language Models [54.052963634384945]
We introduce the Image Regeneration task to assess text-to-image models. We use GPT4V to bridge the gap between the reference image and the text input for the T2I model. We also present ImageRepainter framework to enhance the quality of generated images.
arXiv Detail & Related papers (2024-11-14T13:52:43Z)
Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment [20.902935570581207]
We introduce a Multimodal Alignment and Reconstruction Network (MARNet) to enhance the model's resistance to visual noise. MARNet includes a cross-modal diffusion reconstruction module for smoothly and stably blending information across different domains. Experiments conducted on two benchmark datasets, Vireo-Food172 and Ingredient-101, demonstrate that MARNet effectively improves the quality of image information extracted by the model.
arXiv Detail & Related papers (2024-07-26T16:30:18Z)
OneDiff: A Generalist Model for Image Difference Captioning [5.71214984158106]
Image Difference Captioning (IDC) is crucial for accurately describing variations between closely related images. OneDiff is a novel generalist approach that utilizes a robust vision-language model architecture. OneDiff consistently outperforms existing state-of-the-art models in accuracy and adaptability.
arXiv Detail & Related papers (2024-07-08T06:14:37Z)
Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images. We identify model weaknesses by testing the model using the counterfactual image dataset. We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z)
DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models [18.44432223381586]
Recently, a number of image-mixing-based augmentation techniques have been introduced to improve the generalization of deep neural networks. In these techniques, two or more randomly selected natural images are mixed together to generate an augmented image. We propose DiffuseMix, a novel data augmentation technique that leverages a diffusion model to reshape training images.
arXiv Detail & Related papers (2024-04-05T05:31:02Z)
DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability [75.9781362556431]
We propose DiffDis to unify the cross-modal generative and discriminative pretraining into one single framework under the diffusion process. We show that DiffDis outperforms single-task models on both the image generation and the image-text discriminative tasks.
arXiv Detail & Related papers (2023-08-18T05:03:48Z)
Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images [60.34381768479834]
Recent advancements in diffusion models have enabled the generation of realistic deepfakes from textual prompts in natural language. We pioneer a systematic study on deepfake detection generated by state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-04-02T10:25:09Z)
Effective Data Augmentation With Diffusion Models [65.09758931804478]
We address the lack of diversity in data augmentation with image-to-image transformations parameterized by pre-trained text-to-image diffusion models. Our method edits images to change their semantics using an off-the-shelf diffusion model, and generalizes to novel visual concepts from a few labelled examples. We evaluate our approach on few-shot image classification tasks, and on a real-world weed recognition task, and observe an improvement in accuracy in tested domains.
arXiv Detail & Related papers (2023-02-07T20:42:28Z)
Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching. We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.