SGD-Mix: Enhancing Domain-Specific Image Classification with Label-Preserving Data Augmentation
- URL: http://arxiv.org/abs/2505.11813v1
- Date: Sat, 17 May 2025 03:51:18 GMT
- Title: SGD-Mix: Enhancing Domain-Specific Image Classification with Label-Preserving Data Augmentation
- Authors: Yixuan Dong, Fang-Yi Su, Jung-Hsien Chiang
- Abstract summary: We propose a novel framework that explicitly integrates diversity, faithfulness, and label clarity into the augmentation process. Our approach employs saliency-guided mixing and a fine-tuned diffusion model to preserve foreground semantics, enrich background diversity, and ensure label consistency.
- Score: 0.6554326244334868
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data augmentation for domain-specific image classification tasks often struggles to simultaneously address diversity, faithfulness, and label clarity of generated data, leading to suboptimal performance in downstream tasks. While existing generative diffusion model-based methods aim to enhance augmentation, they fail to cohesively tackle these three critical aspects and often overlook intrinsic challenges of diffusion models, such as sensitivity to model characteristics and stochasticity under strong transformations. In this paper, we propose a novel framework that explicitly integrates diversity, faithfulness, and label clarity into the augmentation process. Our approach employs saliency-guided mixing and a fine-tuned diffusion model to preserve foreground semantics, enrich background diversity, and ensure label consistency, while mitigating diffusion model limitations. Extensive experiments across fine-grained, long-tail, few-shot, and background robustness tasks demonstrate our method's superior performance over state-of-the-art approaches.
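To make the mixing stage concrete, here is a minimal sketch of saliency-guided mixing, assuming a saliency map in [0, 1] is available (e.g., from Grad-CAM or a pretrained saliency detector); the soft threshold, feathering, and function name below are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of saliency-guided mixing (illustrative, not the paper's code).
# Assumes `saliency` comes from an external source such as Grad-CAM.
import numpy as np

def saliency_guided_mix(foreground, background, saliency, threshold=0.5):
    """Blend the salient foreground onto a new background.

    foreground, background: float arrays of shape (H, W, 3) in [0, 1]
    saliency: float array of shape (H, W) in [0, 1]; higher = more salient
    """
    # Soft mask: keep pixels above the saliency threshold, with a feathered
    # transition so the blend avoids hard seams around the object.
    mask = np.clip((saliency - threshold) / (1.0 - threshold), 0.0, 1.0)
    mask = mask[..., None]  # broadcast the mask over the RGB channels
    return mask * foreground + (1.0 - mask) * background
```

In the full pipeline, a mixed image of this kind would then be passed through the fine-tuned diffusion model (for example as an image-to-image input at moderate noise strength), so the label-defining foreground is preserved while the background is re-synthesized for diversity.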
Related papers
- G4Seg: Generation for Inexact Segmentation Refinement with Diffusion Models [38.44872934965588]
This paper considers the problem of utilizing a large-scale text-to-image model to tackle the Inexact Segmentation (IS) task. We exploit the pattern discrepancies between original images and mask-conditional generated images to facilitate a coarse-to-fine segmentation refinement.
arXiv Detail & Related papers (2025-06-02T11:05:28Z)
- DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers [86.5541501589166]
DiffMoE introduces a batch-level global token pool that enables experts to access global token distributions during training. It achieves state-of-the-art performance among diffusion models on the ImageNet benchmark. The effectiveness of the approach extends beyond class-conditional generation to more challenging tasks such as text-to-image generation.
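As a toy rendering of the batch-level token pool idea, tokens from all samples in a batch can be flattened into one pool from which each expert selects its top-scoring tokens globally; the routing rule, gating, and fixed capacity below are assumptions for illustration, not DiffMoE's actual implementation.

```python
# Toy batch-level token routing (illustrative assumption, not DiffMoE's code).
import torch

def global_topk_route(tokens, router, experts, capacity):
    """tokens: (B, N, D); router: maps (M, D) -> (M, num_experts) scores;
    experts: list of modules mapping (M, D) -> (M, D)."""
    b, n, d = tokens.shape
    pool = tokens.reshape(b * n, d)        # one pool across the whole batch
    scores = router(pool)                  # (B*N, num_experts) routing logits
    out = torch.zeros_like(pool)
    for e, expert in enumerate(experts):
        # Each expert takes its top-`capacity` tokens from the global pool,
        # so tokens compete across samples rather than within one image.
        top = scores[:, e].topk(capacity).indices
        out[top] += expert(pool[top]) * torch.sigmoid(scores[top, e:e + 1])
    return out.reshape(b, n, d)
```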
arXiv Detail & Related papers (2025-03-18T17:57:07Z)
- Understanding the Quality-Diversity Trade-off in Diffusion Language Models [0.0]
Diffusion models can be used to model continuous data across a range of domains such as vision and audio. Recent work explores their application to text generation by working in the continuous embedding space. These models lack a natural means to control the inherent trade-off between quality and diversity.
arXiv Detail & Related papers (2025-03-11T17:18:01Z)
- InpDiffusion: Image Inpainting Localization via Conditional Diffusion Models [10.213390634031049]
Current image inpainting localization (IIL) methods face two main challenges: a tendency towards overconfidence and difficulty in detecting subtle tampering boundaries. We propose a new paradigm that treats IIL as a conditional mask generation task utilizing diffusion models. Our method, InpDiffusion, utilizes the denoising process, enhanced by the integration of image semantic conditions, to progressively refine predictions.
arXiv Detail & Related papers (2025-01-06T07:32:12Z)
- Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets [65.42834731617226]
We propose a reinforcement learning method for diffusion model finetuning, dubbed Nabla-GFlowNet. We show that our proposed method achieves fast yet diversity- and prior-preserving finetuning of Stable Diffusion, a large-scale text-conditioned image diffusion model.
arXiv Detail & Related papers (2024-12-10T18:59:58Z)
- Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas [33.334956022229846]
We propose the Merge-Attend-Diffuse operator, which can be plugged into different types of pretrained diffusion models used in a joint diffusion setting.
Specifically, we merge the diffusion paths, reprogramming self- and cross-attention to operate on the aggregated latent space.
Our method maintains compatibility with the input prompt and visual quality of the generated images while increasing their semantic coherence.
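For intuition, the merging step can be pictured as averaging overlapping latent windows at each denoising step so that neighboring diffusion paths agree where they overlap; the sketch below is a plain-averaging simplification and omits the self- and cross-attention reprogramming the method actually performs.

```python
# Simplified latent-window merging for a panorama (plain averaging only;
# the actual operator also reprograms attention over the merged latents).
import numpy as np

def merge_latent_windows(latents, offsets, panorama_width):
    """latents: list of (C, H, W) latent tiles; offsets: each tile's left edge."""
    c, h, w = latents[0].shape
    canvas = np.zeros((c, h, panorama_width))
    counts = np.zeros((1, h, panorama_width))
    for z, x0 in zip(latents, offsets):
        canvas[:, :, x0:x0 + w] += z     # accumulate each diffusion path
        counts[:, :, x0:x0 + w] += 1.0   # track how many tiles cover a pixel
    return canvas / np.maximum(counts, 1.0)  # average where paths overlap
```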
arXiv Detail & Related papers (2024-08-28T09:22:32Z)
- A Simple Background Augmentation Method for Object Detection with Diffusion Model [53.32935683257045]
In computer vision, it is well-known that a lack of data diversity will impair model performance.
We propose a simple yet effective data augmentation approach by leveraging advancements in generative models.
Background augmentation, in particular, significantly improves the models' robustness and generalization capabilities.
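A background-augmentation step of this kind can be approximated with an off-the-shelf inpainting diffusion model, as in the rough sketch below; the checkpoint, prompt handling, and mask convention are assumptions rather than the paper's exact setup, and images are assumed pre-resized to the pipeline's expected resolution.

```python
# Rough sketch of background augmentation via inpainting (assumed setup).
import numpy as np
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting"  # assumed checkpoint choice
).to("cuda")

def augment_background(image: Image.Image, object_mask: np.ndarray, prompt: str):
    """Repaint everything outside the labeled object.

    object_mask: bool array (H, W), True on the object. The pipeline repaints
    white mask regions, so the object mask is inverted to target the background.
    """
    background_mask = Image.fromarray((~object_mask).astype(np.uint8) * 255)
    return pipe(prompt=prompt, image=image, mask_image=background_mask).images[0]
```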
arXiv Detail & Related papers (2024-08-01T07:40:00Z)
- Few-Shot Image Generation by Conditional Relaxing Diffusion Inversion [37.18537753482751]
Conditional Relaxing Diffusion Inversion (CRDI) is designed to enhance distribution diversity in synthetic image generation.
CRDI does not rely on fine-tuning based on only a few samples.
It focuses on reconstructing each target image instance and expanding diversity through few-shot learning.
arXiv Detail & Related papers (2024-07-09T21:58:26Z)
- Diffusion Features to Bridge Domain Gap for Semantic Segmentation [2.8616666231199424]
This paper investigates an approach that leverages sampling and fusion techniques to harness the features of diffusion models efficiently.
Leveraging the strength of the text-to-image generation capability, we introduce a new training framework designed to implicitly learn posterior knowledge from the diffusion model.
arXiv Detail & Related papers (2024-06-02T15:33:46Z)
- Bridging Generative and Discriminative Models for Unified Visual Perception with Diffusion Priors [56.82596340418697]
We propose a simple yet effective framework comprising a pre-trained Stable Diffusion (SD) model containing rich generative priors, a unified head (U-head) capable of integrating hierarchical representations, and an adapted expert providing discriminative priors.
Comprehensive investigations unveil potential characteristics of this framework, dubbed Vermouth, such as the varying granularity of perception concealed in latent variables at distinct time steps and at various U-net stages.
The promising results demonstrate the potential of diffusion models as formidable learners, establishing their significance in furnishing informative and robust visual representations.
arXiv Detail & Related papers (2024-01-29T10:36:57Z)
- Real-World Image Variation by Aligning Diffusion Inversion Chain [53.772004619296794]
A domain gap exists between generated images and real-world images, which poses a challenge in generating high-quality variations of real-world images.
We propose a novel inference pipeline called Real-world Image Variation by ALignment (RIVAL).
Our pipeline enhances the generation quality of image variations by aligning the image generation process to the source image's inversion chain.
arXiv Detail & Related papers (2023-05-30T04:09:47Z)
- Auto-regressive Image Synthesis with Integrated Quantization [55.51231796778219]
This paper presents a versatile framework for conditional image generation.
It incorporates the inductive bias of CNNs and powerful sequence modeling of auto-regression.
Our method achieves superior performance in diverse image generation compared with the state of the art.
arXiv Detail & Related papers (2022-07-21T22:19:17Z)