HiGFA: Hierarchical Guidance for Fine-grained Data Augmentation with Diffusion Models
- URL: http://arxiv.org/abs/2511.12547v2
- Date: Mon, 24 Nov 2025 13:31:40 GMT
- Title: HiGFA: Hierarchical Guidance for Fine-grained Data Augmentation with Diffusion Models
- Authors: Zhiguang Lu, Qianqian Xu, Peisong Wen, Siran Dai, Qingming Huang
- Abstract summary: Generative diffusion models show promise for data augmentation. Applying them to fine-grained tasks presents a significant challenge. HiGFA is a hierarchical, confidence-driven orchestration that generates diverse yet faithful synthetic images.
- Score: 82.10385962490051
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative diffusion models show promise for data augmentation. However, applying them to fine-grained tasks presents a significant challenge: ensuring synthetic images accurately capture the subtle, category-defining features critical for high fidelity. Standard approaches, such as text-based Classifier-Free Guidance (CFG), often lack the required specificity, potentially generating misleading examples that degrade fine-grained classifier performance. To address this, we propose Hierarchically Guided Fine-grained Augmentation (HiGFA). HiGFA leverages the temporal dynamics of the diffusion sampling process. It employs strong text and transformed contour guidance with fixed strengths in the early-to-mid sampling stages to establish overall scene, style, and structure. In the final sampling stages, HiGFA activates a specialized fine-grained classifier guidance and dynamically modulates the strength of all guidance signals based on prediction confidence. This hierarchical, confidence-driven orchestration enables HiGFA to generate diverse yet faithful synthetic images by intelligently balancing global structure formation with precise detail refinement. Experiments on several FGVC datasets demonstrate the effectiveness of HiGFA.
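The staged schedule described in the abstract can be sketched as a simple per-step weight function. This is an illustrative reconstruction, not the paper's implementation: the weight values, the 80/20 stage boundary, and the exact confidence-modulation rule are all assumptions.

```python
def guidance_strengths(step, total_steps, confidence,
                       w_text=7.5, w_contour=2.0, w_cls=4.0,
                       late_frac=0.2):
    """Return guidance weights for one sampling step (HiGFA-style sketch).

    Early-to-mid stages: fixed text and contour guidance establish the
    overall scene, style, and structure; classifier guidance is off.
    Final stages: fine-grained classifier guidance switches on, and the
    strengths are modulated by the classifier's prediction confidence
    (here: lower confidence -> stronger correction, an illustrative choice).
    """
    progress = step / total_steps
    if progress < 1.0 - late_frac:
        # Early-to-mid sampling: fixed-strength text + contour guidance.
        return {"text": w_text, "contour": w_contour, "classifier": 0.0}
    # Final sampling stages: confidence-driven modulation.
    scale = 1.0 + (1.0 - confidence)  # in [1, 2] for confidence in [0, 1]
    return {
        "text": 0.5 * w_text,          # relax global guidance late
        "contour": 0.5 * w_contour,
        "classifier": w_cls * scale,   # precise detail refinement
    }
```

In a real sampler these weights would multiply the corresponding score/noise-prediction terms at each denoising step; the point of the sketch is only the hierarchy (structure first, detail last) and the confidence-dependent strength.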
Related papers
- StepVAR: Structure-Texture Guided Pruning for Visual Autoregressive Models [98.72926158261937]
We propose a training-free token pruning framework for Visual AutoRegressive models. We employ a lightweight high-pass filter to capture local texture details, while leveraging Principal Component Analysis (PCA) to preserve global structural information. To maintain valid next-scale prediction under sparse tokens, we introduce a nearest neighbor feature propagation strategy.
arXiv Detail & Related papers (2026-03-02T11:35:05Z) - SSG: Scaled Spatial Guidance for Multi-Scale Visual Autoregressive Generation [10.295970926059812]
Visual autoregressive (VAR) models generate images through next-scale prediction, naturally achieving coarse-to-fine, fast, high-fidelity generation that mirrors human perception. In practice, this hierarchy can drift at inference time, as limited capacity and accumulated error cause the model to deviate from its coarse-to-fine nature. We propose Scaled Spatial Guidance (SSG), a training-free, inference-time guidance method that steers generation toward the intended hierarchy while maintaining global coherence.
arXiv Detail & Related papers (2026-02-05T10:48:58Z) - Granular-ball Guided Masking: Structure-aware Data Augmentation [97.18560547134587]
Granular-ball Guided Masking (GBGM) is a structure-aware augmentation strategy guided by Granular-ball Computing (GBC). GBGM adaptively preserves semantically rich, structurally important regions while suppressing redundant areas through a coarse-to-fine hierarchical masking process. Experiments on multiple benchmarks demonstrate consistent improvements in classification accuracy and masked image reconstruction.
arXiv Detail & Related papers (2025-12-24T07:15:33Z) - GRAVER: Generative Graph Vocabularies for Robust Graph Foundation Models Fine-tuning [92.19531718298744]
Graph Foundation Models (GFMs) hold promise for broad applicability across diverse graph tasks and domains. Existing GFMs struggle with unstable few-shot fine-tuning. We propose GRAVER, a novel Generative gRAph VocabulariEs framework for Robust GFM fine-tuning.
arXiv Detail & Related papers (2025-11-05T13:07:26Z) - IAR2: Improving Autoregressive Visual Generation with Semantic-Detail Associated Token Prediction [77.06211178777939]
IAR2 is an advanced autoregressive framework that enables a hierarchical semantic-detail synthesis process. We show that IAR2 sets a new state-of-the-art for autoregressive image generation, achieving an FID of 1.50 on ImageNet.
arXiv Detail & Related papers (2025-10-08T12:08:21Z) - Prompt-aware classifier free guidance for diffusion models [3.3115063666033167]
We introduce a prompt-aware framework that predicts scale-dependent quality and selects the optimal guidance at inference. A lightweight predictor, conditioned on semantic embeddings and linguistic complexity, estimates multi-metric quality curves. Experiments on MSCOCO2014 and AudioCaps show consistent improvements over vanilla CFG.
arXiv Detail & Related papers (2025-09-25T09:16:25Z) - FedAPT: Federated Adversarial Prompt Tuning for Vision-Language Models [97.35577473867296]
Federated Adversarial Prompt Tuning (FedAPT) is a novel method designed to enhance the adversarial robustness of FPT. To address this issue, we propose a class-aware prompt generator that generates visual prompts from text prompts. Experiments on multiple image classification datasets demonstrate the superiority of FedAPT in improving adversarial robustness.
arXiv Detail & Related papers (2025-09-03T03:46:35Z) - Classifier-Free Guidance: From High-Dimensional Analysis to Generalized Guidance Forms [22.44946627454133]
We show that CFG accurately reproduces the target distribution in sufficiently high and infinite dimensions. We show that there is a large family of guidances enjoying this property, in particular nonlinear CFG generalizations. Our findings are validated with experiments on class-conditional and text-to-image generation using state-of-the-art diffusion and flow-matching models.
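For reference, the standard CFG rule that this family of guidances generalizes is a linear extrapolation of the noise prediction. A minimal sketch (the function name is mine; the formula is the standard one):

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, w):
    """Standard classifier-free guidance: extrapolate from the
    unconditional noise prediction toward the conditional one.
    w = 1 recovers purely conditional sampling; w > 1 sharpens
    conditioning, the regime analyzed in high dimensions."""
    return eps_uncond + w * (eps_cond - eps_uncond)
```

Nonlinear generalizations replace this linear combination with other functions of the two predictions while, per the abstract, preserving the target distribution in high dimensions.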
arXiv Detail & Related papers (2025-02-11T10:29:29Z) - FedEGG: Federated Learning with Explicit Global Guidance [90.04705121816185]
Federated Learning (FL) holds great potential for diverse applications owing to its privacy-preserving nature. Existing methods help address these challenges via optimization-based client constraints, adaptive client selection, or the use of pre-trained models or synthetic data. We present FedEGG, a new FL algorithm that constructs a global guiding task using a well-defined, easy-to-converge learning task.
arXiv Detail & Related papers (2024-04-18T04:25:21Z) - Consistency Regularization for Generalizable Source-free Domain Adaptation [62.654883736925456]
Source-free domain adaptation (SFDA) aims to adapt a well-trained source model to an unlabelled target domain without accessing the source dataset.
Existing SFDA methods assess their adapted models only on the target training set, neglecting data from unseen but identically distributed testing sets.
We propose a consistency regularization framework to develop a more generalizable SFDA method.
arXiv Detail & Related papers (2023-08-03T07:45:53Z) - Towards Robust Unsupervised Disentanglement of Sequential Data -- A Case Study Using Music Audio [17.214062755082065]
Disentangled sequential autoencoders (DSAEs) represent a class of probabilistic graphical models.
We show that the vanilla DSAE suffers from being sensitive to the choice of model architecture and capacity of the dynamic latent variables.
We propose TS-DSAE, a two-stage training framework that first learns sequence-level prior distributions.
arXiv Detail & Related papers (2022-05-12T04:11:25Z) - Deep Autoencoding Topic Model with Scalable Hybrid Bayesian Inference [55.35176938713946]
We develop deep autoencoding topic model (DATM) that uses a hierarchy of gamma distributions to construct its multi-stochastic-layer generative network.
We propose a Weibull upward-downward variational encoder that deterministically propagates information upward via a deep neural network, followed by a downward generative model.
The efficacy and scalability of our models are demonstrated on both unsupervised and supervised learning tasks on big corpora.
arXiv Detail & Related papers (2020-06-15T22:22:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.