EFDiT: Efficient Fine-grained Image Generation Using Diffusion Transformer Models
- URL: http://arxiv.org/abs/2512.05152v1
- Date: Wed, 03 Dec 2025 14:10:06 GMT
- Title: EFDiT: Efficient Fine-grained Image Generation Using Diffusion Transformer Models
- Authors: Kun Wang, Donglin Di, Tonghua Su, Lei Fan,
- Abstract summary: In large-scale fine-grained image generation, issues of semantic information entanglement and insufficient detail persist. We introduce a tiered embedder for fine-grained image generation, which integrates semantic information from both super and child classes. We propose an efficient ProAttention mechanism that can be effectively implemented in the diffusion model.
- Score: 9.95860304505597
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models are highly regarded for their controllability and the diversity of images they generate. However, class-conditional generation methods based on diffusion models often focus on more common categories; in large-scale fine-grained image generation, semantic information entanglement and insufficient detail in the generated images still persist. This paper introduces a tiered embedder for fine-grained image generation, which integrates semantic information from both super and child classes, allowing the diffusion model to better incorporate semantic information and mitigating semantic entanglement. To address the insufficient detail in fine-grained images, we introduce super-resolution during the perceptual information generation stage, enhancing the detailed features of fine-grained images through enhancement and degradation models. Furthermore, we propose an efficient ProAttention mechanism that can be effectively implemented in the diffusion model. We evaluate our method through extensive experiments on public benchmarks, demonstrating that it outperforms other state-of-the-art fine-tuning methods.
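The tiered-embedder idea above can be sketched in a few lines: pair each fine-grained (child) class with its coarse (super) class and fuse the two embeddings into one conditioning vector for the diffusion model. The names, table sizes, toy hierarchy, and additive fusion below are all illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Minimal sketch of a tiered class embedder (assumed design, not EFDiT code):
# each child class maps to a super class, and the two embeddings are fused
# into a single conditioning vector for the diffusion backbone.
rng = np.random.default_rng(0)

NUM_SUPER, NUM_CHILD, DIM = 10, 100, 16
super_table = rng.normal(size=(NUM_SUPER, DIM))  # super-class embedding table
child_table = rng.normal(size=(NUM_CHILD, DIM))  # child-class embedding table
child_to_super = np.arange(NUM_CHILD) // 10      # toy hierarchy: 10 children per super class

def tiered_embed(child_id: int) -> np.ndarray:
    """Fuse super- and child-class semantics into one conditioning vector."""
    super_id = child_to_super[child_id]
    # Additive fusion: the super-class component is shared across sibling
    # classes, while the child component carries the fine-grained semantics.
    return super_table[super_id] + child_table[child_id]

cond = tiered_embed(0)
print(cond.shape)  # conditioning vector fed to the diffusion model
```

Because siblings share the super-class component, their conditioning vectors differ only in the child term, which is one plausible way to keep coarse semantics disentangled from fine-grained detail.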
Related papers
- Reversible Efficient Diffusion for Image Fusion [66.35113261837469]
Multi-modal image fusion aims to consolidate complementary information from diverse source images into a unified representation. While diffusion models have demonstrated impressive generative capabilities in image generation, they often suffer from detail loss when applied to image fusion tasks. This issue arises from the accumulation of noise errors inherent in the Markov process, leading to inconsistency and degradation in the fused results. We propose the Reversible Efficient Diffusion (RED) model, an explicitly supervised training framework that inherits the powerful generative capability of diffusion models while avoiding distribution estimation.
arXiv Detail & Related papers (2026-01-28T05:14:55Z) - G4Seg: Generation for Inexact Segmentation Refinement with Diffusion Models [38.44872934965588]
This paper considers the problem of utilizing a large-scale text-to-image model to tackle the Inexact Segmentation (IS) task. We exploit the pattern discrepancies between original images and mask-conditional generated images to facilitate a coarse-to-fine segmentation refinement.
arXiv Detail & Related papers (2025-06-02T11:05:28Z) - Dataset Augmentation by Mixing Visual Concepts [3.5420134832331334]
This paper proposes a dataset augmentation method based on fine-tuning pre-trained diffusion models. We adapt the diffusion model by conditioning it on real images and novel text embeddings. Our approach outperforms state-of-the-art augmentation techniques on benchmark classification tasks.
arXiv Detail & Related papers (2024-12-19T19:42:22Z) - GenMix: Effective Data Augmentation with Generative Diffusion Model Image Editing [60.101097709212716]
This paper introduces GenMix, a generalizable prompt-guided generative data augmentation approach. Our technique leverages image editing to generate augmented images based on custom conditional prompts. Our approach mitigates unrealistic images and label ambiguity, improving the performance and adversarial robustness of the resulting models.
arXiv Detail & Related papers (2024-12-03T10:45:34Z) - Active Generation for Image Classification [45.93535669217115]
We propose to improve the efficiency of image generation by focusing on the specific needs and characteristics of the downstream model.
Guided by a central tenet of active learning, our method, named ActGen, takes a training-aware approach to image generation.
arXiv Detail & Related papers (2024-03-11T08:45:31Z) - Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z) - Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion Models [63.20512617502273]
We propose a method called SDD to prevent problematic content generation in text-to-image diffusion models.
Our method eliminates a much greater proportion of harmful content from the generated images without degrading the overall image quality.
arXiv Detail & Related papers (2023-07-12T07:48:29Z) - Conditional Generation from Unconditional Diffusion Models using Denoiser Representations [94.04631421741986]
We propose adapting pre-trained unconditional diffusion models to new conditions using the learned internal representations of the denoiser network.
We show that augmenting the Tiny ImageNet training set with synthetic images generated by our approach improves the classification accuracy of ResNet baselines by up to 8%.
arXiv Detail & Related papers (2023-06-02T20:09:57Z) - DDRF: Denoising Diffusion Model for Remote Sensing Image Fusion [7.06521373423708]
The denoising diffusion model, as a generative model, has received much attention in the field of image generation.
We introduce the diffusion model to the image fusion field, treating the image fusion task as image-to-image translation.
Our method can inspire other work and provide insight into how to better apply diffusion models to image fusion tasks.
arXiv Detail & Related papers (2023-04-10T12:28:27Z) - Your Diffusion Model is Secretly a Zero-Shot Classifier [90.40799216880342]
We show that density estimates from large-scale text-to-image diffusion models can be leveraged to perform zero-shot classification.
Our generative approach to classification attains strong results on a variety of benchmarks.
Our results are a step toward using generative over discriminative models for downstream tasks.
arXiv Detail & Related papers (2023-03-28T17:59:56Z)
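The zero-shot classification idea above can be sketched as follows: for each candidate class, noise the input, denoise it with a class-conditional model, and score the class by the reconstruction error; the class with the lowest error wins. The "denoiser" below is a toy stand-in (it shrinks toward a class prototype), and all names and constants are assumptions for illustration; a real system would use a trained conditional diffusion model.

```python
import numpy as np

# Toy sketch of diffusion-as-classifier: score each class by the
# class-conditional denoising error on a noised input, then pick the argmin.
rng = np.random.default_rng(1)
DIM, NUM_CLASSES = 8, 3
prototypes = rng.normal(size=(NUM_CLASSES, DIM))  # one "mode" per class

def denoise(x_noisy: np.ndarray, class_id: int) -> np.ndarray:
    # Stand-in conditional denoiser: pull the noisy sample toward the
    # class prototype (a trained diffusion model would go here).
    return 0.5 * x_noisy + 0.5 * prototypes[class_id]

def classify(x: np.ndarray, n_trials: int = 64) -> int:
    """Pick the class whose conditional denoiser best reconstructs x."""
    errors = np.zeros(NUM_CLASSES)
    for _ in range(n_trials):
        x_noisy = x + 0.3 * rng.normal(size=DIM)   # forward (noising) step
        for c in range(NUM_CLASSES):
            x_hat = denoise(x_noisy, c)            # reverse (denoising) step
            errors[c] += np.sum((x_hat - x) ** 2)  # accumulate reconstruction error
    return int(np.argmin(errors))

# A sample at prototype 2 should be scored best by the class-2 denoiser.
print(classify(prototypes[2]))
```

Averaging the error over several noise draws mirrors how the real method averages the diffusion loss over timesteps and noise samples before comparing classes.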
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.