Penalizing Boundary Activation for Object Completeness in Diffusion Models
- URL: http://arxiv.org/abs/2509.16968v2
- Date: Tue, 23 Sep 2025 16:17:58 GMT
- Title: Penalizing Boundary Activation for Object Completeness in Diffusion Models
- Authors: Haoyang Xu, Tianhao Zhao, Sibei Yang, Yutian Lin
- Abstract summary: Diffusion models have emerged as a powerful technique for text-to-image (T2I) generation. In this study, we conduct an in-depth analysis of the incompleteness issue and reveal that the primary factor behind incomplete object generation is the usage of RandomCrop during model training. We propose a training-free solution that penalizes activation values at image boundaries during the early denoising steps.
- Score: 35.58050562158284
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have emerged as a powerful technique for text-to-image (T2I) generation, creating high-quality, diverse images across various domains. However, a common limitation of these models is the incomplete display of objects, where fragments or missing parts undermine the model's performance in downstream applications. In this study, we conduct an in-depth analysis of the incompleteness issue and reveal that the primary factor behind incomplete object generation is the use of RandomCrop during model training. This widely used data augmentation method, although it improves model generalization, disrupts object continuity during training. To address this, we propose a training-free solution that penalizes activation values at image boundaries during the early denoising steps. Our method is easily applicable to pre-trained Stable Diffusion models with minimal modifications and negligible computational overhead. Extensive experiments demonstrate the effectiveness of our method, showing substantial improvements in object integrity and image quality.
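The abstract describes penalizing activation values at image boundaries during the early denoising steps, but it does not say which activations are penalized, what form the penalty takes, or how many steps count as "early". The following is a minimal NumPy sketch of the general idea only, not the authors' implementation. The function names, the quadratic penalty, the border width, the step threshold, and the strength value are all assumptions chosen for illustration.

```python
import numpy as np


def boundary_mask(height, width, border):
    """Return a (height, width) array that is 1 in a band along the edges, 0 inside."""
    mask = np.ones((height, width), dtype=np.float64)
    mask[border:height - border, border:width - border] = 0.0
    return mask


def boundary_penalty(activation, border):
    """Assumed penalty: half the sum of squared activations inside the border band."""
    mask = boundary_mask(*activation.shape[-2:], border)
    return 0.5 * float(np.sum(mask * activation ** 2))


def penalize_boundary(activation, step, early_steps=10, border=2, strength=0.5):
    """Take one gradient step on the assumed penalty, but only in early steps.

    The gradient of 0.5 * sum(mask * a**2) with respect to a is mask * a,
    so the update shrinks border activations and leaves the interior untouched.
    """
    if step >= early_steps:
        return activation  # later steps are left unmodified
    mask = boundary_mask(*activation.shape[-2:], border)
    return activation - strength * mask * activation


# Hypothetical usage on a dummy (channels, height, width) activation map.
act = np.ones((1, 8, 8))
early = penalize_boundary(act, step=0)   # border band is scaled down
late = penalize_boundary(act, step=20)   # returned unchanged
print(boundary_penalty(act, 2), boundary_penalty(early, 2))
```

In a real sampler this kind of update would be applied inside the denoising loop to whatever internal map the method targets. The sketch only shows the masking and the early-step gating on a plain array.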
Related papers
- EFDiT: Efficient Fine-grained Image Generation Using Diffusion Transformer Models [9.95860304505597]
In large-scale fine-grained image generation, issues of semantic information entanglement and insufficient detail persist. We introduce a concept of a tiered embedder in fine-grained image generation, which integrates semantic information from both super and child classes. We propose an efficient ProAttention mechanism that can be effectively implemented in the diffusion model.
arXiv Detail & Related papers (2025-12-03T14:10:06Z)
- CountDiffusion: Text-to-Image Synthesis with Training-Free Counting-Guidance Diffusion [82.82885671486795]
We propose CountDiffusion, a training-free framework aiming at generating images with correct object quantity from textual descriptions. The proposed CountDiffusion can be plugged into any diffusion-based text-to-image (T2I) generation models without further training.
arXiv Detail & Related papers (2025-05-07T11:47:35Z)
- D2C: Unlocking the Potential of Continuous Autoregressive Image Generation with Discrete Tokens [80.75893450536577]
We propose D2C, a novel two-stage method to enhance model generation capacity. In the first stage, the discrete-valued tokens representing coarse-grained image features are sampled by employing a small discrete-valued generator. In the second stage, the continuous-valued tokens representing fine-grained image features are learned conditioned on the discrete token sequence.
arXiv Detail & Related papers (2025-03-21T13:58:49Z)
- DiffDoctor: Diagnosing Image Diffusion Models Before Treating [57.82359018425674]
We propose DiffDoctor, a two-stage pipeline to assist image diffusion models in generating fewer artifacts. We collect a dataset of over 1M flawed synthesized images and set up an efficient human-in-the-loop annotation process. The learned artifact detector is then involved in the second stage to optimize the diffusion model by providing pixel-level feedback.
arXiv Detail & Related papers (2025-01-21T18:56:41Z)
- Boosting Alignment for Post-Unlearning Text-to-Image Generative Models [55.82190434534429]
Large-scale generative models have shown impressive image-generation capabilities, propelled by massive data. This often inadvertently leads to the generation of harmful or inappropriate content and raises copyright concerns. We propose a framework that seeks an optimal model update at each unlearning iteration, ensuring monotonic improvement on both objectives.
arXiv Detail & Related papers (2024-12-09T21:36:10Z)
- Fast constrained sampling in pre-trained diffusion models [77.21486516041391]
We propose an algorithm that enables fast and high-quality generation under arbitrary constraints. During inference, we can interchange between gradient updates computed on the noisy image and updates computed on the final, clean image. Our approach produces results that rival or surpass the state-of-the-art training-free inference approaches.
arXiv Detail & Related papers (2024-10-24T14:52:38Z)
- Learning Diffusion Model from Noisy Measurement using Principled Expectation-Maximization Method [9.173055778539641]
We propose a principled expectation-maximization (EM) framework that iteratively learns diffusion models from noisy data with arbitrary corruption types.
Our framework employs a plug-and-play Monte Carlo method to accurately estimate clean images from noisy measurements, followed by training the diffusion model using the reconstructed images.
arXiv Detail & Related papers (2024-10-15T03:54:59Z)
- Mask-guided cross-image attention for zero-shot in-silico histopathologic image generation with a diffusion model [0.10910416614141322]
Diffusion models are the state-of-the-art solution for generating in-silico images. Appearance transfer diffusion models are designed for natural images. In computational pathology, specifically in oncology, it is not straightforward to define which objects in an image should be classified as foreground and background. We contribute to the applicability of appearance transfer models to diffusion-stained images by modifying the appearance transfer guidance to alternate between class-specific AdaIN feature statistics matchings.
arXiv Detail & Related papers (2024-07-16T12:36:26Z)
- Active Generation for Image Classification [45.93535669217115]
We propose to address the efficiency of image generation by focusing on the specific needs and characteristics of the model.
With a central tenet of active learning, our method, named ActGen, takes a training-aware approach to image generation.
arXiv Detail & Related papers (2024-03-11T08:45:31Z)
- Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z)
- BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT, that overcomes limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
- Cross-domain Compositing with Pretrained Diffusion Models [34.98199766006208]
We employ a localized, iterative refinement scheme which infuses the injected objects with contextual information derived from the background scene.
Our method produces higher quality and realistic results without requiring any annotations or training.
arXiv Detail & Related papers (2023-02-20T18:54:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.