Noise Consistency Regularization for Improved Subject-Driven Image Synthesis
- URL: http://arxiv.org/abs/2506.06483v1
- Date: Fri, 06 Jun 2025 19:17:37 GMT
- Title: Noise Consistency Regularization for Improved Subject-Driven Image Synthesis
- Authors: Yao Ni, Song Wen, Piotr Koniusz, Anoop Cherian,
- Abstract summary: Fine-tuning Stable Diffusion enables subject-driven image synthesis by adapting the model to generate images containing specific subjects.<n>Existing fine-tuning methods suffer from two key issues: underfitting, where the model fails to reliably capture subject identity, and overfitting, where it memorizes the subject image and reduces background diversity.<n>We propose two auxiliary consistency losses for diffusion fine-tuning. First, a prior consistency regularization loss ensures that the predicted diffusion noise for prior (non-subject) images remains consistent with that of the pretrained model, improving fidelity.
- Score: 55.75426086791612
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-tuning Stable Diffusion enables subject-driven image synthesis by adapting the model to generate images containing specific subjects. However, existing fine-tuning methods suffer from two key issues: underfitting, where the model fails to reliably capture subject identity, and overfitting, where it memorizes the subject image and reduces background diversity. To address these challenges, we propose two auxiliary consistency losses for diffusion fine-tuning. First, a prior consistency regularization loss ensures that the predicted diffusion noise for prior (non-subject) images remains consistent with that of the pretrained model, improving fidelity. Second, a subject consistency regularization loss enhances the fine-tuned model's robustness to multiplicative noise modulated latent code, helping to preserve subject identity while improving diversity. Our experimental results demonstrate that incorporating these losses into fine-tuning not only preserves subject identity but also enhances image diversity, outperforming DreamBooth in terms of CLIP scores, background variation, and overall visual quality.
Related papers
- A Simple Combination of Diffusion Models for Better Quality Trade-Offs in Image Denoising [43.44633086975204]
We propose an intuitive method for leveraging pretrained diffusion models.<n>We then introduce our proposed Linear Combination Diffusion Denoiser.<n> LCDD achieves state-of-the-art performance and offers controlled, well-behaved trade-offs.
arXiv Detail & Related papers (2025-03-18T19:02:19Z) - Frequency-Aware Guidance for Blind Image Restoration via Diffusion Models [20.898262207229873]
Blind image restoration remains a significant challenge in low-level vision tasks.
Guided diffusion models have achieved promising results in blind image restoration.
We propose a novel frequency-aware guidance loss that can be integrated into various diffusion models in a plug-and-play manner.
arXiv Detail & Related papers (2024-11-19T12:18:16Z) - Confidence-aware Denoised Fine-tuning of Off-the-shelf Models for Certified Robustness [56.2479170374811]
We introduce Fine-Tuning with Confidence-Aware Denoised Image Selection (FT-CADIS)
FT-CADIS is inspired by the observation that the confidence of off-the-shelf classifiers can effectively identify hallucinated images during denoised smoothing.
It has established the state-of-the-art certified robustness among denoised smoothing methods across all $ell$-adversary radius in various benchmarks.
arXiv Detail & Related papers (2024-11-13T09:13:20Z) - FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process [120.91393949012014]
FreeEnhance is a framework for content-consistent image enhancement using off-the-shelf image diffusion models.
In the noising stage, FreeEnhance is devised to add lighter noise to the region with higher frequency to preserve the high-frequent patterns in the original image.
In the denoising stage, we present three target properties as constraints to regularize the predicted noise, enhancing images with high acutance and high visual quality.
arXiv Detail & Related papers (2024-09-11T17:58:50Z) - Separate-and-Enhance: Compositional Finetuning for Text2Image Diffusion
Models [58.46926334842161]
This work illuminates the fundamental reasons for such misalignment, pinpointing issues related to low attention activation scores and mask overlaps.
We propose two novel objectives, the Separate loss and the Enhance loss, that reduce object mask overlaps and maximize attention scores.
Our method diverges from conventional test-time-adaptation techniques, focusing on finetuning critical parameters, which enhances scalability and generalizability.
arXiv Detail & Related papers (2023-12-10T22:07:42Z) - Robustness-Guided Image Synthesis for Data-Free Quantization [15.91924736452861]
We propose Robustness-Guided Image Synthesis (RIS) to enrich the semantics of synthetic images and improve image diversity.
RIS is a simple but effective method to enrich the semantics of synthetic images and improve image diversity.
We achieve state-of-the-art performance for various settings on data-free quantization and can be extended to other data-free compression tasks.
arXiv Detail & Related papers (2023-10-05T16:39:14Z) - Steerable Conditional Diffusion for Out-of-Distribution Adaptation in Medical Image Reconstruction [75.91471250967703]
We introduce a novel sampling framework called Steerable Conditional Diffusion.<n>This framework adapts the diffusion model, concurrently with image reconstruction, based solely on the information provided by the available measurement.<n>We achieve substantial enhancements in out-of-distribution performance across diverse imaging modalities.
arXiv Detail & Related papers (2023-08-28T08:47:06Z) - Auto-regressive Image Synthesis with Integrated Quantization [55.51231796778219]
This paper presents a versatile framework for conditional image generation.
It incorporates the inductive bias of CNNs and powerful sequence modeling of auto-regression.
Our method achieves superior diverse image generation performance as compared with the state-of-the-art.
arXiv Detail & Related papers (2022-07-21T22:19:17Z) - Robust Single Image Dehazing Based on Consistent and Contrast-Assisted
Reconstruction [95.5735805072852]
We propose a novel density-variational learning framework to improve the robustness of the image dehzing model.
Specifically, the dehazing network is optimized under the consistency-regularized framework.
Our method significantly surpasses the state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-29T08:11:04Z) - Human Pose Transfer with Augmented Disentangled Feature Consistency [28.744108771350078]
We propose a pose transfer network with augmented Disentangled Feature Consistency (DFC-Net) to facilitate human pose transfer.
Given a pair of images containing the source and target person, DFC-Net extracts pose and static information from the source and target respectively, then synthesizes an image of the target person with the desired pose from the source.
arXiv Detail & Related papers (2021-07-23T01:25:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.