Related papers: FreSca: Unveiling the Scaling Space in Diffusion Models

FreSca: Unveiling the Scaling Space in Diffusion Models

URL: http://arxiv.org/abs/2504.02154v1
Date: Wed, 02 Apr 2025 22:03:11 GMT
Title: FreSca: Unveiling the Scaling Space in Diffusion Models
Authors: Chao Huang, Susan Liang, Yunlong Tang, Li Ma, Yapeng Tian, Chenliang Xu,
Abstract summary: Diffusion models offer impressive controllability for image tasks, primarily through noise predictions that encode task-specific information and guidance enabling adjustable scaling.<n>We investigate this space, starting with inversion-based editing where the difference between conditional/unconditional noise predictions carries key semantic information.<n>Our core contribution stems from a Fourier analysis of noise predictions, revealing that its low- and high-frequency components evolve differently throughout diffusion.<n>Based on this insight, we introduce FreSca, a straightforward method that applies guidance scaling independently to different frequency bands in the Fourier domain.
Score: 52.20473039489599
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Diffusion models offer impressive controllability for image tasks, primarily through noise predictions that encode task-specific information and classifier-free guidance enabling adjustable scaling. This scaling mechanism implicitly defines a ``scaling space'' whose potential for fine-grained semantic manipulation remains underexplored. We investigate this space, starting with inversion-based editing where the difference between conditional/unconditional noise predictions carries key semantic information. Our core contribution stems from a Fourier analysis of noise predictions, revealing that its low- and high-frequency components evolve differently throughout diffusion. Based on this insight, we introduce FreSca, a straightforward method that applies guidance scaling independently to different frequency bands in the Fourier domain. FreSca demonstrably enhances existing image editing methods without retraining. Excitingly, its effectiveness extends to image understanding tasks such as depth estimation, yielding quantitative gains across multiple datasets.

Related papers

CAM-Seg: A Continuous-valued Embedding Approach for Semantic Image Generation [11.170848285659572]
Autoencoder accuracy on segmentation mask using quantized embeddings is 8% lower than continuous-valued embeddings. We propose a continuous-valued embedding framework for semantic segmentation. Our approach eliminates the need for discrete latent representations while preserving fine-grained semantic details.
arXiv Detail & Related papers (2025-03-19T18:06:54Z)
Robust Representation Consistency Model via Contrastive Denoising [83.47584074390842]
randomized smoothing provides theoretical guarantees for certifying robustness against adversarial perturbations. diffusion models have been successfully employed for randomized smoothing to purify noise-perturbed samples. We reformulate the generative modeling task along the diffusion trajectories in pixel space as a discriminative task in the latent space.
arXiv Detail & Related papers (2025-01-22T18:52:06Z)
Diffusion Priors for Variational Likelihood Estimation and Image Denoising [10.548018200066858]
We propose adaptive likelihood estimation and MAP inference during the reverse diffusion process to tackle real-world noise. Experiments and analyses on diverse real-world datasets demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2024-10-23T02:52:53Z)
Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration [64.84134880709625]
We show that it is possible to perform domain adaptation via the noise space using diffusion models.<n>In particular, by leveraging the unique property of how auxiliary conditional inputs influence the multi-step denoising process, we derive a meaningful diffusion loss.<n>We present crucial strategies such as channel-shuffling layer and residual-swapping contrastive learning in the diffusion model.
arXiv Detail & Related papers (2024-06-26T17:40:30Z)
Immiscible Diffusion: Accelerating Diffusion Training with Noise Assignment [56.609042046176555]
suboptimal noise-data mapping leads to slow training of diffusion models. Drawing inspiration from the immiscibility phenomenon in physics, we propose Immiscible Diffusion. Our approach is remarkably simple, requiring only one line of code to restrict the diffuse-able area for each image.
arXiv Detail & Related papers (2024-06-18T06:20:42Z)
Diffusion-RSCC: Diffusion Probabilistic Model for Change Captioning in Remote Sensing Images [14.236580915897585]
RSICC aims at generating human-like language to describe semantic changes between bi-temporal remote sensing image pairs. Inspired by the remarkable generative power of diffusion model, we propose a probabilistic diffusion model for RSICC. In training process, we construct a noise predictor conditioned on cross modal features to learn the distribution from the real caption distribution to the standard Gaussian distribution under the Markov chain. In testing phase, the well-trained noise predictor helps to estimate the mean value of the distribution and generate change captions step by step.
arXiv Detail & Related papers (2024-05-21T15:44:31Z)
Diffusion Models With Learned Adaptive Noise [12.530583016267768]
In this paper, we explore whether the diffusion process can be learned from data. A widely held assumption is that the ELBO is invariant to the noise process. We propose MULAN, a learned diffusion process that applies noise at different rates across an image.
arXiv Detail & Related papers (2023-12-20T18:00:16Z)
Diffusion in Diffusion: Cyclic One-Way Diffusion for Text-Vision-Conditioned Generation [11.80682025950519]
In this work, we investigate the diffusion (physics) in diffusion (machine learning) properties. We propose our Cyclic One-Way Diffusion (COW) method to control the direction of diffusion phenomenon. Our method provides a novel perspective to understand the task needs and is applicable to a wider range of customization scenarios.
arXiv Detail & Related papers (2023-06-14T05:25:06Z)
A Variational Perspective on Solving Inverse Problems with Diffusion Models [101.831766524264]
Inverse tasks can be formulated as inferring a posterior distribution over data. This is however challenging in diffusion models since the nonlinear and iterative nature of the diffusion process renders the posterior intractable. We propose a variational approach that by design seeks to approximate the true posterior distribution.
arXiv Detail & Related papers (2023-05-07T23:00:47Z)
Representing Noisy Image Without Denoising [91.73819173191076]
Fractional-order Moments in Radon space (FMR) is designed to derive robust representation directly from noisy images. Unlike earlier integer-order methods, our work is a more generic design taking such classical methods as special cases.
arXiv Detail & Related papers (2023-01-18T10:13:29Z)
Embedding Propagation: Smoother Manifold for Few-Shot Classification [131.81692677836202]
We propose to use embedding propagation as an unsupervised non-parametric regularizer for manifold smoothing in few-shot classification. We empirically show that embedding propagation yields a smoother embedding manifold. We show that embedding propagation consistently improves the accuracy of the models in multiple semi-supervised learning scenarios by up to 16% points.
arXiv Detail & Related papers (2020-03-09T13:51:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.