FreSca: Unveiling the Scaling Space in Diffusion Models
- URL: http://arxiv.org/abs/2504.02154v1
- Date: Wed, 02 Apr 2025 22:03:11 GMT
- Title: FreSca: Unveiling the Scaling Space in Diffusion Models
- Authors: Chao Huang, Susan Liang, Yunlong Tang, Li Ma, Yapeng Tian, Chenliang Xu,
- Abstract summary: Diffusion models offer impressive controllability for image tasks, primarily through noise predictions that encode task-specific information and guidance enabling adjustable scaling.<n>We investigate this space, starting with inversion-based editing where the difference between conditional/unconditional noise predictions carries key semantic information.<n>Our core contribution stems from a Fourier analysis of noise predictions, revealing that its low- and high-frequency components evolve differently throughout diffusion.<n>Based on this insight, we introduce FreSca, a straightforward method that applies guidance scaling independently to different frequency bands in the Fourier domain.
- Score: 52.20473039489599
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models offer impressive controllability for image tasks, primarily through noise predictions that encode task-specific information and classifier-free guidance enabling adjustable scaling. This scaling mechanism implicitly defines a ``scaling space'' whose potential for fine-grained semantic manipulation remains underexplored. We investigate this space, starting with inversion-based editing where the difference between conditional/unconditional noise predictions carries key semantic information. Our core contribution stems from a Fourier analysis of noise predictions, revealing that its low- and high-frequency components evolve differently throughout diffusion. Based on this insight, we introduce FreSca, a straightforward method that applies guidance scaling independently to different frequency bands in the Fourier domain. FreSca demonstrably enhances existing image editing methods without retraining. Excitingly, its effectiveness extends to image understanding tasks such as depth estimation, yielding quantitative gains across multiple datasets.
Related papers
- Wavelet-Guided Dual-Frequency Encoding for Remote Sensing Change Detection [67.84730634802204]
Change detection in remote sensing imagery plays a vital role in various engineering applications, such as natural disaster monitoring, urban expansion tracking, and infrastructure management.<n>Most existing methods still rely on spatial-domain modeling, where the limited diversity of feature representations hinders the detection of subtle change regions.<n>We observe that frequency-domain feature modeling particularly in the wavelet domain amplify fine-grained differences in frequency components, enhancing the perception of edge changes that are challenging to capture in the spatial domain.
arXiv Detail & Related papers (2025-08-07T11:14:16Z) - Frequency Domain-Based Diffusion Model for Unpaired Image Dehazing [92.61216319417208]
We propose a novel frequency domain-based diffusion model, named ours, for fully exploiting the beneficial knowledge in unpaired clear data.<n>Inspired by the strong generative ability shown by Diffusion Models (DMs), we tackle the dehazing task from the perspective of frequency domain reconstruction.
arXiv Detail & Related papers (2025-07-02T01:22:46Z) - HalluRNN: Mitigating Hallucinations via Recurrent Cross-Layer Reasoning in Large Vision-Language Models [11.826832299262199]
HalluRNN enhances model stability through recurrent cross-layer reasoning.<n>By fine-tuning only the DG-DPU module, HalluRNN achieves strong and robust performance across multiple benchmarks.
arXiv Detail & Related papers (2025-06-21T04:56:55Z) - Generalizable Multispectral Land Cover Classification via Frequency-Aware Mixture of Low-Rank Token Experts [22.75047167955269]
We introduce Land-MoE, a novel approach for multispectral land cover classification (MLCC)<n>Land-MoE comprises two key modules: the mixture of low-rank token experts (MoLTE) and frequency-aware filters (FAF)
arXiv Detail & Related papers (2025-05-20T08:52:28Z) - CAM-Seg: A Continuous-valued Embedding Approach for Semantic Image Generation [11.170848285659572]
Autoencoder accuracy on segmentation mask using quantized embeddings is 8% lower than continuous-valued embeddings.
We propose a continuous-valued embedding framework for semantic segmentation.
Our approach eliminates the need for discrete latent representations while preserving fine-grained semantic details.
arXiv Detail & Related papers (2025-03-19T18:06:54Z) - DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers [86.5541501589166]
DiffMoE is a batch-level global token pool that enables experts to access global token distributions during training.<n>It achieves state-of-the-art performance among diffusion models on ImageNet benchmark.<n>The effectiveness of our approach extends beyond class-conditional generation to more challenging tasks such as text-to-image generation.
arXiv Detail & Related papers (2025-03-18T17:57:07Z) - Exploring Representation-Aligned Latent Space for Better Generation [86.45670422239317]
We introduce ReaLS, which integrates semantic priors to improve generation performance.<n>We show that fundamental DiT and SiT trained on ReaLS can achieve a 15% improvement in FID metric.<n>The enhanced semantic latent space enables more perceptual downstream tasks, such as segmentation and depth estimation.
arXiv Detail & Related papers (2025-02-01T07:42:12Z) - Robust Representation Consistency Model via Contrastive Denoising [83.47584074390842]
randomized smoothing provides theoretical guarantees for certifying robustness against adversarial perturbations.
diffusion models have been successfully employed for randomized smoothing to purify noise-perturbed samples.
We reformulate the generative modeling task along the diffusion trajectories in pixel space as a discriminative task in the latent space.
arXiv Detail & Related papers (2025-01-22T18:52:06Z) - Diffusion Priors for Variational Likelihood Estimation and Image Denoising [10.548018200066858]
We propose adaptive likelihood estimation and MAP inference during the reverse diffusion process to tackle real-world noise.
Experiments and analyses on diverse real-world datasets demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2024-10-23T02:52:53Z) - Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration [64.84134880709625]
We show that it is possible to perform domain adaptation via the noise space using diffusion models.<n>In particular, by leveraging the unique property of how auxiliary conditional inputs influence the multi-step denoising process, we derive a meaningful diffusion loss.<n>We present crucial strategies such as channel-shuffling layer and residual-swapping contrastive learning in the diffusion model.
arXiv Detail & Related papers (2024-06-26T17:40:30Z) - Immiscible Diffusion: Accelerating Diffusion Training with Noise Assignment [56.609042046176555]
suboptimal noise-data mapping leads to slow training of diffusion models.
Drawing inspiration from the immiscibility phenomenon in physics, we propose Immiscible Diffusion.
Our approach is remarkably simple, requiring only one line of code to restrict the diffuse-able area for each image.
arXiv Detail & Related papers (2024-06-18T06:20:42Z) - Ensembling Diffusion Models via Adaptive Feature Aggregation [14.663257610094625]
Leveraging multiple high-quality models to produce stronger generation ability is valuable, but has not been extensively studied.<n>Existing methods primarily adopt parameter merging strategies to produce a new static model.<n>We propose Adaptive Feature Aggregation (AFA), which dynamically adjusts the contributions of multiple models at the feature level according to various states.
arXiv Detail & Related papers (2024-05-27T11:55:35Z) - Image Deraining with Frequency-Enhanced State Space Model [2.9465623430708905]
This study introduces SSM to image deraining with deraining-specific enhancements and proposes a Deraining Frequency-Enhanced State Space Model (DFSSM)<n>We develop a novel mixed-scale gated-convolutional block, which uses convolutions with multiple kernel sizes to capture various scale degradations effectively.<n>Experiments on synthetic and real-world rainy image datasets show that our method surpasses state-of-the-art methods.
arXiv Detail & Related papers (2024-05-26T07:45:12Z) - Diffusion-RSCC: Diffusion Probabilistic Model for Change Captioning in Remote Sensing Images [14.236580915897585]
RSICC aims at generating human-like language to describe semantic changes between bi-temporal remote sensing image pairs.
Inspired by the remarkable generative power of diffusion model, we propose a probabilistic diffusion model for RSICC.
In training process, we construct a noise predictor conditioned on cross modal features to learn the distribution from the real caption distribution to the standard Gaussian distribution under the Markov chain.
In testing phase, the well-trained noise predictor helps to estimate the mean value of the distribution and generate change captions step by step.
arXiv Detail & Related papers (2024-05-21T15:44:31Z) - Diffusion Models With Learned Adaptive Noise [12.530583016267768]
In this paper, we explore whether the diffusion process can be learned from data.
A widely held assumption is that the ELBO is invariant to the noise process.
We propose MULAN, a learned diffusion process that applies noise at different rates across an image.
arXiv Detail & Related papers (2023-12-20T18:00:16Z) - Diffusion Models Without Attention [110.5623058129782]
Diffusion State Space Model (DiffuSSM) is an architecture that supplants attention mechanisms with a more scalable state space model backbone.
Our focus on FLOP-efficient architectures in diffusion training marks a significant step forward.
arXiv Detail & Related papers (2023-11-30T05:15:35Z) - Diffusion in Diffusion: Cyclic One-Way Diffusion for Text-Vision-Conditioned Generation [11.80682025950519]
In this work, we investigate the diffusion (physics) in diffusion (machine learning) properties.
We propose our Cyclic One-Way Diffusion (COW) method to control the direction of diffusion phenomenon.
Our method provides a novel perspective to understand the task needs and is applicable to a wider range of customization scenarios.
arXiv Detail & Related papers (2023-06-14T05:25:06Z) - A Variational Perspective on Solving Inverse Problems with Diffusion
Models [101.831766524264]
Inverse tasks can be formulated as inferring a posterior distribution over data.
This is however challenging in diffusion models since the nonlinear and iterative nature of the diffusion process renders the posterior intractable.
We propose a variational approach that by design seeks to approximate the true posterior distribution.
arXiv Detail & Related papers (2023-05-07T23:00:47Z) - Representing Noisy Image Without Denoising [91.73819173191076]
Fractional-order Moments in Radon space (FMR) is designed to derive robust representation directly from noisy images.
Unlike earlier integer-order methods, our work is a more generic design taking such classical methods as special cases.
arXiv Detail & Related papers (2023-01-18T10:13:29Z) - Embedding Propagation: Smoother Manifold for Few-Shot Classification [131.81692677836202]
We propose to use embedding propagation as an unsupervised non-parametric regularizer for manifold smoothing in few-shot classification.
We empirically show that embedding propagation yields a smoother embedding manifold.
We show that embedding propagation consistently improves the accuracy of the models in multiple semi-supervised learning scenarios by up to 16% points.
arXiv Detail & Related papers (2020-03-09T13:51:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.