Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space
- URL: http://arxiv.org/abs/2503.09419v1
- Date: Wed, 12 Mar 2025 14:16:30 GMT
- Title: Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space
- Authors: Yifan Zhou, Zeqi Xiao, Shuai Yang, Xingang Pan
- Abstract summary: Latent Diffusion Models (LDMs) are known to have an unstable generation process. This hinders their applicability in applications requiring consistent results. In this work, we redesign LDMs to enhance consistency by making them shift-equivariant.
- Score: 20.361790608772157
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Latent Diffusion Models (LDMs) are known to have an unstable generation process, where even small perturbations or shifts in the input noise can lead to significantly different outputs. This hinders their applicability in applications requiring consistent results. In this work, we redesign LDMs to enhance consistency by making them shift-equivariant. While introducing anti-aliasing operations can partially improve shift-equivariance, significant aliasing and inconsistency persist due to the unique challenges in LDMs, including 1) aliasing amplification during VAE training and multiple U-Net inferences, and 2) self-attention modules that inherently lack shift-equivariance. To address these issues, we redesign the attention modules to be shift-equivariant and propose an equivariance loss that effectively suppresses the frequency bandwidth of the features in the continuous domain. The resulting alias-free LDM (AF-LDM) achieves strong shift-equivariance and is also robust to irregular warping. Extensive experiments demonstrate that AF-LDM produces significantly more consistent results than vanilla LDM across various applications, including video editing and image-to-image translation. Code is available at: https://github.com/SingleZombie/AFLDM
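To make the shift-equivariance objective concrete, here is a minimal NumPy sketch (not the authors' implementation; the function `f` stands in for any network such as a VAE encoder or U-Net): a fractional shift implemented via the Fourier shift theorem, and an equivariance loss that penalizes the gap between `f(shift(x))` and `shift(f(x))`.

```python
import numpy as np

def fractional_shift(x, dx, dy):
    """Shift a 2D array by a (possibly sub-pixel) offset using the Fourier shift theorem."""
    H, W = x.shape
    fy = np.fft.fftfreq(H)[:, None]  # frequencies along axis 0
    fx = np.fft.fftfreq(W)[None, :]  # frequencies along axis 1
    phase = np.exp(-2j * np.pi * (fy * dy + fx * dx))
    # The input is real, so discard the (numerically tiny) imaginary residue.
    return np.real(np.fft.ifft2(np.fft.fft2(x) * phase))

def equivariance_loss(f, x, dx, dy):
    """Mean-squared gap between f(shift(x)) and shift(f(x)); zero iff f is shift-equivariant."""
    shifted_then_mapped = f(fractional_shift(x, dx, dy))
    mapped_then_shifted = fractional_shift(f(x), dx, dy)
    return np.mean((shifted_then_mapped - mapped_then_shifted) ** 2)
```

For a linear, shift-invariant `f` (e.g. `lambda z: 2.0 * z`) the loss is zero up to floating-point error, while pointwise nonlinearities generally break equivariance under *fractional* shifts, which is the aliasing failure mode the abstract describes.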
Related papers
- Learning from Heterogeneity: Generalizing Dynamic Facial Expression Recognition via Distributionally Robust Optimization [23.328511708942045]
A Heterogeneity-aware Distributional Framework (HDF) is designed to enhance time-frequency modeling and mitigate the imbalance caused by hard samples. A Time-Frequency Distributional Attention Module (DAM) captures both temporal consistency and frequency robustness. An adaptive optimization module, the Distribution-aware Scaling Module (DSM), is introduced to dynamically balance classification and contrastive losses.
arXiv Detail & Related papers (2025-07-21T16:21:47Z) - Variational Autoencoding Discrete Diffusion with Enhanced Dimensional Correlations Modeling [48.96034602889216]
Variational Autoencoding Discrete Diffusion (VADD) is a novel framework that enhances discrete diffusion with latent variable modeling. By introducing an auxiliary recognition model, VADD enables stable training via variational lower bounds and amortized inference over the training set. Empirical results on 2D toy data, pixel-level image generation, and text generation demonstrate that VADD consistently outperforms MDM baselines.
arXiv Detail & Related papers (2025-05-23T01:45:47Z) - FreSca: Scaling in Frequency Space Enhances Diffusion Models [55.75504192166779]
This paper explores frequency-based control within latent diffusion models. We introduce FreSca, a novel framework that decomposes noise difference into low- and high-frequency components. FreSca operates without any model retraining or architectural change, offering model- and task-agnostic control.
arXiv Detail & Related papers (2025-04-02T22:03:11Z) - FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion [63.609399000712905]
Inference at a scaled resolution leads to repetitive patterns and structural distortions.
We propose two simple modules that combine to solve these issues.
Our method, coined FAM diffusion, can seamlessly integrate into any latent diffusion model and requires no additional training.
arXiv Detail & Related papers (2024-11-27T17:51:44Z) - DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents [41.86208391836456]
We propose DisCo-Diff to simplify encoding a complex data distribution into a single continuous Gaussian distribution.
DisCo-Diff does not rely on pre-trained networks, making the framework universally applicable.
We validate DisCo-Diff on toy data, several image synthesis tasks as well as molecular docking, and find that introducing discrete latents consistently improves model performance.
arXiv Detail & Related papers (2024-07-03T17:42:46Z) - Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models [82.8261101680427]
Smooth latent spaces ensure that a perturbation on an input latent corresponds to a steady change in the output image.
This property proves beneficial in downstream tasks, including image interpolation, inversion, and editing.
We propose Smooth Diffusion, a new category of diffusion models that can be simultaneously high-performing and smooth.
arXiv Detail & Related papers (2023-12-07T16:26:23Z) - One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls [77.42510898755037]
One More Step (OMS) is a compact network that incorporates an additional simple yet effective step during inference.
OMS elevates image fidelity and harmonizes the dichotomy between training and inference, while preserving original model parameters.
Once trained, various pre-trained diffusion models with the same latent domain can share the same OMS module.
arXiv Detail & Related papers (2023-11-27T12:02:42Z) - Mitigating Shortcut Learning with Diffusion Counterfactuals and Diverse Ensembles [95.49699178874683]
We propose DiffDiv, an ensemble diversification framework exploiting Diffusion Probabilistic Models (DPMs).
We show that DPMs can generate images with novel feature combinations, even when trained on samples displaying correlated input features.
We show that DPM-guided diversification is sufficient to remove dependence on shortcut cues, without a need for additional supervised signals.
arXiv Detail & Related papers (2023-11-23T15:47:33Z) - Neural Diffusion Models [2.1779479916071067]
We present a generalization of conventional diffusion models that enables defining and learning time-dependent non-linear transformations of data.
NDMs outperform conventional diffusion models in terms of likelihood and produce high-quality samples.
arXiv Detail & Related papers (2023-10-12T13:54:55Z) - Semi-Implicit Denoising Diffusion Models (SIDDMs) [50.30163684539586]
Existing models such as Denoising Diffusion Probabilistic Models (DDPM) deliver high-quality, diverse samples but are slowed by an inherently high number of iterative steps.
We introduce a novel approach that tackles the problem by matching implicit and explicit factors.
We demonstrate that our proposed method obtains comparable generative performance to diffusion-based models and vastly superior results to models with a small number of sampling steps.
arXiv Detail & Related papers (2023-06-21T18:49:22Z) - f-DM: A Multi-stage Diffusion Model via Progressive Signal Transformation [56.04628143914542]
Diffusion models (DMs) have recently emerged as SoTA tools for generative modeling in various domains.
We propose f-DM, a generalized family of DMs which allows progressive signal transformation.
We apply f-DM in image generation tasks with a range of functions, including down-sampling, blurring, and learned transformations.
arXiv Detail & Related papers (2022-10-10T18:49:25Z) - Detaching and Boosting: Dual Engine for Scale-Invariant Self-Supervised Monocular Depth Estimation [18.741426143836538]
We present a scale-invariant approach for self-supervised MDE, in which scale-sensitive features (SSFs) are detached away.
To be specific, a simple but effective data augmentation by imitating the camera zooming process is proposed to detach SSFs.
Our approach achieves new state-of-the-art performance over existing works, improving the absolute relative error from 0.097 to 0.090.
arXiv Detail & Related papers (2022-10-08T07:38:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.