Latent Wavelet Diffusion: Enabling 4K Image Synthesis for Free
- URL: http://arxiv.org/abs/2506.00433v2
- Date: Tue, 03 Jun 2025 04:38:10 GMT
- Title: Latent Wavelet Diffusion: Enabling 4K Image Synthesis for Free
- Authors: Luigi Sigillo, Shengfeng He, Danilo Comminiello
- Abstract summary: Latent Wavelet Diffusion (LWD) is a lightweight framework that enables any latent diffusion model to scale to ultra-high-resolution image generation (2K to 4K) for free. LWD requires no architectural modifications and incurs no additional computational overhead. It consistently improves perceptual quality and reduces FID in ultra-high-resolution image synthesis, outperforming strong baseline models.
- Score: 31.515710473565925
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High-resolution image synthesis remains a core challenge in generative modeling, particularly in balancing computational efficiency with the preservation of fine-grained visual detail. We present Latent Wavelet Diffusion (LWD), a lightweight framework that enables any latent diffusion model to scale to ultra-high-resolution image generation (2K to 4K) for free. LWD introduces three key components: (1) a scale-consistent variational autoencoder objective that enhances the spectral fidelity of latent representations; (2) wavelet energy maps that identify and localize detail-rich spatial regions within the latent space; and (3) a time-dependent masking strategy that focuses denoising supervision on high-frequency components during training. LWD requires no architectural modifications and incurs no additional computational overhead. Despite its simplicity, it consistently improves perceptual quality and reduces FID in ultra-high-resolution image synthesis, outperforming strong baseline models. These results highlight the effectiveness of frequency-aware, signal-driven supervision as a principled and efficient approach for high-resolution generative modeling.
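The abstract names two signal-driven ingredients: wavelet energy maps that localize detail-rich regions of the latent space, and a time-dependent mask that steers denoising supervision toward high-frequency components. As a rough illustration only (not the authors' released code), the sketch below computes a single-level Haar detail-energy map for a latent tensor and blends it with uniform weighting according to the diffusion timestep; the function names, the normalization, and the linear blending schedule are assumptions.

```python
# Minimal sketch of a wavelet energy map and a time-dependent loss mask,
# in the spirit of LWD's description. All names, the normalization, and the
# linear schedule are hypothetical, not taken from the paper's code.
import numpy as np

def haar_energy_map(latent: np.ndarray) -> np.ndarray:
    """latent: (C, H, W) with even H and W.
    Returns an (H//2, W//2) map of high-frequency (detail) energy."""
    a = latent[:, 0::2, 0::2]  # top-left of each 2x2 block
    b = latent[:, 0::2, 1::2]  # top-right
    c = latent[:, 1::2, 0::2]  # bottom-left
    d = latent[:, 1::2, 1::2]  # bottom-right
    # Detail sub-bands of a single-level Haar transform (sign conventions vary).
    lh = (a + b - c - d) / 2.0
    hl = (a - b + c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    energy = (lh ** 2 + hl ** 2 + hh ** 2).sum(axis=0)  # sum over channels
    # Normalize to [0, 1]; in practice the map could be upsampled back to the
    # latent resolution before weighting the loss.
    return energy / (energy.max() + 1e-8)

def time_dependent_mask(energy: np.ndarray, t: float) -> np.ndarray:
    """Blend from uniform weighting at high noise (t near 1) toward the
    detail-energy map at low noise (t near 0) -- an assumed schedule."""
    return t * np.ones_like(energy) + (1.0 - t) * energy

# Usage: weight a per-location denoising loss by the mask.
rng = np.random.default_rng(0)
latent = rng.standard_normal((4, 64, 64)).astype(np.float32)
mask = time_dependent_mask(haar_energy_map(latent), t=0.3)
per_pixel_error = rng.random(mask.shape)  # stand-in for ||eps - eps_hat||^2
weighted_loss = float((mask * per_pixel_error).mean())
print(weighted_loss)
```

The blend toward uniform weighting at noisy timesteps is one plausible reading of "time-dependent masking"; the paper's actual energy-map definition and schedule may differ.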
Related papers
- Ultra3D: Efficient and High-Fidelity 3D Generation with Part Attention [54.15345846343084]
We propose Ultra3D, an efficient 3D generation framework that significantly accelerates sparse voxel modeling without compromising quality. Part Attention is a geometry-aware localized attention mechanism that restricts attention computation within semantically consistent part regions. Experiments demonstrate that Ultra3D supports high-resolution 3D generation at 1024 resolution and achieves state-of-the-art performance in both visual fidelity and user preference.
arXiv Detail & Related papers (2025-07-23T17:57:16Z)
- HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling [1.9474278832087901]
HiWave is a training-free, zero-shot approach that substantially enhances visual fidelity and structural coherence in ultra-high-resolution image synthesis. A user study confirmed HiWave's effectiveness: it was preferred over the state-of-the-art alternative in more than 80% of comparisons.
arXiv Detail & Related papers (2025-06-25T13:58:37Z)
- FADPNet: Frequency-Aware Dual-Path Network for Face Super-Resolution [70.61549422952193]
Face super-resolution (FSR) under limited computational costs remains an open problem. Existing approaches typically treat all facial pixels equally, resulting in suboptimal allocation of computational resources. We propose FADPNet, a Frequency-Aware Dual-Path Network that decomposes facial features into low- and high-frequency components.
arXiv Detail & Related papers (2025-06-17T02:33:42Z)
- V2V3D: View-to-View Denoised 3D Reconstruction for Light-Field Microscopy [12.356249860549472]
Light field microscopy (LFM) has gained significant attention due to its ability to capture snapshot-based, large-scale 3D fluorescence images. Existing LFM reconstruction algorithms are highly sensitive to sensor noise or require hard-to-get ground-truth annotated data for training. This paper introduces V2V3D, an unsupervised view2view-based framework that establishes a new paradigm for joint optimization of image denoising and 3D reconstruction.
arXiv Detail & Related papers (2025-04-10T15:29:26Z)
- FreSca: Scaling in Frequency Space Enhances Diffusion Models [55.75504192166779]
This paper explores frequency-based control within latent diffusion models. We introduce FreSca, a novel framework that decomposes the noise difference into low- and high-frequency components. FreSca operates without any model retraining or architectural change, offering model- and task-agnostic control.
arXiv Detail & Related papers (2025-04-02T22:03:11Z)
- FUSE: Label-Free Image-Event Joint Monocular Depth Estimation via Frequency-Decoupled Alignment and Degradation-Robust Fusion [63.87313550399871]
Image-event joint depth estimation methods leverage complementary modalities for robust perception, yet face challenges in generalizability. We propose Self-supervised Transfer (PST) and a Frequency-Decoupled Fusion module (FreDF). PST establishes cross-modal knowledge transfer through latent space alignment with image foundation models. FreDF explicitly decouples high-frequency edge features from low-frequency structural components, resolving modality-specific frequency mismatches.
arXiv Detail & Related papers (2025-03-25T15:04:53Z)
- MSF: Efficient Diffusion Model Via Multi-Scale Latent Factorize [18.73205699076486]
We introduce a diffusion framework leveraging multi-scale latent factorization. Our framework decomposes the denoising target, typically latent features from a pretrained Variational Autoencoder, into a low-frequency base signal and a high-frequency residual. Our proposed architecture facilitates reduced sampling steps during the residual learning stage.
arXiv Detail & Related papers (2025-01-23T03:18:23Z)
- Diffusion Models Without Attention [110.5623058129782]
Diffusion State Space Model (DiffuSSM) is an architecture that supplants attention mechanisms with a more scalable state space model backbone.
Our focus on FLOP-efficient architectures in diffusion training marks a significant step forward.
arXiv Detail & Related papers (2023-11-30T05:15:35Z)
- HDNet: High-resolution Dual-domain Learning for Spectral Compressive Imaging [138.04956118993934]
We propose a high-resolution dual-domain learning network (HDNet) for HSI reconstruction.
On the one hand, the proposed HR spatial-spectral attention module with its efficient feature fusion provides continuous and fine pixel-level features.
On the other hand, frequency domain learning (FDL) is introduced for HSI reconstruction to narrow the frequency domain discrepancy.
arXiv Detail & Related papers (2022-03-04T06:37:45Z)