Related papers: Toward Diffusible High-Dimensional Latent Spaces: A Frequency Perspective

URL: http://arxiv.org/abs/2511.22249v1
Date: Thu, 27 Nov 2025 09:20:36 GMT
Title: Toward Diffusible High-Dimensional Latent Spaces: A Frequency Perspective
Authors: Bolin Lai, Xudong Wang, Saketh Rambhatla, James M. Rehg, Zsolt Kira, Rohit Girdhar, Ishan Misra
Abstract summary: We analyze encoder/decoder behaviors and find that decoders depend strongly on high-frequency latent components to recover details. We introduce FreqWarm, a plug-and-play frequency warm-up curriculum that increases early-stage exposure to high-frequency latent signals.
Score: 73.86108756585857
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Latent diffusion has become the default paradigm for visual generation, yet we observe a persistent reconstruction-generation trade-off as latent dimensionality increases: higher-capacity autoencoders improve reconstruction fidelity but generation quality eventually declines. We trace this gap to the different behaviors in high-frequency encoding and decoding. Through controlled perturbations in both RGB and latent domains, we analyze encoder/decoder behaviors and find that decoders depend strongly on high-frequency latent components to recover details, whereas encoders under-represent high-frequency contents, yielding insufficient exposure and underfitting in high-frequency bands for diffusion model training. To address this issue, we introduce FreqWarm, a plug-and-play frequency warm-up curriculum that increases early-stage exposure to high-frequency latent signals during diffusion or flow-matching training -- without modifying or retraining the autoencoder. Applied across several high-dimensional autoencoders, FreqWarm consistently improves generation quality: decreasing gFID by 14.11 on Wan2.2-VAE, 6.13 on LTX-VAE, and 4.42 on DC-AE-f32, while remaining architecture-agnostic and compatible with diverse backbones. Our study shows that explicitly managing frequency exposure can successfully turn high-dimensional latent spaces into more diffusible targets.
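The abstract does not spell out how FreqWarm constructs its curriculum, so the following is only an illustrative sketch of the general idea of raising early-stage exposure to high-frequency latent content, not the authors' implementation. It assumes a latent of shape (C, H, W), a radial FFT mask to separate bands, and a linearly decaying gain on the high band; the function names, `cutoff`, and `max_gain` are all hypothetical.

```python
import numpy as np


def split_frequency_bands(latent, cutoff=0.25):
    """Split a (C, H, W) latent into low- and high-frequency parts.

    Assumption: a hard radial mask in the 2D FFT domain defines the bands.
    `cutoff` is in cycles per sample (the Nyquist limit is 0.5).
    """
    h, w = latent.shape[-2:]
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    low_mask = (np.sqrt(fx**2 + fy**2) <= cutoff).astype(float)
    spectrum = np.fft.fft2(latent, axes=(-2, -1))
    low = np.fft.ifft2(spectrum * low_mask, axes=(-2, -1)).real
    high = latent - low  # the residual holds everything above the cutoff
    return low, high


def high_freq_gain(step, warmup_steps, max_gain=2.0):
    """Hypothetical schedule: decay linearly from max_gain to 1.0."""
    progress = min(max(step / warmup_steps, 0.0), 1.0)
    return max_gain + (1.0 - max_gain) * progress


def warm_up_latent(latent, step, warmup_steps, cutoff=0.25, max_gain=2.0):
    """Return a training target with the high band amplified early on."""
    low, high = split_frequency_bands(latent, cutoff)
    return low + high_freq_gain(step, warmup_steps, max_gain) * high


# Usage: the autoencoder latent is left untouched; only the target changes.
rng = np.random.default_rng(0)
z = rng.standard_normal((4, 16, 16))
z_early = warm_up_latent(z, step=0, warmup_steps=1000)     # high band boosted
z_late = warm_up_latent(z, step=1000, warmup_steps=1000)   # matches z again
```

In this sketch the gain returns to 1.0 once warm-up ends, so later training sees the unmodified latents, and the autoencoder itself is never modified or retrained, in line with the plug-and-play claim in the abstract.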

Related papers

Improving Reconstruction of Representation Autoencoder [52.817427902597416]
We propose LV-RAE, a representation autoencoder that augments semantic features with missing low-level information. Our experiments demonstrate that LV-RAE significantly improves reconstruction fidelity while preserving the semantic abstraction.
arXiv Detail & Related papers (2026-02-09T13:12:35Z)
DuFal: Dual-Frequency-Aware Learning for High-Fidelity Extremely Sparse-view CBCT Reconstruction [9.883167817281313]
Sparse-view Cone-Beam Computed Tomography reconstruction from limited X-ray projections remains a challenging problem in medical imaging. This paper presents DuFal, a novel framework that integrates frequency-domain and spatial-domain processing via a dual-path architecture. Experimental results on the LUNA16 and ToothFairy datasets demonstrate that DuFal significantly outperforms existing state-of-the-art methods in preserving high-frequency anatomical features.
arXiv Detail & Related papers (2026-01-21T19:27:47Z)
SONAR: Spectral-Contrastive Audio Residuals for Generalizable Deepfake Detection [6.042897432654865]
Spectral-cONtrastive Audio Residuals (SONAR) is a frequency-guided framework for deepfake audio detectors. SONAR disentangles an audio signal into complementary representations. It is evaluated on the ASVspoof 2021 and in-the-wild benchmarks.
arXiv Detail & Related papers (2025-11-26T12:16:38Z)
FLaTEC: Frequency-Disentangled Latent Triplanes for Efficient Compression of LiDAR Point Clouds [52.997038111673966]
FLaTEC is a frequency-aware compression model that enables the compression of a full scan with high compression ratios. We convert voxelized embeddings into triplane representations to reduce sparsity, computational cost, and storage requirements. Our method achieves state-of-the-art rate-distortion performance and outperforms the standard codecs by 78% and 94% in BD-rate on both datasets.
arXiv Detail & Related papers (2025-11-25T08:37:49Z)
Latent Harmony: Synergistic Unified UHD Image Restoration via Latent Space Regularization and Controllable Refinement [89.99237142387655]
We introduce LH-VAE, which enhances semantic robustness through visual semantic constraints and progressive degradations. Latent Harmony is a two-stage framework that redefines VAEs for UHD restoration by jointly regularizing the latent space and enforcing high-frequency-aware reconstruction. Experiments show Latent Harmony achieves state-of-the-art performance across UHD and standard-resolution tasks, effectively balancing efficiency, perceptual quality, and reconstruction accuracy.
arXiv Detail & Related papers (2025-10-09T08:54:26Z)
DiffPR: Diffusion-Based Phase Reconstruction via Frequency-Decoupled Learning [4.560284382063488]
Oversmoothing remains a persistent problem when applying deep learning to off-axis quantitative phase imaging (QPI). We trace this issue to spectral bias and show that the bias is reinforced by high-level skip connections. We introduce DiffPR, a two-stage frequency-decoupled framework.
arXiv Detail & Related papers (2025-06-12T17:08:45Z)
Improving the Diffusability of Autoencoders [54.920783089085035]
Latent diffusion models have emerged as the leading approach for generating high-quality images and videos. We perform a spectral analysis of modern autoencoders and identify inordinate high-frequency components in their latent spaces. We hypothesize that this high-frequency component interferes with the coarse-to-fine nature of the diffusion synthesis process and hinders the generation quality.
arXiv Detail & Related papers (2025-02-20T18:45:44Z)
High-Frequency Enhanced Hybrid Neural Representation for Video Compression [32.38933743785333]
This paper introduces a High-Frequency Enhanced Hybrid Neural Representation Network. Our method focuses on leveraging high-frequency information to improve the synthesis of fine details by the network. Experiments on the Bunny and UVG datasets demonstrate that our method outperforms other methods.
arXiv Detail & Related papers (2024-11-11T03:04:46Z)
Denoising Diffusion Error Correction Codes [92.10654749898927]
Recently, neural decoders have demonstrated their advantage over classical decoding techniques. Recent state-of-the-art neural decoders suffer from high complexity and lack the important iterative scheme characteristic of many legacy decoders. We propose to employ denoising diffusion models for the soft decoding of linear codes at arbitrary block lengths.
arXiv Detail & Related papers (2022-09-16T11:00:50Z)