Missing Fine Details in Images: Last Seen in High Frequencies
- URL: http://arxiv.org/abs/2509.05441v3
- Date: Wed, 10 Sep 2025 22:15:25 GMT
- Title: Missing Fine Details in Images: Last Seen in High Frequencies
- Authors: Tejaswini Medi, Hsien-Yi Wang, Arianna Rampini, Margret Keuper,
- Abstract summary: We propose a wavelet-based, frequency-aware variational autoencoder (FA-VAE) framework that explicitly decouples the optimization of low- and high-frequency components. Our approach bridges the fidelity gap in current latent tokenizers and emphasizes the importance of frequency-aware optimization for realistic image synthesis.
- Score: 17.95197409468585
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Latent generative models have shown remarkable progress in high-fidelity image synthesis, typically using a two-stage training process that involves compressing images into latent embeddings via learned tokenizers in the first stage. The quality of generation strongly depends on how expressive and well-optimized these latent embeddings are. While various methods have been proposed to learn effective latent representations, generated images often lack realism, particularly in textured regions with sharp transitions, due to loss of fine details governed by high frequencies. We conduct a detailed frequency decomposition of existing state-of-the-art (SOTA) latent tokenizers and show that conventional objectives inherently prioritize low-frequency reconstruction, often at the expense of high-frequency fidelity. Our analysis reveals these latent tokenizers exhibit a bias toward low-frequency information during optimization, leading to over-smoothed outputs and visual artifacts that diminish perceptual quality. To address this, we propose a wavelet-based, frequency-aware variational autoencoder (FA-VAE) framework that explicitly decouples the optimization of low- and high-frequency components. This decoupling enables improved reconstruction of fine textures while preserving global structure. Moreover, we integrate our frequency-preserving latent embeddings into a SOTA latent diffusion model, resulting in sharper and more realistic image generation. Our approach bridges the fidelity gap in current latent tokenizers and emphasizes the importance of frequency-aware optimization for realistic image synthesis, with broader implications for applications in content creation, neural rendering, and medical imaging.
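The decoupled low-/high-frequency objective described in the abstract can be illustrated with a minimal numpy sketch. The Haar transform, the band weights, and the function names below are illustrative assumptions, not the published FA-VAE loss:

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar wavelet transform.

    Returns the low-frequency approximation (LL) and the three
    high-frequency detail sub-bands (LH, HL, HH).
    """
    # Pairwise averages/differences along columns, then rows.
    lo_r = (x[:, 0::2] + x[:, 1::2]) / 2.0
    hi_r = (x[:, 0::2] - x[:, 1::2]) / 2.0
    ll = (lo_r[0::2, :] + lo_r[1::2, :]) / 2.0
    lh = (lo_r[0::2, :] - lo_r[1::2, :]) / 2.0
    hl = (hi_r[0::2, :] + hi_r[1::2, :]) / 2.0
    hh = (hi_r[0::2, :] - hi_r[1::2, :]) / 2.0
    return ll, (lh, hl, hh)

def frequency_decoupled_loss(target, recon, w_low=1.0, w_high=1.0):
    """Separate reconstruction losses for low- and high-frequency bands.

    w_low/w_high are hypothetical knobs: raising w_high penalizes the
    loss of fine texture that a plain pixel-space MSE averages away.
    """
    ll_t, highs_t = haar_dwt2(target)
    ll_r, highs_r = haar_dwt2(recon)
    loss_low = np.mean((ll_t - ll_r) ** 2)
    loss_high = np.mean([np.mean((a - b) ** 2)
                         for a, b in zip(highs_t, highs_r)])
    return w_low * loss_low + w_high * loss_high

# A flat reconstruction matches the low band better than the detail bands:
rng = np.random.default_rng(0)
img = rng.random((8, 8))
blurred = np.full_like(img, img.mean())
print(frequency_decoupled_loss(img, blurred, w_high=4.0))
```

Under this decomposition, an over-smoothed output incurs most of its penalty in the detail sub-bands, which is exactly the error a purely pixel-space objective tends to under-weight.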
Related papers
- SFTok: Bridging the Performance Gap in Discrete Tokenizers [72.9996757048065]
We propose SFTok, a discrete tokenizer that incorporates a multi-step iterative mechanism for precise reconstruction. At a high compression rate of only 64 tokens per image, SFTok achieves state-of-the-art reconstruction quality on ImageNet.
arXiv Detail & Related papers (2025-12-18T18:59:04Z)
- Towards Frequency-Adaptive Learning for SAR Despeckling [10.764049665817629]
We propose a frequency-adaptive heterogeneous despeckling model based on a divide-and-conquer architecture. Inspired by their differing noise characteristics, we design specialized sub-networks for different frequency components. For high-frequency sub-bands rich in edges and textures, we introduce an enhanced U-Net with deformable convolutions for noise suppression and feature enhancement.
arXiv Detail & Related papers (2025-11-08T07:08:22Z)
- WaveSeg: Enhancing Segmentation Precision via High-Frequency Prior and Mamba-Driven Spectrum Decomposition [61.3530659856013]
We propose WaveSeg, a novel decoder architecture that jointly optimizes feature refinement in the spatial and wavelet domains. High-frequency components are first learned from the input images as explicit priors to reinforce boundary details. Experiments on standard benchmarks demonstrate that WaveSeg, leveraging a wavelet-domain frequency prior with Mamba-based attention, consistently outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2025-10-24T01:41:31Z)
- Frequency-Aware Autoregressive Modeling for Efficient High-Resolution Image Synthesis [40.93077975823353]
Visual autoregressive modeling, based on the next-scale prediction paradigm, exhibits notable advantages in image quality and model scalability. However, the computational overhead of high-resolution stages remains a critical challenge due to the substantial number of tokens involved. We introduce SparseVAR, a plug-and-play acceleration framework for next-scale prediction that dynamically excludes low-frequency tokens during inference without requiring additional training.
arXiv Detail & Related papers (2025-07-28T01:13:24Z)
- HiWave: Training-Free High-Resolution Image Generation via Wavelet-Based Diffusion Sampling [1.9474278832087901]
HiWave is a training-free, zero-shot approach that substantially enhances visual fidelity and structural coherence in ultra-high-resolution image synthesis. A user study confirmed HiWave's performance: it was preferred over the state-of-the-art alternative in more than 80% of comparisons.
arXiv Detail & Related papers (2025-06-25T13:58:37Z)
- Frequency-Domain Fusion Transformer for Image Inpainting [6.4194162137514725]
This paper proposes a Transformer-based image inpainting method incorporating frequency-domain fusion. Experimental results demonstrate that the proposed method effectively improves inpainting quality by preserving more high-frequency information.
arXiv Detail & Related papers (2025-06-23T09:19:04Z)
- Learning Multi-scale Spatial-frequency Features for Image Denoising [58.883244886588336]
We propose MADNet, a novel multi-scale adaptive dual-domain network for image denoising. We use image pyramid inputs to restore noise-free results from low-resolution images. To realize the interaction between high-frequency and low-frequency information, we design an adaptive spatial-frequency learning unit.
arXiv Detail & Related papers (2025-06-19T13:28:09Z)
- Freqformer: Image-Demoiréing Transformer via Efficient Frequency Decomposition [83.40450475728792]
We present Freqformer, a Transformer-based framework specifically designed for image demoiréing through targeted frequency separation. Our method performs an effective frequency decomposition that explicitly splits moiré patterns into high-frequency, spatially localized textures and low-frequency, scale-robust color distortions. Experiments on various demoiréing benchmarks demonstrate that Freqformer achieves state-of-the-art performance with a compact model size.
arXiv Detail & Related papers (2025-05-25T12:23:10Z)
- Leveraging Diffusion Knowledge for Generative Image Compression with Fractal Frequency-Aware Band Learning [16.077768397480902]
Generative image compression approaches produce detailed, realistic images rather than merely sharp-looking reconstructions. We propose a novel deep-learning-based generative image compression method injected with diffusion knowledge. Our method achieves lower distortion at high realism and better realism at low distortion than prior approaches.
arXiv Detail & Related papers (2025-03-14T11:41:33Z)
- "Principal Components" Enable A New Language of Images [79.45806370905775]
We introduce a novel visual tokenization framework that embeds a provable PCA-like structure into the latent token space. Our approach achieves state-of-the-art reconstruction performance and enables better interpretability, aligning with the human visual system.
arXiv Detail & Related papers (2025-03-11T17:59:41Z)
- NFIG: Autoregressive Image Generation with Next-Frequency Prediction [50.69346038028673]
We present Next-Frequency Image Generation (NFIG), a novel framework that decomposes the image generation process into multiple frequency-guided stages. Our approach first generates low-frequency components to establish global structure with fewer tokens, then progressively adds higher-frequency details, following the natural spectral hierarchy of images.
arXiv Detail & Related papers (2025-03-10T08:59:10Z)
- MSF: Efficient Diffusion Model Via Multi-Scale Latent Factorize [18.73205699076486]
We introduce a diffusion framework leveraging multi-scale latent factorization. Our framework decomposes the denoising target, typically latent features from a pretrained Variational Autoencoder, into a low-frequency base signal. Our proposed architecture facilitates reduced sampling steps during the residual learning stage.
arXiv Detail & Related papers (2025-01-23T03:18:23Z)
- FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process [120.91393949012014]
FreeEnhance is a framework for content-consistent image enhancement using off-the-shelf image diffusion models.
In the noising stage, FreeEnhance is devised to add lighter noise to regions with higher frequency, preserving the high-frequency patterns of the original image.
In the denoising stage, we present three target properties as constraints to regularize the predicted noise, enhancing images with high acutance and high visual quality.
arXiv Detail & Related papers (2024-09-11T17:58:50Z)
- Inverting Adversarially Robust Networks for Image Synthesis [37.927552662984034]
We propose the use of robust representations as a perceptual primitive for feature inversion models.
We empirically show that adopting robust representations as an image prior significantly improves the reconstruction accuracy of CNN-based feature inversion models.
Following these findings, we propose an encoding-decoding network based on robust representations and show its advantages for applications such as anomaly detection, style transfer and image denoising.
arXiv Detail & Related papers (2021-06-13T05:51:00Z)
- Focal Frequency Loss for Image Reconstruction and Synthesis [125.7135706352493]
We show that narrowing gaps in the frequency domain can further improve image reconstruction and synthesis quality.
We propose a novel focal frequency loss, which allows a model to adaptively focus on frequency components that are hard to synthesize.
arXiv Detail & Related papers (2020-12-23T17:32:04Z)
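The focal frequency loss above can be sketched in a few lines of numpy: frequency components with large reconstruction error receive a larger weight, so optimization focuses on bands that are hard to synthesize. This is a simplified illustration; the value of `alpha` and the max-normalization are assumptions, and the published loss includes details (patch-wise spectra, weight detachment) omitted here:

```python
import numpy as np

def focal_frequency_loss(real, fake, alpha=1.0):
    """Simplified focal frequency loss between two grayscale images."""
    f_real = np.fft.fft2(real)
    f_fake = np.fft.fft2(fake)
    # Per-frequency distance between the two spectra.
    dist = np.abs(f_real - f_fake)
    # Spectrum weight: larger error -> larger weight, normalized to [0, 1].
    weight = dist ** alpha
    if weight.max() > 0:
        weight = weight / weight.max()
    return np.mean(weight * dist ** 2)

rng = np.random.default_rng(0)
real = rng.random((16, 16))
print(focal_frequency_loss(real, real))        # identical images -> 0.0
print(focal_frequency_loss(real, real * 0.5))  # mismatch -> positive loss
```

Because the weight grows with the per-frequency error, easily reconstructed (typically low-frequency) components are down-weighted, which complements the low-frequency bias of conventional objectives discussed in the abstract.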
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the listed information and is not responsible for any consequences of its use.