Latent Harmony: Synergistic Unified UHD Image Restoration via Latent Space Regularization and Controllable Refinement
- URL: http://arxiv.org/abs/2510.07961v3
- Date: Fri, 24 Oct 2025 05:01:58 GMT
- Title: Latent Harmony: Synergistic Unified UHD Image Restoration via Latent Space Regularization and Controllable Refinement
- Authors: Yidi Liu, Xueyang Fu, Jie Huang, Jie Xiao, Dong Li, Wenlong Zhang, Lei Bai, Zheng-Jun Zha
- Abstract summary: We introduce LH-VAE, which enhances semantic robustness through visual semantic constraints and progressive degradations. Latent Harmony is a two-stage framework that redefines VAEs for UHD restoration by jointly regularizing the latent space and enforcing high-frequency-aware reconstruction. Experiments show Latent Harmony achieves state-of-the-art performance across UHD and standard-resolution tasks, effectively balancing efficiency, perceptual quality, and reconstruction accuracy.
- Score: 89.99237142387655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ultra-High Definition (UHD) image restoration faces a trade-off between computational efficiency and high-frequency detail retention. While Variational Autoencoders (VAEs) improve efficiency via latent-space processing, their Gaussian constraint often discards degradation-specific high-frequency information, hurting reconstruction fidelity. To overcome this, we propose Latent Harmony, a two-stage framework that redefines VAEs for UHD restoration by jointly regularizing the latent space and enforcing high-frequency-aware reconstruction. In Stage One, we introduce LH-VAE, which enhances semantic robustness through visual semantic constraints and progressive degradation perturbations, while latent equivariance strengthens high-frequency reconstruction. Stage Two jointly trains this refined VAE with a restoration model using High-Frequency Low-Rank Adaptation (HF-LoRA): an encoder LoRA guided by a fidelity-oriented high-frequency alignment loss to recover authentic details, and a decoder LoRA driven by a perception-oriented loss to synthesize realistic textures. Both LoRA modules are trained via alternating optimization with selective gradient propagation to preserve the pretrained latent structure. At inference, a tunable parameter α enables flexible fidelity-perception trade-offs. Experiments show Latent Harmony achieves state-of-the-art performance across UHD and standard-resolution tasks, effectively balancing efficiency, perceptual quality, and reconstruction accuracy.
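The abstract describes a tunable parameter α that trades fidelity against perception at inference by weighting two LoRA paths. The paper does not publish code here, so the following is only a minimal pure-Python sketch of the general idea: a base linear layer whose output is adjusted by an α-weighted blend of a fidelity-oriented and a perception-oriented low-rank update. All function and variable names are hypothetical, and real LoRA operates on the VAE's weight matrices rather than per-vector like this.

```python
# Hypothetical sketch of an alpha-weighted fidelity/perception LoRA blend.
# Not the authors' implementation; it only illustrates the trade-off mechanism.

def lora_delta(x, A, B):
    """Low-rank update B @ (A @ x); A is r x d, B is d x r, x is a d-vector."""
    inner = [sum(a * xi for a, xi in zip(row, x)) for row in A]   # A @ x
    return [sum(b * h for b, h in zip(row, inner)) for row in B]  # B @ (A @ x)

def blended_forward(x, W, lora_fid, lora_per, alpha=0.5):
    """Base layer W plus an alpha-blend of fidelity- and perception-oriented
    LoRA deltas; alpha=1 favors fidelity, alpha=0 favors perception."""
    base = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    d_fid = lora_delta(x, *lora_fid)
    d_per = lora_delta(x, *lora_per)
    return [b + alpha * f + (1 - alpha) * p
            for b, f, p in zip(base, d_fid, d_per)]
```

Sweeping `alpha` from 0 to 1 then moves the output continuously between the two behaviors, which is the flexibility the abstract attributes to α.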
Related papers
- FiDeSR: High-Fidelity and Detail-Preserving One-Step Diffusion Super-Resolution [11.03986460753769]
We propose FiDeSR, a high-fidelity and detail-preserving one-step diffusion super-resolution framework.
During training, we introduce a detail-aware weighting strategy that adaptively emphasizes regions where the model exhibits higher prediction errors.
During inference, low- and high-frequency adaptive enhancers further refine the reconstruction without requiring model retraining.
arXiv Detail & Related papers (2026-03-03T07:34:49Z)
- AlignVAR: Towards Globally Consistent Visual Autoregression for Image Super-Resolution [16.90182090355781]
Visual autoregressive models offer stable training, non-iterative inference, and high-fidelity synthesis through next-scale prediction.
However, their application remains underexplored and faces two critical challenges: locality-biased attention and residual-only supervision.
We propose a globally consistent visual autoregressive framework tailored for image super-resolution.
arXiv Detail & Related papers (2026-02-28T10:39:06Z)
- HSI-VAR: Rethinking Hyperspectral Restoration through Spatial-Spectral Visual Autoregression [43.90363193188088]
Hyperspectral images (HSIs) capture richer spatial-spectral information beyond RGB.
Real-world HSIs often suffer from a composite mix of degradations, such as noise, blur, and missing bands.
arXiv Detail & Related papers (2026-01-31T14:30:05Z)
- Iterative Inference-time Scaling with Adaptive Frequency Steering for Image Super-Resolution [75.3690742776891]
We propose Iterative Diffusion Inference-Time Scaling with Adaptive Frequency Steering (IAFS).
IAFS addresses the challenge of balancing perceptual quality and structural fidelity by progressively refining the generated image through iterative correction of structural deviations.
Experiments show that IAFS effectively resolves the perception-fidelity conflict, yielding consistently improved perceptual detail and structural accuracy, and outperforming existing inference-time scaling methods.
arXiv Detail & Related papers (2025-12-29T15:09:20Z)
- Progressive Flow-inspired Unfolding for Spectral Compressive Imaging [11.638690628451647]
Coded aperture snapshot spectral imaging (CASSI) retrieves a 3D hyperspectral image (HSI) from a single 2D compressed measurement.
Recent deep unfolding networks (DUNs) have achieved the state of the art in CASSI reconstruction.
Inspired by diffusion trajectories and flow matching, we propose a novel trajectory-controllable unfolding framework.
arXiv Detail & Related papers (2025-09-15T16:10:50Z)
- Latent Wavelet Diffusion For Ultra-High-Resolution Image Synthesis [56.311477476580926]
We present Latent Wavelet Diffusion (LWD), a lightweight training framework that significantly improves detail and texture fidelity in ultra-high-resolution (2K-4K) image synthesis.
LWD introduces a novel, frequency-aware masking strategy derived from wavelet energy maps, which dynamically focuses the training process on detail-rich regions of the latent space.
arXiv Detail & Related papers (2025-05-31T07:28:32Z)
- LAFR: Efficient Diffusion-based Blind Face Restoration via Latent Codebook Alignment Adapter [52.93785843453579]
Blind face restoration from low-quality (LQ) images is a challenging task that requires high-fidelity image reconstruction and the preservation of facial identity.
We propose LAFR, a novel codebook-based latent space adapter that aligns the latent distribution of LQ images with that of HQ counterparts.
We show that lightweight finetuning of the diffusion prior on just 0.9% of the FFHQ dataset is sufficient to achieve results comparable to state-of-the-art methods.
arXiv Detail & Related papers (2025-05-29T14:11:16Z)
- RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration [51.77917733024544]
Latent diffusion models (LDMs) have improved the perceptual quality of All-in-One image Restoration (AiOR) methods.
However, LDMs suffer from slow inference due to their iterative denoising process, rendering them impractical for time-sensitive applications.
Visual autoregressive modeling (VAR) performs scale-space autoregression and achieves performance comparable to that of state-of-the-art diffusion transformers.
arXiv Detail & Related papers (2025-05-23T15:52:26Z)
- From Zero to Detail: Deconstructing Ultra-High-Definition Image Restoration from Progressive Spectral Perspective [38.94762644896294]
ERR comprises three collaborative sub-networks: the zero-frequency enhancer (ZFE), the low-frequency restorer (LFR), and the high-frequency refiner (HFR).
Specifically, the ZFE integrates global priors to learn global mapping, while the LFR restores low-frequency information, emphasizing reconstruction of coarse-grained content.
The HFR employs our designed frequency-windowed Kolmogorov-Arnold networks (FW-KAN) to refine textures and details, producing high-quality image restoration.
arXiv Detail & Related papers (2025-03-17T13:39:51Z)
- Reconstruct-and-Generate Diffusion Model for Detail-Preserving Image Denoising [16.43285056788183]
We propose a novel approach called the Reconstruct-and-Generate Diffusion Model (RnG).
Our method leverages a reconstructive denoising network to recover the majority of the underlying clean signal.
It employs a diffusion algorithm to generate residual high-frequency details, thereby enhancing visual quality.
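The RnG summary above describes a two-part pipeline: a reconstructive denoiser recovers most of the clean signal, and a generative model adds back high-frequency residual detail. As a rough pure-Python illustration of that composition only (the names `denoiser` and `residual_generator` are placeholders, not the paper's modules, and the weighting term is an assumption):

```python
# Illustrative sketch of a reconstruct-and-generate composition.
# 'denoiser' and 'residual_generator' stand in for learned networks.

def rng_compose(noisy, denoiser, residual_generator, residual_weight=1.0):
    """Combine a reconstructive estimate with a generated high-frequency residual."""
    base = denoiser(noisy)                      # recovers the bulk of the clean signal
    residual = residual_generator(noisy, base)  # synthesizes fine detail on top
    return [b + residual_weight * r for b, r in zip(base, residual)]
```

A `residual_weight` below 1 would damp the generative contribution, trading visual richness for fidelity to the reconstructive estimate.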
arXiv Detail & Related papers (2023-09-19T16:01:20Z)
- HDNet: High-resolution Dual-domain Learning for Spectral Compressive Imaging [138.04956118993934]
We propose a high-resolution dual-domain learning network (HDNet) for HSI reconstruction.
On the one hand, the proposed HR spatial-spectral attention module with its efficient feature fusion provides continuous and fine pixel-level features.
On the other hand, frequency domain learning (FDL) is introduced for HSI reconstruction to narrow the frequency domain discrepancy.
arXiv Detail & Related papers (2022-03-04T06:37:45Z)
- Fourier Space Losses for Efficient Perceptual Image Super-Resolution [131.50099891772598]
We show that it is possible to improve the performance of a recently introduced efficient generator architecture solely with the application of our proposed loss functions.
We show that our losses' direct emphasis on the frequencies in Fourier-space significantly boosts the perceptual image quality.
The trained generator achieves results comparable to the state-of-the-art perceptual SR methods RankSRGAN and SRFlow while being 2.4x and 48x faster, respectively.
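To make the idea of a Fourier-space loss concrete, here is a minimal stdlib-only sketch in the spirit of the summary above: penalize the difference between the DFT magnitudes of a prediction and its target. This is a 1-D toy, not the paper's actual losses, which operate on 2-D image spectra and also account for phase.

```python
import cmath

def dft(signal):
    """Naive discrete Fourier transform of a real-valued 1-D signal."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]

def fourier_magnitude_loss(pred, target):
    """Mean absolute difference between DFT magnitude spectra."""
    fp, ft = dft(pred), dft(target)
    return sum(abs(abs(a) - abs(b)) for a, b in zip(fp, ft)) / len(pred)
```

Because every frequency bin contributes to the loss, errors in high-frequency components are penalized directly rather than being averaged away, which is the intuition behind emphasizing frequencies for perceptual quality.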
arXiv Detail & Related papers (2021-06-01T20:34:52Z)
- Non-local Meets Global: An Iterative Paradigm for Hyperspectral Image Restoration [66.68541690283068]
We propose a unified paradigm combining the spatial and spectral properties for hyperspectral image restoration.
The proposed paradigm benefits from non-local spatial denoising while maintaining low computational complexity.
Experiments on HSI denoising, compressed reconstruction, and inpainting tasks, with both simulated and real datasets, demonstrate its superiority.
arXiv Detail & Related papers (2020-10-24T15:53:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.