Latent Harmony: Synergistic Unified UHD Image Restoration via Latent Space Regularization and Controllable Refinement
- URL: http://arxiv.org/abs/2510.07961v3
- Date: Fri, 24 Oct 2025 05:01:58 GMT
- Title: Latent Harmony: Synergistic Unified UHD Image Restoration via Latent Space Regularization and Controllable Refinement
- Authors: Yidi Liu, Xueyang Fu, Jie Huang, Jie Xiao, Dong Li, Wenlong Zhang, Lei Bai, Zheng-Jun Zha
- Abstract summary: We introduce LH-VAE, which enhances semantic robustness through visual semantic constraints and progressive degradations. Latent Harmony is a two-stage framework that redefines VAEs for UHD restoration by jointly regularizing the latent space and enforcing high-frequency-aware reconstruction. Experiments show Latent Harmony achieves state-of-the-art performance across UHD and standard-resolution tasks, effectively balancing efficiency, perceptual quality, and reconstruction accuracy.
- Score: 89.99237142387655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ultra-High Definition (UHD) image restoration faces a trade-off between computational efficiency and high-frequency detail retention. While Variational Autoencoders (VAEs) improve efficiency via latent-space processing, their Gaussian constraint often discards degradation-specific high-frequency information, hurting reconstruction fidelity. To overcome this, we propose Latent Harmony, a two-stage framework that redefines VAEs for UHD restoration by jointly regularizing the latent space and enforcing high-frequency-aware reconstruction. In Stage One, we introduce LH-VAE, which enhances semantic robustness through visual semantic constraints and progressive degradation perturbations, while latent equivariance strengthens high-frequency reconstruction. Stage Two jointly trains this refined VAE with a restoration model using High-Frequency Low-Rank Adaptation (HF-LoRA): an encoder LoRA guided by a fidelity-oriented high-frequency alignment loss to recover authentic details, and a decoder LoRA driven by a perception-oriented loss to synthesize realistic textures. Both LoRA modules are trained via alternating optimization with selective gradient propagation to preserve the pretrained latent structure. At inference, a tunable parameter α enables flexible fidelity-perception trade-offs. Experiments show Latent Harmony achieves state-of-the-art performance across UHD and standard-resolution tasks, effectively balancing efficiency, perceptual quality, and reconstruction accuracy.
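The abstract describes a tunable parameter α that trades fidelity against perception at inference by weighting two LoRA paths. The paper does not publish code here, so the following is only a minimal pure-Python sketch of the general idea: a base linear layer whose output is adjusted by an α-weighted blend of a fidelity-oriented and a perception-oriented low-rank update. All function and variable names are hypothetical, and real LoRA operates on the VAE's weight matrices rather than per-vector like this.

```python
# Hypothetical sketch of an alpha-weighted fidelity/perception LoRA blend.
# Not the authors' implementation; it only illustrates the trade-off mechanism.

def lora_delta(x, A, B):
    """Low-rank update B @ (A @ x); A is r x d, B is d x r, x is a d-vector."""
    inner = [sum(a * xi for a, xi in zip(row, x)) for row in A]   # A @ x
    return [sum(b * h for b, h in zip(row, inner)) for row in B]  # B @ (A @ x)

def blended_forward(x, W, lora_fid, lora_per, alpha=0.5):
    """Base layer W plus an alpha-blend of fidelity- and perception-oriented
    LoRA deltas; alpha=1 favors fidelity, alpha=0 favors perception."""
    base = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    d_fid = lora_delta(x, *lora_fid)
    d_per = lora_delta(x, *lora_per)
    return [b + alpha * f + (1 - alpha) * p
            for b, f, p in zip(base, d_fid, d_per)]
```

Sweeping `alpha` from 0 to 1 then moves the output continuously between the two behaviors, which is the flexibility the abstract attributes to α.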
Related papers
- FiDeSR: High-Fidelity and Detail-Preserving One-Step Diffusion Super-Resolution [11.03986460753769]
We propose FiDeSR, a high-fidelity and detail-preserving one-step diffusion super-resolution framework.
During training, we introduce a detail-aware weighting strategy that adaptively emphasizes regions where the model exhibits higher prediction errors.
During inference, low- and high-frequency adaptive enhancers further refine the reconstruction without requiring model retraining.
arXiv Detail & Related papers (2026-03-03T07:34:49Z)
- AlignVAR: Towards Globally Consistent Visual Autoregression for Image Super-Resolution [16.90182090355781]
Visual autoregressive models offer stable training, non-iterative inference, and high-fidelity synthesis through next-scale prediction.
However, their application remains underexplored and faces two critical challenges: locality-biased attention and residual-only supervision.
We propose a globally consistent visual autoregressive framework tailored for image super-resolution.
arXiv Detail & Related papers (2026-02-28T10:39:06Z)
- HSI-VAR: Rethinking Hyperspectral Restoration through Spatial-Spectral Visual Autoregression [43.90363193188088]
Hyperspectral images (HSIs) capture richer spatial-spectral information beyond RGB.
Real-world HSIs often suffer from a composite mix of degradations, such as noise, blur, and missing bands.
arXiv Detail & Related papers (2026-01-31T14:30:05Z)
- Iterative Inference-time Scaling with Adaptive Frequency Steering for Image Super-Resolution [75.3690742776891]
We propose Iterative Diffusion Inference-Time Scaling with Adaptive Frequency Steering (IAFS).
IAFS addresses the challenge of balancing perceptual quality and structural fidelity by progressively refining the generated image through iterative correction of structural deviations.
Experiments show that IAFS effectively resolves the perception-fidelity conflict, yielding consistently improved perceptual detail and structural accuracy, and outperforming existing inference-time scaling methods.
arXiv Detail & Related papers (2025-12-29T15:09:20Z)
- Progressive Flow-inspired Unfolding for Spectral Compressive Imaging [11.638690628451647]
Coded aperture snapshot spectral imaging (CASSI) retrieves a 3D hyperspectral image (HSI) from a single 2D compressed measurement.
Recent deep unfolding networks (DUNs) have achieved the state of the art in CASSI reconstruction.
Inspired by diffusion trajectories and flow matching, we propose a novel trajectory-controllable unfolding framework.
arXiv Detail & Related papers (2025-09-15T16:10:50Z)
- Latent Wavelet Diffusion For Ultra-High-Resolution Image Synthesis [56.311477476580926]
We present Latent Wavelet Diffusion (LWD), a lightweight training framework that significantly improves detail and texture fidelity in ultra-high-resolution (2K-4K) image synthesis.
LWD introduces a novel, frequency-aware masking strategy derived from wavelet energy maps, which dynamically focuses the training process on detail-rich regions of the latent space.
arXiv Detail & Related papers (2025-05-31T07:28:32Z)
- LAFR: Efficient Diffusion-based Blind Face Restoration via Latent Codebook Alignment Adapter [52.93785843453579]
Blind face restoration from low-quality (LQ) images is a challenging task that requires high-fidelity image reconstruction and the preservation of facial identity.
We propose LAFR, a novel codebook-based latent space adapter that aligns the latent distribution of LQ images with that of HQ counterparts.
We show that lightweight finetuning of the diffusion prior on just 0.9% of the FFHQ dataset is sufficient to achieve results comparable to state-of-the-art methods.
arXiv Detail & Related papers (2025-05-29T14:11:16Z)
- RestoreVAR: Visual Autoregressive Generation for All-in-One Image Restoration [51.77917733024544]
Latent diffusion models (LDMs) have improved the perceptual quality of All-in-One image Restoration (AiOR) methods.
However, LDMs suffer from slow inference due to their iterative denoising process, rendering them impractical for time-sensitive applications.
Visual autoregressive modeling (VAR) performs scale-space autoregression and achieves performance comparable to that of state-of-the-art diffusion transformers.
arXiv Detail & Related papers (2025-05-23T15:52:26Z)
- From Zero to Detail: Deconstructing Ultra-High-Definition Image Restoration from Progressive Spectral Perspective [38.94762644896294]
ERR comprises three collaborative sub-networks: the zero-frequency enhancer (ZFE), the low-frequency restorer (LFR), and the high-frequency refiner (HFR).
Specifically, the ZFE integrates global priors to learn global mapping, while the LFR restores low-frequency information, emphasizing reconstruction of coarse-grained content.
The HFR employs our designed frequency-windowed Kolmogorov-Arnold networks (FW-KAN) to refine textures and details, producing high-quality image restoration.
arXiv Detail & Related papers (2025-03-17T13:39:51Z)
- Reconstruct-and-Generate Diffusion Model for Detail-Preserving Image Denoising [16.43285056788183]
We propose a novel approach called the Reconstruct-and-Generate Diffusion Model (RnG).
Our method leverages a reconstructive denoising network to recover the majority of the underlying clean signal.
It employs a diffusion algorithm to generate residual high-frequency details, thereby enhancing visual quality.
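The RnG summary above describes a two-part pipeline: a reconstructive denoiser recovers most of the clean signal, and a generative model adds back high-frequency residual detail. As a rough pure-Python illustration of that composition only (the names `denoiser` and `residual_generator` are placeholders, not the paper's modules, and the weighting term is an assumption):

```python
# Illustrative sketch of a reconstruct-and-generate composition.
# 'denoiser' and 'residual_generator' stand in for learned networks.

def rng_compose(noisy, denoiser, residual_generator, residual_weight=1.0):
    """Combine a reconstructive estimate with a generated high-frequency residual."""
    base = denoiser(noisy)                      # recovers the bulk of the clean signal
    residual = residual_generator(noisy, base)  # synthesizes fine detail on top
    return [b + residual_weight * r for b, r in zip(base, residual)]
```

A `residual_weight` below 1 would damp the generative contribution, trading visual richness for fidelity to the reconstructive estimate.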
arXiv Detail & Related papers (2023-09-19T16:01:20Z)
- HDNet: High-resolution Dual-domain Learning for Spectral Compressive Imaging [138.04956118993934]
We propose a high-resolution dual-domain learning network (HDNet) for HSI reconstruction.
On the one hand, the proposed HR spatial-spectral attention module with its efficient feature fusion provides continuous and fine pixel-level features.
On the other hand, frequency domain learning (FDL) is introduced for HSI reconstruction to narrow the frequency domain discrepancy.
arXiv Detail & Related papers (2022-03-04T06:37:45Z)
- Fourier Space Losses for Efficient Perceptual Image Super-Resolution [131.50099891772598]
We show that it is possible to improve the performance of a recently introduced efficient generator architecture solely with the application of our proposed loss functions.
We show that our losses' direct emphasis on the frequencies in Fourier-space significantly boosts the perceptual image quality.
The trained generator achieves results comparable to the state-of-the-art perceptual SR methods RankSRGAN and SRFlow while being 2.4x and 48x faster, respectively.
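To make the idea of a Fourier-space loss concrete, here is a minimal stdlib-only sketch in the spirit of the summary above: penalize the difference between the DFT magnitudes of a prediction and its target. This is a 1-D toy, not the paper's actual losses, which operate on 2-D image spectra and also account for phase.

```python
import cmath

def dft(signal):
    """Naive discrete Fourier transform of a real-valued 1-D signal."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]

def fourier_magnitude_loss(pred, target):
    """Mean absolute difference between DFT magnitude spectra."""
    fp, ft = dft(pred), dft(target)
    return sum(abs(abs(a) - abs(b)) for a, b in zip(fp, ft)) / len(pred)
```

Because every frequency bin contributes to the loss, errors in high-frequency components are penalized directly rather than being averaged away, which is the intuition behind emphasizing frequencies for perceptual quality.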
arXiv Detail & Related papers (2021-06-01T20:34:52Z)
- Non-local Meets Global: An Iterative Paradigm for Hyperspectral Image Restoration [66.68541690283068]
We propose a unified paradigm combining the spatial and spectral properties for hyperspectral image restoration.
The proposed paradigm benefits from non-local spatial denoising while maintaining low computational complexity.
Experiments on HSI denoising, compressed reconstruction, and inpainting tasks, with both simulated and real datasets, demonstrate its superiority.
arXiv Detail & Related papers (2020-10-24T15:53:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.