LCUDiff: Latent Capacity Upgrade Diffusion for Faithful Human Body Restoration
- URL: http://arxiv.org/abs/2602.04406v1
- Date: Wed, 04 Feb 2026 10:37:46 GMT
- Title: LCUDiff: Latent Capacity Upgrade Diffusion for Faithful Human Body Restoration
- Authors: Jue Gong, Zihan Zhou, Jingkai Wang, Shu Li, Libo Liu, Jianliang Lan, Yulun Zhang,
- Abstract summary: Existing methods for restoring degraded human-centric images often struggle with insufficient fidelity.<n>We propose LCUDiff, a stable one-step framework that upgrades a pre-trained latent diffusion model.<n> Experiments on synthetic and real-world datasets show competitive results with higher fidelity and fewer artifacts.
- Score: 23.264518366939825
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing methods for restoring degraded human-centric images often struggle with insufficient fidelity, particularly in human body restoration (HBR). Recent diffusion-based restoration methods commonly adapt pre-trained text-to-image diffusion models, where the variational autoencoder (VAE) can significantly bottleneck restoration fidelity. We propose LCUDiff, a stable one-step framework that upgrades a pre-trained latent diffusion model from the 4-channel latent space to the 16-channel latent space. For VAE fine-tuning, channel splitting distillation (CSD) is used to keep the first four channels aligned with pre-trained priors while allocating the additional channels to effectively encode high-frequency details. We further design prior-preserving adaptation (PPA) to smoothly bridge the mismatch between 4-channel diffusion backbones and the higher-dimensional 16-channel latent. In addition, we propose a decoder router (DeR) for per-sample decoder routing using restoration-quality score annotations, which improves visual quality across diverse conditions. Experiments on synthetic and real-world datasets show competitive results with higher fidelity and fewer artifacts under mild degradations, while preserving one-step efficiency. The code and model will be at https://github.com/gobunu/LCUDiff.
Related papers
- SSDD: Single-Step Diffusion Decoder for Efficient Image Tokenization [56.12853087022071]
We introduce a new pixel diffusion decoder architecture for improved scaling and training stability.<n>We use distillation to replicate the performance of the diffusion decoder in an efficient single-step decoder.<n>This makes SSDD the first diffusion decoder optimized for single-step reconstruction trained without adversarial losses.
arXiv Detail & Related papers (2025-10-06T15:57:31Z) - OSCAR: One-Step Diffusion Codec Across Multiple Bit-rates [39.746866725267516]
Pretrained latent diffusion models have shown strong potential for lossy image compression.<n>We propose a one-step diffusion across multiple bit-rates termed OSCAR.<n>Experiments demonstrate that OSCAR achieves superior performance in both quantitative and visual quality metrics.
arXiv Detail & Related papers (2025-05-22T00:14:12Z) - SING: Semantic Image Communications using Null-Space and INN-Guided Diffusion Models [52.40011613324083]
Joint source-channel coding systems (DeepJSCC) have recently demonstrated remarkable performance in wireless image transmission.<n>Existing methods focus on minimizing distortion between the transmitted image and the reconstructed version at the receiver, often overlooking perceptual quality.<n>We propose SING, a novel framework that formulates the recovery of high-quality images from corrupted reconstructions as an inverse problem.
arXiv Detail & Related papers (2025-03-16T12:32:11Z) - One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step.<n>To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration.<n>Our method achieves strong performance on both full and no-reference metrics.
arXiv Detail & Related papers (2025-03-09T09:39:57Z) - Rethinking Video Tokenization: A Conditioned Diffusion-based Approach [58.164354605550194]
New tokenizer, Diffusion Conditioned-based Gene Tokenizer, replaces GAN-based decoder with conditional diffusion model.<n>We trained using only a basic MSE diffusion loss for reconstruction, along with KL term and LPIPS perceptual loss from scratch.<n>Even a scaled-down version of CDT (3$times inference speedup) still performs comparably with top baselines.
arXiv Detail & Related papers (2025-03-05T17:59:19Z) - DifFace: Blind Face Restoration with Diffused Error Contraction [62.476329680424975]
DifFace is capable of coping with unseen and complex degradations more gracefully without complicated loss designs.
It is superior to current state-of-the-art methods, especially in cases with severe degradations.
arXiv Detail & Related papers (2022-12-13T11:52:33Z) - Denoising Diffusion Error Correction Codes [92.10654749898927]
Recently, neural decoders have demonstrated their advantage over classical decoding techniques.
Recent state-of-the-art neural decoders suffer from high complexity and lack the important iterative scheme characteristic of many legacy decoders.
We propose to employ denoising diffusion models for the soft decoding of linear codes at arbitrary block lengths.
arXiv Detail & Related papers (2022-09-16T11:00:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.