Universal Image Restoration Pre-training via Masked Degradation Classification
- URL: http://arxiv.org/abs/2510.13282v1
- Date: Wed, 15 Oct 2025 08:30:15 GMT
- Title: Universal Image Restoration Pre-training via Masked Degradation Classification
- Authors: JiaKui Hu, Zhengjian Yao, Lujia Jin, Yinghao Chen, Yanye Lu
- Abstract summary: The Masked Degradation Classification Pre-Training method (MaskDCPT) is designed to facilitate the classification of degradation types in input images. MaskDCPT includes an encoder and two decoders; the encoder extracts features from the masked low-quality input image. MaskDCPT significantly improves performance for both convolutional neural networks (CNNs) and Transformers.
- Score: 18.68152341523977
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This study introduces a Masked Degradation Classification Pre-Training method (MaskDCPT), designed to facilitate the classification of degradation types in input images, leading to comprehensive image restoration pre-training. Unlike conventional pre-training methods, MaskDCPT uses the degradation type of the image as an extremely weak supervision signal, while simultaneously leveraging image reconstruction to enhance performance and robustness. MaskDCPT includes an encoder and two decoders: the encoder extracts features from the masked low-quality input image; the classification decoder uses these features to identify the degradation type, whereas the reconstruction decoder aims to reconstruct a corresponding high-quality image. This design allows the pre-training to benefit from both masked image modeling and contrastive learning, resulting in a generalized representation suited for restoration tasks. Thanks to the straightforward yet potent MaskDCPT, the pre-trained encoder can be used to address universal image restoration and achieve outstanding performance. Implementing MaskDCPT significantly improves performance for both convolutional neural networks (CNNs) and Transformers, with a minimum increase in PSNR of 3.77 dB in the 5D all-in-one restoration task and a 34.8% reduction in PIQE compared to baseline in real-world degradation scenarios. It also exhibits strong generalization to previously unseen degradation types and levels. In addition, we curate and release the UIR-2.5M dataset, which includes 2.5 million paired restoration samples across 19 degradation types and over 200 degradation levels, incorporating both synthetic and real-world data. The dataset, source code, and models are available at https://github.com/MILab-PKU/MaskDCPT.
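The encoder and two-decoder layout described in the abstract can be illustrated with a toy forward pass. This is a minimal sketch under stated assumptions, not the authors' implementation: the bias-free linear layers, the zero-fill masking, the 0.5 mask ratio, the L1 reconstruction loss over all patches, and the unweighted sum of the two losses are all illustrative choices, and every function name here (`mask_patches`, `forward`, `init_params`) is hypothetical. See the released source code for the actual method.

```python
import math
import random


def mask_patches(patches, mask_ratio, rng):
    """Zero out a random subset of patches; return the masked copy and per-patch flags."""
    n_masked = round(len(patches) * mask_ratio)
    masked_idx = set(rng.sample(range(len(patches)), n_masked))
    flags = [i in masked_idx for i in range(len(patches))]
    masked = [[0.0] * len(p) if hit else list(p) for p, hit in zip(patches, flags)]
    return masked, flags


def linear(x, weights):
    """Apply a bias-free linear layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for a single sample."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_norm - logits[label]


def l1_loss(pred, target):
    """Mean absolute error between two equally shaped lists of patches."""
    diffs = [abs(a - b) for p, t in zip(pred, target) for a, b in zip(p, t)]
    return sum(diffs) / len(diffs)


def init_params(patch_dim, feat_dim, n_types, rng):
    """Random toy weights for the encoder and both decoders (assumed shapes)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {
        "encoder": mat(feat_dim, patch_dim),
        "cls_decoder": mat(n_types, feat_dim),
        "rec_decoder": mat(patch_dim, feat_dim),
    }


def forward(lq_patches, hq_patches, degradation_label, params, mask_ratio, rng):
    """One forward pass: mask the low-quality input, encode, then run both decoders."""
    masked, flags = mask_patches(lq_patches, mask_ratio, rng)
    # Shared encoder: linear layer plus ReLU applied to each masked patch.
    feats = [[max(0.0, h) for h in linear(p, params["encoder"])] for p in masked]
    # Classification decoder: average-pool patch features, predict the degradation type.
    pooled = [sum(f[j] for f in feats) / len(feats) for j in range(len(feats[0]))]
    logits = linear(pooled, params["cls_decoder"])
    # Reconstruction decoder: map each patch feature back to pixel space.
    recon = [linear(f, params["rec_decoder"]) for f in feats]
    loss_cls = cross_entropy(logits, degradation_label)
    loss_rec = l1_loss(recon, hq_patches)  # assumption: L1 over all patches
    return {"loss": loss_cls + loss_rec, "loss_cls": loss_cls,
            "loss_rec": loss_rec, "flags": flags}


rng = random.Random(0)
hq = [[rng.random() for _ in range(4)] for _ in range(8)]   # 8 toy patches of 4 pixels
lq = [[v + rng.gauss(0.0, 0.1) for v in p] for p in hq]     # noisy low-quality copy
params = init_params(patch_dim=4, feat_dim=6, n_types=5, rng=rng)
out = forward(lq, hq, degradation_label=2, params=params, mask_ratio=0.5, rng=rng)
print(out["loss_cls"], out["loss_rec"])
```

The point of the sketch is only the data flow: a single encoder sees the masked degraded input, and its features feed both a degradation-type classifier and a reconstruction head whose target is the high-quality image.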
Related papers
- SSDD: Single-Step Diffusion Decoder for Efficient Image Tokenization [56.12853087022071]
We introduce a new pixel diffusion decoder architecture for improved scaling and training stability. We use distillation to replicate the performance of the diffusion decoder in an efficient single-step decoder. This makes SSDD the first diffusion decoder optimized for single-step reconstruction trained without adversarial losses.
arXiv Detail & Related papers (2025-10-06T15:57:31Z)
- RAM++: Robust Representation Learning via Adaptive Mask for All-in-One Image Restoration [94.49712266736141]
RAM++ is a two-stage framework for all-in-one image restoration. It integrates high-level semantic understanding with low-level texture generation. It addresses the limitations of existing degradation-oriented methods in extreme scenarios.
arXiv Detail & Related papers (2025-09-15T15:24:15Z)
- Hierarchical Mask-Enhanced Dual Reconstruction Network for Few-Shot Fine-Grained Image Classification [7.4334395431083715]
We propose the Hierarchical Mask-enhanced Dual Reconstruction Network (HMDRN) to improve fine-grained classification. HMDRN incorporates a dual-layer feature reconstruction and fusion module that leverages complementary visual information from different network hierarchies. Experiments on three challenging fine-grained datasets demonstrate that HMDRN consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2025-06-25T09:15:59Z)
- Universal Image Restoration Pre-training via Degradation Classification [4.616424949496203]
Degradation Classification Pre-Training enables models to learn how to classify the degradation type of input images for universal image restoration pre-training. Both convolutional neural networks (CNNs) and transformers demonstrate performance improvements, with gains of up to 2.55 dB in the 10D all-in-one restoration task and 6.53 dB in the mixed degradation scenarios.
arXiv Detail & Related papers (2025-01-26T13:03:37Z)
- Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model [55.46927355649013]
We introduce a novel Multi-modal Guided Real-World Face Restoration (MGFR) technique. MGFR can mitigate the generation of false facial attributes and identities. We present the Reface-HQ dataset, comprising over 21,000 high-resolution facial images across 4800 identities.
arXiv Detail & Related papers (2024-10-05T13:46:56Z)
- Timestep-Aware Diffusion Model for Extreme Image Rescaling [47.89362819768323]
We propose a novel framework called Timestep-Aware Diffusion Model (TADM) for extreme image rescaling. TADM performs rescaling operations in the latent space of a pre-trained autoencoder. It effectively leverages powerful natural image priors learned by a pre-trained text-to-image diffusion model.
arXiv Detail & Related papers (2024-08-17T09:51:42Z)
- Exploring Distortion Prior with Latent Diffusion Models for Remote Sensing Image Compression [9.742764207747697]
We propose a latent diffusion model-based remote sensing image compression (LDM-RSIC) method.
In the first stage, a self-encoder learns a prior from the high-quality input image.
In the second stage, the prior is generated through an LDM conditioned on the decoded image of an existing learning-based image compression algorithm.
arXiv Detail & Related papers (2024-06-06T11:13:44Z)
- Neural Image Compression Using Masked Sparse Visual Representation [17.229601298529825]
We study neural image compression based on the Sparse Visual Representation (SVR), where images are embedded into a discrete latent space spanned by learned visual codebooks.
By sharing codebooks with the decoder, the encoder transfers codeword indices that are efficient and cross-platform robust.
We propose a Masked Adaptive Codebook learning (M-AdaCode) method that applies masks to the latent feature subspace to balance bitrate and reconstruction quality.
arXiv Detail & Related papers (2023-09-20T21:59:23Z)
- DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration [66.01846902242355]
Blind face restoration usually synthesizes degraded low-quality data with a pre-defined degradation model for training.
It is expensive and infeasible to include every type of degradation to cover real-world cases in the training data.
We propose Robust Degradation Remover (DR2) to first transform the degraded image to a coarse but degradation-invariant prediction, then employ an enhancement module to restore the coarse prediction to a high-quality image.
arXiv Detail & Related papers (2023-03-13T06:05:18Z)
- Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking [35.11620617064127]
Masked Autoencoders (MAE) have been popular paradigms for large-scale vision representation pre-training.
We propose MR-MAE, which jointly learns high-level and low-level representations without interference during pre-training.
On ImageNet-1K, the MR-MAE base pre-trained for only 400 epochs achieves 85.8% top-1 accuracy after fine-tuning.
arXiv Detail & Related papers (2023-03-09T18:28:18Z)
- Towards Robust Blind Face Restoration with Codebook Lookup Transformer [94.48731935629066]
Blind face restoration is a highly ill-posed problem that often requires auxiliary guidance.
We show that a learned discrete codebook prior in a small proxy space casts blind face restoration as a code prediction task.
We propose a Transformer-based prediction network, named CodeFormer, to model global composition and context of the low-quality faces.
arXiv Detail & Related papers (2022-06-22T17:58:01Z)
- Reducing Redundancy in the Bottleneck Representation of the Autoencoders [98.78384185493624]
Autoencoders are a type of unsupervised neural network that can be used to solve various tasks.
We propose a scheme to explicitly penalize feature redundancies in the bottleneck representation.
We tested our approach across different tasks: dimensionality reduction using three different datasets, image compression using the MNIST dataset, and image denoising using Fashion MNIST.
arXiv Detail & Related papers (2022-02-09T18:48:02Z)