Universal Image Restoration Pre-training via Degradation Classification
- URL: http://arxiv.org/abs/2501.15510v1
- Date: Sun, 26 Jan 2025 13:03:37 GMT
- Title: Universal Image Restoration Pre-training via Degradation Classification
- Authors: JiaKui Hu, Lujia Jin, Zhengjian Yao, Yanye Lu
- Abstract summary: Degradation Classification Pre-Training enables models to learn how to classify the degradation type of input images for universal image restoration pre-training. Both convolutional neural networks (CNNs) and transformers demonstrate performance improvements, with gains of up to 2.55 dB in the 10D all-in-one restoration task and 6.53 dB in the mixed degradation scenarios.
- Score: 4.616424949496203
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes Degradation Classification Pre-Training (DCPT), which enables models to learn how to classify the degradation type of input images for universal image restoration pre-training. Unlike existing self-supervised pre-training methods, DCPT uses the degradation type of the input image as an extremely weak form of supervision, which can be obtained effortlessly and is even intrinsic to all image restoration datasets. DCPT comprises two primary stages. First, image features are extracted by the encoder. Then, a lightweight decoder, such as ResNet18, classifies the degradation type of the input image based solely on the features extracted in the first stage, without access to the input image itself. The encoder pre-trained with this straightforward yet potent procedure is then used to address universal image restoration, where it achieves outstanding performance. Following DCPT, both convolutional neural networks (CNNs) and transformers demonstrate performance improvements, with gains of up to 2.55 dB in the 10D all-in-one restoration task and 6.53 dB in the mixed degradation scenarios. Moreover, previous self-supervised pre-training methods, such as masked image modeling, discard the decoder after pre-training, while DCPT utilizes the pre-trained parameters more effectively. This advantage arises from the degradation classifier acquired during DCPT, which facilitates transfer learning between models of identical architecture trained on diverse degradation types. Source code and models are available at https://github.com/MILab-PKU/dcpt.
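The two-stage idea in the abstract (an encoder extracts features, then a lightweight head classifies the degradation type from those features alone) can be illustrated with a dependency-free toy. This is only a sketch under loud assumptions and not the authors' implementation, which lives in the linked repository: the "images" are flat 16-value patches, the encoder is a single linear-tanh layer, a linear softmax head stands in for the ResNet18 decoder, and the names `DEGRADATIONS`, `degrade`, and `ToyDCPT` as well as the three degradation types are hypothetical.

```python
import math
import random

# Hypothetical degradation labels; the paper's actual task list is not given here.
DEGRADATIONS = ["noise", "blur", "low_light"]

def degrade(patch, kind, rng):
    """Apply a toy degradation to a flat list of pixel values in [0, 1]."""
    if kind == "noise":
        return [p + rng.gauss(0.0, 0.1) for p in patch]
    if kind == "blur":
        n = len(patch)
        # 3-tap box filter with edge clamping
        return [(patch[max(i - 1, 0)] + patch[i] + patch[min(i + 1, n - 1)]) / 3.0
                for i in range(n)]
    if kind == "low_light":
        return [0.3 * p for p in patch]
    raise ValueError(kind)

class ToyDCPT:
    """Linear-tanh encoder plus linear classification head (illustrative only)."""

    def __init__(self, in_dim, feat_dim, n_classes, rng):
        self.w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
                      for _ in range(feat_dim)]
        # Zero-initialised head: the first prediction is uniform over classes.
        self.w_head = [[0.0] * feat_dim for _ in range(n_classes)]

    def encode(self, x):
        # Stage 1: extract features from the degraded input.
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.w_enc]

    def classify(self, feats):
        # Stage 2: the head sees only the features, never the input image.
        logits = [sum(w * f for w, f in zip(row, feats)) for row in self.w_head]
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def train_step(self, x, label, lr=0.1):
        """One SGD step on cross-entropy; returns the loss before the update."""
        feats = self.encode(x)
        probs = self.classify(feats)
        loss = -math.log(probs[label])
        # Gradient of the loss w.r.t. the logits: softmax output minus one-hot target.
        d_logits = [p - (1.0 if c == label else 0.0) for c, p in enumerate(probs)]
        # Backpropagate into the features before touching the head weights.
        d_feats = [sum(self.w_head[c][j] * d_logits[c] for c in range(len(d_logits)))
                   for j in range(len(feats))]
        for c, dz in enumerate(d_logits):
            for j, f in enumerate(feats):
                self.w_head[c][j] -= lr * dz * f
        # The classification loss also updates the encoder, which is the part
        # that would be kept for restoration fine-tuning.
        for j, (df, f) in enumerate(zip(d_feats, feats)):
            d_pre = df * (1.0 - f * f)  # derivative of tanh
            for i, xi in enumerate(x):
                self.w_enc[j][i] -= lr * d_pre * xi
        return loss

rng = random.Random(0)
model = ToyDCPT(in_dim=16, feat_dim=4, n_classes=len(DEGRADATIONS), rng=rng)
losses = []
for step in range(200):
    clean = [rng.random() for _ in range(16)]
    label = rng.randrange(len(DEGRADATIONS))
    losses.append(model.train_step(degrade(clean, DEGRADATIONS[label], rng), label))
print(f"first loss {losses[0]:.3f}, mean of last 20 {sum(losses[-20:]) / 20:.3f}")
```

The label used for supervision is simply the index of the degradation that was applied, which is why the abstract describes it as available for free in restoration datasets. In this sketch the gradient of the classification loss flows through the head into the encoder weights, so the encoder is what gets pre-trained.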
Related papers
- Universal Image Restoration Pre-training via Masked Degradation Classification [18.68152341523977]
The Masked Degradation Classification Pre-Training method (MaskDCPT) is designed to facilitate the classification of degradation types in input images. MaskDCPT includes an encoder and two decoders: the encoder extracts features from the masked low-quality input image. MaskDCPT significantly improves performance for both convolutional neural networks (CNNs) and Transformers.
arXiv Detail & Related papers (2025-10-15T08:30:15Z)
- SSDD: Single-Step Diffusion Decoder for Efficient Image Tokenization [56.12853087022071]
We introduce a new pixel diffusion decoder architecture for improved scaling and training stability. We use distillation to replicate the performance of the diffusion decoder in an efficient single-step decoder. This makes SSDD the first diffusion decoder optimized for single-step reconstruction trained without adversarial losses.
arXiv Detail & Related papers (2025-10-06T15:57:31Z)
- One-step Generative Diffusion for Realistic Extreme Image Rescaling [47.89362819768323]
We propose a novel framework called One-Step Image Rescaling Diffusion (OSIRDiff) for extreme image rescaling.
OSIRDiff performs rescaling operations in the latent space of a pre-trained autoencoder.
It effectively leverages powerful natural image priors learned by a pre-trained text-to-image diffusion model.
arXiv Detail & Related papers (2024-08-17T09:51:42Z)
- HAIR: Hypernetworks-based All-in-One Image Restoration [46.681872835394095]
HAIR is a Hypernetworks-based All-in-One Image Restoration plug-and-play method.
It generates parameters based on the input image and thus enables the model to adapt dynamically to specific degradations.
It can significantly improve the performance of existing image restoration models in a plug-and-play manner, both in single-task and All-in-One settings.
arXiv Detail & Related papers (2024-08-15T11:34:33Z)
- Exploring Distortion Prior with Latent Diffusion Models for Remote Sensing Image Compression [9.742764207747697]
We propose a latent diffusion model-based remote sensing image compression (LDM-RSIC) method.
In the first stage, a self-encoder learns a prior from the high-quality input image.
In the second stage, the prior is generated through an LDM conditioned on the decoded image of an existing learning-based image compression algorithm.
arXiv Detail & Related papers (2024-06-06T11:13:44Z)
- Efficient Test-Time Adaptation for Super-Resolution with Second-Order Degradation and Reconstruction [62.955327005837475]
Image super-resolution (SR) aims to learn a mapping from low-resolution (LR) to high-resolution (HR) using paired HR-LR training images.
We present an efficient test-time adaptation framework for SR, named SRTTA, which is able to quickly adapt SR models to test domains with different/unknown degradation types.
arXiv Detail & Related papers (2023-10-29T13:58:57Z)
- Controlling Vision-Language Models for Multi-Task Image Restoration [6.239038964461397]
We present a degradation-aware vision-language model (DA-CLIP) to better transfer pretrained vision-language models to low-level vision tasks.
Our approach advances state-of-the-art performance on both degradation-specific and unified image restoration tasks.
arXiv Detail & Related papers (2023-10-02T09:10:16Z)
- DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration [66.01846902242355]
Blind face restoration usually synthesizes degraded low-quality data with a pre-defined degradation model for training.
It is expensive and infeasible to include every type of degradation to cover real-world cases in the training data.
We propose Robust Degradation Remover (DR2) to first transform the degraded image to a coarse but degradation-invariant prediction, then employ an enhancement module to restore the coarse prediction to a high-quality image.
arXiv Detail & Related papers (2023-03-13T06:05:18Z)
- Unsupervised Representation Learning from Pre-trained Diffusion Probabilistic Models [83.75414370493289]
Diffusion Probabilistic Models (DPMs) have shown a powerful capacity of generating high-quality image samples.
Diff-AE has been proposed to explore DPMs for representation learning via autoencoding.
We propose Pre-trained AutoEncoding (PDAE) to adapt existing pre-trained DPMs to the decoders for image reconstruction.
arXiv Detail & Related papers (2022-12-26T02:37:38Z)
- FastMIM: Expediting Masked Image Modeling Pre-training for Vision [65.47756720190155]
FastMIM is a framework for pre-training vision backbones with low-resolution input images.
It reconstructs Histograms of Oriented Gradients (HOG) features instead of the original RGB values of the input images.
It can achieve 83.8%/84.1% top-1 accuracy on ImageNet-1K with ViT-B/Swin-B as backbones.
arXiv Detail & Related papers (2022-12-13T14:09:32Z)
- Modular Degradation Simulation and Restoration for Under-Display Camera [21.048590332029995]
Under-display camera (UDC) provides an elegant solution for full-screen smartphones.
UDC captured images suffer from severe degradation since sensors lie under the display.
We propose a modular network dubbed MPGNet trained using the generative adversarial network (GAN) framework for simulating UDC imaging.
arXiv Detail & Related papers (2022-09-23T07:36:07Z)
- Corrupted Image Modeling for Self-Supervised Visual Pre-Training [103.99311611776697]
We introduce Corrupted Image Modeling (CIM) for self-supervised visual pre-training.
CIM uses an auxiliary generator with a small trainable BEiT to corrupt the input image instead of using artificial mask tokens.
After pre-training, the enhancer can be used as a high-capacity visual encoder for downstream tasks.
arXiv Detail & Related papers (2022-02-07T17:59:04Z)
- Towards End-to-End Image Compression and Analysis with Transformers [99.50111380056043]
We propose an end-to-end image compression and analysis model with Transformers, targeting cloud-based image classification applications.
We aim to redesign the Vision Transformer (ViT) model to perform image classification from the compressed features and facilitate image compression with the long-term information from the Transformer.
Experimental results demonstrate the effectiveness of the proposed model in both the image compression and the classification tasks.
arXiv Detail & Related papers (2021-12-17T03:28:14Z)
- Pre-Trained Image Processing Transformer [95.93031793337613]
We develop a new pre-trained model, namely, the image processing transformer (IPT).
We propose using the well-known ImageNet benchmark to generate a large number of corrupted image pairs.
The IPT model is trained on these images with multi-heads and multi-tails.
arXiv Detail & Related papers (2020-12-01T09:42:46Z)
- Blind Image Restoration without Prior Knowledge [0.22940141855172028]
We present the Self-Normalization Side-Chain (SCNC), a novel approach to blind universal restoration in which no prior knowledge of the degradation is needed.
The SCNC can be added to any existing CNN topology, and is trained along with the rest of the network in an end-to-end manner.
arXiv Detail & Related papers (2020-03-03T19:57:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.