Unlocking Masked Autoencoders as Loss Function for Image and Video Restoration
- URL: http://arxiv.org/abs/2303.16411v1
- Date: Wed, 29 Mar 2023 02:41:08 GMT
- Title: Unlocking Masked Autoencoders as Loss Function for Image and Video Restoration
- Authors: Man Zhou, Naishan Zheng, Jie Huang, Chunle Guo, Chongyi Li
- Abstract summary: We explore the potential of the loss function and posit that ``a learned loss function empowers the learning capability of neural networks for image and video restoration''.
We investigate the efficacy of our belief from three perspectives: 1) from task-customized MAE to native MAE, 2) from image task to video task, and 3) from transformer structure to convolution neural network structure.
- Score: 19.561055022474786
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image and video restoration has achieved a remarkable leap with the advent of
deep learning. The success of the deep learning paradigm lies in three key
components: data, model, and loss. To date, most efforts have been devoted to
the first two, while the loss function has received comparatively little
attention. With the question
``are the de facto optimization functions e.g., $L_1$, $L_2$, and perceptual
losses optimal?'', we explore the potential of loss and raise our belief
``learned loss function empowers the learning capability of neural networks for
image and video restoration''.
Concretely, we stand on the shoulders of the masked autoencoder (MAE) and
formulate it as a ``learned loss function'', owing to the fact that the
pre-trained MAE innately inherits the prior of image reasoning. We investigate
the efficacy
of our belief from three perspectives: 1) from task-customized MAE to native
MAE, 2) from image task to video task, and 3) from transformer structure to
convolution neural network structure. Extensive experiments across multiple
image and video tasks, including image denoising, image super-resolution, image
enhancement, guided image super-resolution, video denoising, and video
enhancement, demonstrate the consistent performance improvements introduced by
the learned loss function. Besides, the learned loss function is preferable as
it can be directly plugged into existing networks during training without
involving computations in the inference stage. Code will be publicly available.
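The core idea — a frozen, pre-trained network supplying the training signal by comparing features instead of raw pixels — can be illustrated with a toy sketch. This is not the paper's actual MAE; the encoder below is a hypothetical stand-in (a fixed linear map plus ReLU), and all names are illustrative:

```python
import numpy as np

def frozen_encoder(x, w):
    # Stand-in for a pre-trained MAE encoder: a fixed linear map
    # followed by a ReLU. The weights w are frozen (never updated),
    # mirroring how the pre-trained MAE is used only to score outputs.
    return np.maximum(x @ w, 0.0)

def learned_loss(pred, target, w):
    # Feature-space loss: mean squared error between encoder
    # activations of the prediction and the target, rather than
    # between raw pixels. During training, gradients would flow
    # through pred only; w stays fixed.
    diff = frozen_encoder(pred, w) - frozen_encoder(target, w)
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
w = rng.standard_normal((16, 8))           # frozen "pre-trained" weights
target = rng.standard_normal((4, 16))      # clean patches (flattened)
noisy = target + 0.1 * rng.standard_normal((4, 16))

assert learned_loss(target, target, w) == 0.0   # identical inputs, zero loss
assert learned_loss(noisy, target, w) > 0.0     # degraded input is penalized
```

Because the frozen encoder is only consulted while computing the loss, it adds no cost at inference time — which matches the abstract's claim that the learned loss plugs into existing networks during training without affecting the inference stage.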
Related papers
- Class Anchor Margin Loss for Content-Based Image Retrieval [97.81742911657497]
We propose a novel repeller-attractor loss that falls in the metric learning paradigm, yet directly optimizes the L2 metric without the need to generate pairs.
We evaluate the proposed objective in the context of few-shot and full-set training on the CBIR task, by using both convolutional and transformer architectures.
arXiv Detail & Related papers (2023-06-01T12:53:10Z) - TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale [59.01246141215051]
We analyze the factor that leads to degradation from the perspective of language supervision.
We propose a tunable-free pre-training strategy to retain the generalization ability of the text encoder.
We produce a series of models, dubbed TVTSv2, with up to one billion parameters.
arXiv Detail & Related papers (2023-05-23T15:44:56Z) - WSSL: Weighted Self-supervised Learning Framework For Image-inpainting [18.297463645457693]
Image inpainting is a process of regenerating lost parts of the image.
Supervised algorithm-based methods have shown excellent results but have two significant drawbacks.
We propose a novel self-supervised learning framework for image-inpainting: Weighted Self-Supervised Learning.
arXiv Detail & Related papers (2022-11-25T01:50:33Z) - Is Deep Image Prior in Need of a Good Education? [57.3399060347311]
Deep image prior was introduced as an effective prior for image reconstruction.
Despite its impressive reconstructive properties, the approach is slow when compared to learned or traditional reconstruction techniques.
We develop a two-stage learning paradigm to address the computational challenge.
arXiv Detail & Related papers (2021-11-23T15:08:26Z) - Training a Better Loss Function for Image Restoration [17.20936270604533]
We show that a single natural image is sufficient to train a lightweight feature extractor that outperforms state-of-the-art loss functions in single image super resolution.
We propose a novel Multi-Scale Discriminative Feature (MDF) loss comprising a series of discriminators, trained to penalize errors introduced by a generator.
arXiv Detail & Related papers (2021-03-26T17:29:57Z) - Image Restoration by Deep Projected GSURE [115.57142046076164]
Ill-posed inverse problems appear in many image processing applications, such as deblurring and super-resolution.
We propose a new image restoration framework that is based on minimizing a loss function that includes a "projected version" of the Generalized Stein Unbiased Risk Estimator (GSURE) and a parameterization of the latent image by a CNN.
arXiv Detail & Related papers (2021-02-04T08:52:46Z) - Neural Sparse Representation for Image Restoration [116.72107034624344]
Inspired by the robustness and efficiency of sparse coding based image restoration models, we investigate the sparsity of neurons in deep networks.
Our method structurally enforces sparsity constraints upon hidden neurons.
Experiments show that sparse representation is crucial in deep neural networks for multiple image restoration tasks.
arXiv Detail & Related papers (2020-06-08T05:15:17Z) - Learning the Loss Functions in a Discriminative Space for Video Restoration [48.104095018697556]
We propose a new framework for building effective loss functions by learning a discriminative space specific to a video restoration task.
Our framework is similar to GANs in that we iteratively train two networks - a generator and a loss network.
Experiments on video super-resolution and deblurring show that our method generates visually more pleasing videos.
arXiv Detail & Related papers (2020-03-20T06:58:27Z) - Pretraining Image Encoders without Reconstruction via Feature Prediction Loss [0.1529342790344802]
This work investigates three methods for calculating loss for autoencoder-based pretraining of image encoders.
We propose to decode the features of the loss network, hence the name "feature prediction loss".
arXiv Detail & Related papers (2020-03-16T21:08:43Z) - Improving Image Autoencoder Embeddings with Perceptual Loss [0.1529342790344802]
This work investigates perceptual loss from the perspective of encoder embeddings themselves.
Autoencoders are trained to embed images from three different computer vision datasets using perceptual loss.
Results show that, on the task of object positioning of a small-scale feature, perceptual loss can improve the results by a factor of 10.
arXiv Detail & Related papers (2020-01-10T13:48:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.