Image-to-Image MLP-mixer for Image Reconstruction
- URL: http://arxiv.org/abs/2202.02018v1
- Date: Fri, 4 Feb 2022 08:36:34 GMT
- Title: Image-to-Image MLP-mixer for Image Reconstruction
- Authors: Youssef Mansour, Kang Lin, Reinhard Heckel
- Abstract summary: We show that a simple network based on the multi-layer perceptron (MLP)-mixer enables state-of-the-art image reconstruction performance without convolutions.
The image-to-image MLP-mixer is based exclusively on MLPs operating on linearly-transformed image patches.
It also outperforms the vision transformer tailored for image reconstruction and classical un-trained methods such as BM3D.
- Score: 23.036592718421105
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural networks are highly effective tools for image reconstruction problems
such as denoising and compressive sensing. To date, neural networks for image
reconstruction are almost exclusively convolutional. The most popular
architecture is the U-Net, a convolutional network with a multi-resolution
architecture. In this work, we show that a simple network based on the
multi-layer perceptron (MLP)-mixer enables state-of-the-art image
reconstruction performance without convolutions and without a multi-resolution
architecture, provided that the training set and the size of the network are
moderately large. Similar to the original MLP-mixer, the image-to-image
MLP-mixer is based exclusively on MLPs operating on linearly-transformed image
patches. Contrary to the original MLP-mixer, we incorporate structure by
retaining the relative positions of the image patches. This imposes an
inductive bias towards natural images which enables the image-to-image
MLP-mixer to learn to denoise images based on fewer examples than the original
MLP-mixer. Moreover, the image-to-image MLP-mixer requires fewer parameters
than the U-Net to achieve the same denoising performance, and its parameter
count scales linearly in the image resolution instead of quadratically as for
the original MLP-mixer. If trained on a moderate number of examples for
denoising, the
image-to-image MLP-mixer outperforms the U-Net by a slight margin. It also
outperforms the vision transformer tailored for image reconstruction and
classical un-trained methods such as BM3D, making it a very effective tool for
image reconstruction problems.
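To make the architecture concrete, below is a minimal, hypothetical PyTorch sketch of an image-to-image mixer in the spirit of the abstract: MLPs operate on linearly embedded patches, and the patches stay in their original raster order so relative positions are preserved. All names, layer sizes, and the depth are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixerBlock(nn.Module):
    """Token-mixing MLP across patches, then channel-mixing MLP per patch."""
    def __init__(self, n_patches, dim):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.token_mlp = nn.Sequential(
            nn.Linear(n_patches, n_patches), nn.GELU(), nn.Linear(n_patches, n_patches))
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, x):                                   # x: (B, n_patches, dim)
        x = x + self.token_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        return x + self.channel_mlp(self.norm2(x))

class ImageToImageMixer(nn.Module):
    def __init__(self, image_size=256, patch=8, dim=128, depth=8):
        super().__init__()
        self.patch, self.size = patch, image_size
        n_patches = (image_size // patch) ** 2
        self.embed = nn.Linear(patch * patch, dim)          # linear map per patch
        self.blocks = nn.Sequential(*[MixerBlock(n_patches, dim) for _ in range(depth)])
        self.unembed = nn.Linear(dim, patch * patch)

    def forward(self, img):                                 # img: (B, 1, H, W)
        p = self.patch
        # Non-overlapping patches, kept in raster order to preserve positions.
        x = F.unfold(img, kernel_size=p, stride=p).transpose(1, 2)    # (B, n, p*p)
        x = self.blocks(self.embed(x))
        x = self.unembed(x).transpose(1, 2)                 # (B, p*p, n)
        return F.fold(x, self.size, kernel_size=p, stride=p)  # back to an image

net = ImageToImageMixer()
denoised = net(torch.randn(1, 1, 256, 256))                 # -> (1, 1, 256, 256)
```

Note that this sketch keeps the dense token-mixing MLP of the original design, whose weight matrix grows quadratically with the number of patches; the linear parameter scaling claimed in the abstract comes from the paper's own structural changes, which this sketch does not attempt to reproduce.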
Related papers
- A cross Transformer for image denoising [83.68175077524111]
We propose a cross Transformer denoising CNN (CTNet) with a serial block (SB), a parallel block (PB), and a residual block (RB)
CTNet is superior to some popular denoising methods in terms of real and synthetic image denoising.
arXiv Detail & Related papers (2023-10-16T13:53:19Z)
- Increasing diversity of omni-directional images generated from single image using cGAN based on MLPMixer [0.0]
The previous method relied on generative adversarial networks based on convolutional neural networks (CNNs).
The MLPMixer has been proposed as an alternative to self-attention in the transformer, which captures long-range dependencies and contextual information.
As a result, competitive performance has been achieved with reduced memory consumption and computational cost.
arXiv Detail & Related papers (2023-09-15T03:43:29Z)
- Restormer: Efficient Transformer for High-Resolution Image Restoration [118.9617735769827]
Convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data.
Transformers have shown significant performance gains on natural language and high-level vision tasks.
Our model, named Restoration Transformer (Restormer), achieves state-of-the-art results on several image restoration tasks.
arXiv Detail & Related papers (2021-11-18T18:59:10Z)
- Cascaded Cross MLP-Mixer GANs for Cross-View Image Translation [70.00392682183515]
Previous cross-view image translation methods struggle to generate images well at the target view.
We propose a novel two-stage framework with a new Cascaded Cross MLP-Mixer (CrossMLP) sub-network.
In the first stage, the CrossMLP sub-network learns the latent transformation cues between image code and semantic map code.
In the second stage, we design a refined pixel-level loss that eases the noisy semantic label problem.
arXiv Detail & Related papers (2021-10-19T18:03:30Z)
- Sparse MLP for Image Recognition: Is Self-Attention Really Necessary? [65.37917850059017]
We build an attention-free network called sMLPNet.
For 2D image tokens, sMLP applies 1D MLPs along the axial directions, with parameters shared among rows or columns (sketched below).
When scaling up to 66M parameters, sMLPNet achieves 83.4% top-1 accuracy, which is on par with the state-of-the-art Swin Transformer.
arXiv Detail & Related papers (2021-09-12T04:05:15Z)
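To illustrate the axial mixing just described, here is a small hypothetical PyTorch sketch: one 1D linear layer mixes tokens along each row (weights shared across columns), another mixes along each column (weights shared across rows), and the branches are fused per channel. Shapes, names, and the fusion step are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class SparseMLPBlock(nn.Module):
    """Hypothetical sketch of axial 1D token mixing in the spirit of sMLP."""
    def __init__(self, h, w, dim):
        super().__init__()
        self.mix_w = nn.Linear(w, w)          # mixes along each row; shared across columns
        self.mix_h = nn.Linear(h, h)          # mixes along each column; shared across rows
        self.fuse = nn.Linear(3 * dim, dim)   # combine horizontal, vertical, identity

    def forward(self, x):                     # x: (B, H, W, C)
        xw = self.mix_w(x.permute(0, 1, 3, 2)).permute(0, 1, 3, 2)  # mix over W
        xh = self.mix_h(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)  # mix over H
        return self.fuse(torch.cat([xw, xh, x], dim=-1))

block = SparseMLPBlock(h=14, w=14, dim=96)
out = block(torch.randn(2, 14, 14, 96))       # -> (2, 14, 14, 96)
```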
- Rethinking Token-Mixing MLP for MLP-based Vision Backbone [34.47616917228978]
We propose an improved structure termed the Circulant Channel-Specific (CCS) token-mixing MLP, which is spatial-invariant and channel-specific.
It takes fewer parameters but achieves higher classification accuracy on ImageNet1K.
arXiv Detail & Related papers (2021-06-28T17:59:57Z)
- MixerGAN: An MLP-Based Architecture for Unpaired Image-to-Image Translation [0.0]
We propose a new unpaired image-to-image translation model called MixerGAN.
We show that MixerGAN achieves competitive results when compared to prior convolutional-based methods.
arXiv Detail & Related papers (2021-05-28T21:12:52Z)
- RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition [123.59890802196797]
We propose RepMLP, a multi-layer-perceptron-style neural network building block for image recognition.
We construct convolutional layers inside a RepMLP during training and merge them into the FC for inference (see the sketch below).
By inserting RepMLP into traditional CNNs, we improve ResNets by 1.8% accuracy on ImageNet, 2.9% for face recognition, and 2.3% mIoU on Cityscapes with lower FLOPs.
arXiv Detail & Related papers (2021-05-05T06:17:40Z)
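The merging step rests on the fact that a convolution is a linear map and can therefore be absorbed into a fully-connected layer. The following hypothetical sketch demonstrates that equivalence by probing a small conv with an identity basis; it is a simplification of the underlying idea, not the RepMLP procedure itself.

```python
import torch
import torch.nn as nn

# Build an FC layer equivalent to a conv by probing it with an identity basis.
C, H, W = 3, 8, 8
conv = nn.Conv2d(C, C, kernel_size=3, padding=1, bias=False)

N = C * H * W
basis = torch.eye(N).reshape(N, C, H, W)       # one basis image per input unit
with torch.no_grad():
    cols = conv(basis).reshape(N, N)           # row j = conv response to basis j
    fc = nn.Linear(N, N, bias=False)
    fc.weight.copy_(cols.t())                  # FC now reproduces the conv exactly

x = torch.randn(1, C, H, W)
assert torch.allclose(conv(x).flatten(), fc(x.flatten()), atol=1e-5)
```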
- MLP-Mixer: An all-MLP Architecture for Vision [93.16118698071993]
We present MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs).
Mixer attains competitive scores on image classification benchmarks, with pre-training and inference cost comparable to state-of-the-art models.
arXiv Detail & Related papers (2021-05-04T16:17:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.