SiamTrans: Zero-Shot Multi-Frame Image Restoration with Pre-Trained
Siamese Transformers
- URL: http://arxiv.org/abs/2112.09426v1
- Date: Fri, 17 Dec 2021 10:42:39 GMT
- Title: SiamTrans: Zero-Shot Multi-Frame Image Restoration with Pre-Trained
Siamese Transformers
- Authors: Lin Liu, Shanxin Yuan, Jianzhuang Liu, Xin Guo, Youliang Yan, Qi Tian
- Abstract summary: We propose a novel zero-shot multi-frame image restoration method for removing unwanted obstruction elements.
It has three stages: transformer pre-training, zero-shot restoration, and hard patch refinement.
For zero-shot image restoration, we design a novel model, termed SiamTrans, which is constructed by Siamese transformers, encoders, and decoders.
- Score: 95.57829796484472
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We propose a novel zero-shot multi-frame image restoration method for
removing unwanted obstruction elements (such as rain, snow, and moire
patterns) that vary in successive frames. It has three stages: transformer
pre-training, zero-shot restoration, and hard patch refinement. Using the
pre-trained transformers, our model is able to tell the motion difference
between the true image information and the obstructing elements. For zero-shot
image restoration, we design a novel model, termed SiamTrans, which is
constructed by Siamese transformers, encoders, and decoders. Each transformer
has a temporal attention layer and several self-attention layers, to capture
both temporal and spatial information of multiple frames. Only pre-trained
(self-supervised) on the denoising task, SiamTrans is tested on three different
low-level vision tasks (deraining, demoireing, and desnowing). Compared with
related methods, ours achieves the best performances, even outperforming those
with supervised learning.
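The abstract describes each Siamese transformer as a temporal attention layer followed by several self-attention layers, applied to features from multiple frames. The PyTorch module below is only a minimal sketch of such a block under assumed shapes, dimensions, and layer counts; it is not the authors' implementation of SiamTrans.

```python
import torch.nn as nn


class TemporalSpatialBlock(nn.Module):
    """One transformer block: temporal attention across frames, then
    several self-attention layers across spatial tokens (shapes assumed)."""

    def __init__(self, dim=256, heads=8, num_self_attn=4):
        super().__init__()
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_attns = nn.ModuleList(
            nn.MultiheadAttention(dim, heads, batch_first=True) for _ in range(num_self_attn)
        )
        self.norms = nn.ModuleList(nn.LayerNorm(dim) for _ in range(num_self_attn + 1))

    def forward(self, x):
        # x: (batch, frames, tokens, dim) features extracted from multiple frames.
        b, t, n, c = x.shape

        # Temporal attention: each spatial token attends to itself across frames.
        xt = x.permute(0, 2, 1, 3).reshape(b * n, t, c)
        q = self.norms[0](xt)
        xt = xt + self.temporal_attn(q, q, q, need_weights=False)[0]
        x = xt.reshape(b, n, t, c).permute(0, 2, 1, 3)

        # Self-attention: tokens within each frame attend to each other.
        xs = x.reshape(b * t, n, c)
        for attn, norm in zip(self.self_attns, self.norms[1:]):
            q = norm(xs)
            xs = xs + attn(q, q, q, need_weights=False)[0]
        return xs.reshape(b, t, n, c)
```

Sharing the weights of such a block across the input frames would give the Siamese arrangement that the abstract mentions.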
Related papers
- Boosting vision transformers for image retrieval [11.441395750267052]
Vision transformers have achieved remarkable progress in vision tasks such as image classification and detection.
However, in instance-level image retrieval, transformers have not yet shown good performance compared to convolutional networks.
We propose a number of improvements that make transformers outperform the state of the art for the first time.
arXiv Detail & Related papers (2022-10-21T12:17:12Z)
- Three things everyone should know about Vision Transformers [67.30250766591405]
Transformer architectures have rapidly gained traction in computer vision.
We offer three insights based on simple and easy-to-implement variants of vision transformers.
We evaluate the impact of these design choices using the ImageNet-1k dataset, and confirm our findings on the ImageNet-v2 test set.
arXiv Detail & Related papers (2022-03-18T08:23:03Z)
- RePre: Improving Self-Supervised Vision Transformer with Reconstructive Pre-training [80.44284270879028]
This paper incorporates local feature learning into self-supervised vision transformers via Reconstructive Pre-training (RePre).
Our RePre extends contrastive frameworks by adding a branch for reconstructing raw image pixels in parallel with the existing contrastive objective.
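The RePre summary above describes a pixel-reconstruction branch trained in parallel with an existing contrastive objective. The sketch below shows one plausible way such a joint loss could be combined; the InfoNCE-style contrastive term, the L1 reconstruction term, and the weighting are illustrative assumptions, not RePre's actual formulation.

```python
import torch
import torch.nn.functional as F


def joint_loss(z1, z2, recon, target, recon_weight=1.0, temperature=0.2):
    # z1, z2: (batch, dim) projected features of two views of the same images.
    # recon, target: reconstructed and ground-truth raw pixels from the extra branch.
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)

    # InfoNCE-style contrastive term: matching views lie on the diagonal.
    logits = z1 @ z2.t() / temperature
    labels = torch.arange(z1.size(0), device=z1.device)
    contrastive = F.cross_entropy(logits, labels)

    # Pixel reconstruction term from the parallel branch.
    reconstruction = F.l1_loss(recon, target)
    return contrastive + recon_weight * reconstruction
```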
arXiv Detail & Related papers (2022-01-18T10:24:58Z)
- U2-Former: A Nested U-shaped Transformer for Image Restoration [30.187257111046556]
We present a deep and effective Transformer-based network for image restoration, termed U2-Former.
It employs the Transformer as the core operation to perform image restoration in a deep encoding and decoding space.
arXiv Detail & Related papers (2021-12-04T08:37:04Z)
- Long-Short Temporal Contrastive Learning of Video Transformers [62.71874976426988]
Self-supervised pretraining of video transformers on video-only datasets can lead to action recognition results on par or better than those obtained with supervised pretraining on large-scale image datasets.
Our approach, named Long-Short Temporal Contrastive Learning, enables video transformers to learn an effective clip-level representation by predicting temporal context captured from a longer temporal extent.
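The Long-Short Temporal Contrastive Learning summary describes matching a short clip's representation to temporal context extracted from a longer clip of the same video. The sketch below illustrates one plausible form of such an objective; the stop-gradient on the long-clip target, the normalisation, and the temperature are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F


def long_short_contrastive_loss(short_emb, long_emb, temperature=0.1):
    # short_emb: (batch, dim) embedding of a short clip; long_emb: embedding of a
    # longer clip from the same video, treated here as the prediction target.
    s = F.normalize(short_emb, dim=-1)
    l = F.normalize(long_emb.detach(), dim=-1)  # target branch kept fixed (assumption)
    logits = s @ l.t() / temperature
    labels = torch.arange(s.size(0), device=s.device)  # matching clips on the diagonal
    return F.cross_entropy(logits, labels)
```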
arXiv Detail & Related papers (2021-06-17T02:30:26Z)
- Improve Vision Transformers Training by Suppressing Over-smoothing [28.171262066145612]
Introducing the transformer structure into computer vision tasks holds the promise of yielding a better speed-accuracy trade-off than traditional convolutional networks.
However, directly training vanilla transformers on vision tasks has been shown to yield unstable and sub-optimal results.
Recent works propose to modify transformer structures by incorporating convolutional layers to improve the performance on vision tasks.
arXiv Detail & Related papers (2021-04-26T17:43:04Z)
- Restoration of Video Frames from a Single Blurred Image with Motion Understanding [69.90724075337194]
We propose a novel framework to generate clean video frames from a single motion-blurred image.
We formulate video restoration from a single blurred image as an inverse problem by treating the clean image sequence and its respective motion as latent factors.
Our framework is based on an encoder-decoder structure with spatial transformer network modules.
arXiv Detail & Related papers (2021-04-19T08:32:57Z)
- Powers of layers for image-to-image translation [60.5529622990682]
We propose a simple architecture to address unpaired image-to-image translation tasks.
We start from an image autoencoder architecture with fixed weights.
For each task we learn a residual block operating in the latent space, which is iteratively called until the target domain is reached.
arXiv Detail & Related papers (2020-08-13T09:02:17Z)
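The "Powers of layers" entry describes a fixed-weight autoencoder with a learned residual block applied repeatedly in latent space until the target domain is reached. The sketch below shows that iteration scheme in PyTorch under assumed latent shapes and iteration counts; it is not the paper's implementation.

```python
import torch.nn as nn


class PowersOfLayersTranslator(nn.Module):
    """Frozen autoencoder plus one learned residual block applied repeatedly
    in latent space (layer shapes and iteration count are assumptions)."""

    def __init__(self, frozen_encoder, frozen_decoder, latent_channels=256, num_iters=4):
        super().__init__()
        self.encoder, self.decoder = frozen_encoder, frozen_decoder
        for p in list(self.encoder.parameters()) + list(self.decoder.parameters()):
            p.requires_grad = False  # autoencoder weights stay fixed
        # Task-specific residual block, the only trainable part.
        self.residual = nn.Sequential(
            nn.Conv2d(latent_channels, latent_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(latent_channels, latent_channels, 3, padding=1),
        )
        self.num_iters = num_iters

    def forward(self, x):
        z = self.encoder(x)             # map the image into latent space
        for _ in range(self.num_iters):
            z = z + self.residual(z)    # one more "power" of the learned layer
        return self.decoder(z)          # decode back to image space
```

Because the same block is reused at every iteration, the number of task-specific parameters stays constant regardless of how many steps are taken toward the target domain.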
This list is automatically generated from the titles and abstracts of the papers on this site.