Towards Vision Transformer Unrolling Fixed-Point Algorithm: a Case Study
on Image Restoration
- URL: http://arxiv.org/abs/2301.12332v1
- Date: Sun, 29 Jan 2023 02:59:14 GMT
- Title: Towards Vision Transformer Unrolling Fixed-Point Algorithm: a Case Study
on Image Restoration
- Authors: Peng Qiao, Sidun Liu, Tao Sun, Ke Yang, Yong Dou
- Abstract summary: We propose a framework to unroll the FP and approximate each unrolled step via Transformer blocks, called FPformer.
To fully exploit the capability of the Transformer, we apply the proposed model to image restoration, using self-supervised pre-training and supervised fine-tuning.
With this training scheme, the proposed FPformer, FPRformer, and FPAformer achieve performance competitive with state-of-the-art image restoration methods while training more efficiently.
- Score: 21.79667520132755
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The great success of Deep Neural Networks (DNNs) has inspired the algorithmic
development of DNN-based Fixed-Point (DNN-FP) methods for computer vision tasks. DNN-FP
methods, trained by Back-Propagation Through Time or by computing an inaccurate
inverse of the Jacobian, suffer from inferior representation ability.
Motivated by the representation power of the Transformer, we propose a
framework to unroll the FP and approximate each unrolled process via
Transformer blocks, called FPformer. To reduce the high consumption of memory
and computation, we come up with FPRformer by sharing parameters between the
successive blocks. We further design a module to adapt Anderson acceleration to
FPRformer to enlarge the unrolled iterations and improve the performance,
called FPAformer. In order to fully exploit the capability of the Transformer,
we apply the proposed model to image restoration, using self-supervised
pre-training and supervised fine-tuning. 161 tasks from 4 categories of image
restoration problems are used in the pre-training phase. Afterwards, the
pre-trained FPformer, FPRformer, and FPAformer are fine-tuned for the
comparison scenarios. With self-supervised pre-training and supervised
fine-tuning, the proposed FPformer, FPRformer, and FPAformer achieve
performance competitive with state-of-the-art image restoration methods and
better training efficiency. FPAformer uses only 29.82% of the parameters of
SwinIR models and provides superior performance after fine-tuning. Training
these comparison models takes only 26.9% of the time needed to train SwinIR
models. This offers a promising way to introduce the Transformer into
low-level vision tasks.
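To make the unrolling idea concrete, below is a minimal sketch, assuming PyTorch-style Transformer encoder blocks; it is not the authors' code. Using one block per unrolled step loosely mirrors the FPformer design, while reusing a single shared block across steps mirrors the FPRformer-style parameter sharing described in the abstract. The class name UnrolledFP, the step count, and all dimensions are illustrative assumptions.

```python
# Hypothetical sketch (not the authors' code): a fixed-point iteration
# x_{t+1} = f_theta(x_t) unrolled into Transformer encoder blocks.
import torch
import torch.nn as nn

class UnrolledFP(nn.Module):
    """share_weights=False: one block per unrolled step (FPformer-like).
    share_weights=True: a single block reused every step (FPRformer-like)."""

    def __init__(self, dim: int = 64, steps: int = 6, share_weights: bool = False):
        super().__init__()
        def make_block():
            return nn.TransformerEncoderLayer(
                d_model=dim, nhead=4, dim_feedforward=4 * dim, batch_first=True)
        if share_weights:
            shared = make_block()
            # Registering the same module `steps` times reuses its parameters.
            self.blocks = nn.ModuleList([shared] * steps)
        else:
            self.blocks = nn.ModuleList(make_block() for _ in range(steps))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim) patch embeddings of the degraded image
        for block in self.blocks:
            x = block(x)  # one unrolled fixed-point update
        return x

# Toy usage on random "patch tokens"
model = UnrolledFP(dim=64, steps=6, share_weights=True)
out = model(torch.randn(2, 196, 64))  # -> torch.Size([2, 196, 64])
```

Weight sharing is what makes the FPRformer-style variant cheap: the memory and parameter cost stays constant as the number of unrolled iterations grows.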
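FPAformer additionally adapts Anderson acceleration to enlarge the unrolled iterations. The sketch below shows only classic Anderson acceleration for a generic fixed point x = f(x), so the mechanism being adapted is visible; the window size m, the regularization constant, and the helper name anderson are illustrative assumptions, not the paper's module.

```python
# Hypothetical sketch: classic Anderson acceleration for x = f(x).
import numpy as np

def anderson(f, x0, m=5, iters=50, tol=1e-8):
    """Mixes the last m+1 iterates with coefficients alpha that minimize
    the norm of the combined residual, subject to sum(alpha) = 1."""
    xs, fs = [x0], [f(x0)]
    for k in range(iters):
        n = min(m, k) + 1  # history length used at this step
        # Residuals g_i = f(x_i) - x_i as the columns of G
        G = np.stack([fi - xi for xi, fi in zip(xs[-n:], fs[-n:])], axis=1)
        H = G.T @ G + 1e-10 * np.eye(n)      # regularized Gram matrix
        w = np.linalg.solve(H, np.ones(n))
        alpha = w / w.sum()                   # enforce sum(alpha) = 1
        x_next = sum(a * fi for a, fi in zip(alpha, fs[-n:]))
        xs.append(x_next)
        fs.append(f(x_next))
        if np.linalg.norm(fs[-1] - xs[-1]) < tol:
            break
    return xs[-1]

# Toy usage: f(x) = cos(x) converges to the Dottie number (~0.739)
print(anderson(np.cos, np.array([1.0])))
```

The least-squares mixing of the last m+1 iterates lets each accelerated step make more progress than plain repeated application of f, which is why it can stand in for a larger number of unrolled iterations.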
Related papers
- Numerical Pruning for Efficient Autoregressive Models [87.56342118369123]
This paper focuses on compressing decoder-only transformer-based autoregressive models through structural weight pruning.
Specifically, we propose a training-free pruning method that calculates a numerical score with Newton's method for the Attention and MLP modules, respectively.
To verify the effectiveness of our method, we provide both theoretical support and extensive experiments.
arXiv Detail & Related papers (2024-12-17T01:09:23Z) - PredFormer: Transformers Are Effective Spatial-Temporal Predictive Learners [65.93130697098658]
This paper proposes PredFormer, a pure transformer-based framework for predictive learning.
With its recurrent-free, transformer-based design, PredFormer is both simple and efficient.
Experiments on synthetic and real-world datasets demonstrate that PredFormer achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-10-07T03:52:06Z) - Training Transformer Models by Wavelet Losses Improves Quantitative and Visual Performance in Single Image Super-Resolution [6.367865391518726]
Transformer-based models have achieved remarkable results in low-level vision tasks, including image super-resolution (SR).
To activate more input pixels globally, hybrid attention models have been proposed.
We employ wavelet losses to train Transformer models to improve quantitative and subjective performance.
arXiv Detail & Related papers (2024-04-17T11:25:19Z) - Boosting Image Restoration via Priors from Pre-trained Models [54.83907596825985]
We learn an additional lightweight module called Pre-Train-Guided Refinement Module (PTG-RM) to refine the restoration results of a target restoration network with off-the-shelf features (OSF).
PTG-RM effectively enhances restoration performance of various models across different tasks, including low-light enhancement, deraining, deblurring, and denoising.
arXiv Detail & Related papers (2024-03-11T15:11:57Z) - Self-Supervised Pre-Training for Table Structure Recognition Transformer [25.04573593082671]
We propose a self-supervised pre-training (SSP) method for table structure recognition transformers.
We discover that the performance gap between the linear projection transformer and the hybrid CNN-transformer can be mitigated by SSP of the visual encoder in the TSR model.
arXiv Detail & Related papers (2024-02-23T19:34:06Z) - ProFormer: Learning Data-efficient Representations of Body Movement with
Prototype-based Feature Augmentation and Visual Transformers [31.908276711898548]
Methods for data-efficient recognition from body poses increasingly leverage skeleton sequences structured as image-like arrays.
We look at this paradigm from the perspective of transformer networks, for the first time exploring visual transformers as data-efficient encoders of skeleton movement.
In our pipeline, body pose sequences cast as image-like representations are converted into patch embeddings and then passed to a visual transformer backbone optimized with deep metric learning.
arXiv Detail & Related papers (2022-02-23T11:11:54Z) - EF-Train: Enable Efficient On-device CNN Training on FPGA Through Data
Reshaping for Online Adaptation or Personalization [11.44696439060875]
EF-Train is an efficient DNN training accelerator with a unified channel-level parallelism-based convolution kernel.
It can achieve end-to-end training on resource-limited low-power edge-level FPGAs.
Our design achieves a throughput of 46.99 GFLOPS and an energy efficiency of 6.09 GFLOPS/W.
arXiv Detail & Related papers (2022-02-18T18:27:42Z) - AdaViT: Adaptive Vision Transformers for Efficient Image Recognition [78.07924262215181]
We introduce AdaViT, an adaptive framework that learns to derive usage policies on which patches, self-attention heads and transformer blocks to use.
Our method obtains more than a 2x improvement in efficiency compared to state-of-the-art vision transformers, with only a 0.8% drop in accuracy.
arXiv Detail & Related papers (2021-11-30T18:57:02Z) - HRFormer: High-Resolution Transformer for Dense Prediction [99.6060997466614]
We present a High-Resolution Transformer (HRFormer) that learns high-resolution representations for dense prediction tasks.
We take advantage of the multi-resolution parallel design introduced in high-resolution convolutional networks (HRNet).
We demonstrate the effectiveness of the High-Resolution Transformer on both human pose estimation and semantic segmentation tasks.
arXiv Detail & Related papers (2021-10-18T15:37:58Z) - Pre-Trained Image Processing Transformer [95.93031793337613]
We develop a new pre-trained model, namely, the image processing transformer (IPT).
We utilize the well-known ImageNet benchmark to generate a large number of corrupted image pairs.
The IPT model is trained on these images with multi-heads and multi-tails.
arXiv Detail & Related papers (2020-12-01T09:42:46Z)