Cross Aggregation Transformer for Image Restoration
- URL: http://arxiv.org/abs/2211.13654v2
- Date: Thu, 23 Mar 2023 11:14:35 GMT
- Title: Cross Aggregation Transformer for Image Restoration
- Authors: Zheng Chen, Yulun Zhang, Jinjin Gu, Yongbing Zhang, Linghe Kong, Xin
Yuan
- Abstract summary: Recently, the Transformer architecture has been introduced into image restoration to replace convolutional neural networks (CNNs), with surprising results.
To address the above issue, we propose a new image restoration model, Cross Aggregation Transformer (CAT).
The core of our CAT is the Rectangle-Window Self-Attention (Rwin-SA), which applies horizontal and vertical rectangle window attention in different heads in parallel to expand the attention area and aggregate features across different windows.
Furthermore, we propose the Locality Complementary Module to complement the self-attention mechanism, which incorporates the inductive bias of CNNs (e.g., translation invariance and locality) into the Transformer.
- Score: 48.390140041131886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, the Transformer architecture has been introduced into image
restoration to replace convolutional neural networks (CNNs), with surprising results.
Considering the high computational complexity of Transformer with global
attention, some methods use the local square window to limit the scope of
self-attention. However, these methods lack direct interaction among different
windows, which limits the establishment of long-range dependencies. To address
the above issue, we propose a new image restoration model, Cross Aggregation
Transformer (CAT). The core of our CAT is the Rectangle-Window Self-Attention
(Rwin-SA), which applies horizontal and vertical rectangle window attention in
different heads in parallel to expand the attention area and aggregate
features across different windows. We also introduce the Axial-Shift operation
for different window interactions. Furthermore, we propose the Locality
Complementary Module to complement the self-attention mechanism, which
incorporates the inductive bias of CNN (e.g., translation invariance and
locality) into Transformer, enabling global-local coupling. Extensive
experiments demonstrate that our CAT outperforms recent state-of-the-art
methods on several image restoration applications. The code and models are
available at https://github.com/zhengchen1999/CAT.
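The rectangle-window idea lends itself to a compact illustration. Below is a minimal PyTorch sketch of rectangle-window self-attention in the spirit of Rwin-SA: half of the heads attend inside horizontal rectangles, the other half inside vertical ones, and the outputs are concatenated. The window size, the channel split, the omission of qkv projections and relative position bias, and all helper names are illustrative assumptions, not the released CAT implementation (see the repository above for the authors' code).

```python
# Toy rectangle-window self-attention in the spirit of Rwin-SA (assumptions:
# no qkv projections, no relative position bias, fixed 4x16 rectangles).
import torch


def window_attention(x, win_h, win_w, heads):
    """Self-attention inside non-overlapping win_h x win_w windows.

    x: (B, H, W, C), with H divisible by win_h and W divisible by win_w.
    """
    B, H, W, C = x.shape
    d = C // heads
    # Partition the feature map into windows: (B * num_windows, tokens, C).
    x = x.reshape(B, H // win_h, win_h, W // win_w, win_w, C)
    x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, win_h * win_w, C)
    # Split channels into heads; q = k = v = x here for brevity.
    qkv = x.reshape(-1, win_h * win_w, heads, d).transpose(1, 2)
    attn = (qkv @ qkv.transpose(-2, -1)) * d ** -0.5
    out = attn.softmax(dim=-1) @ qkv
    out = out.transpose(1, 2).reshape(-1, win_h * win_w, C)
    # Undo the window partition back to (B, H, W, C).
    out = out.reshape(B, H // win_h, W // win_w, win_h, win_w, C)
    return out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)


def rwin_sa_sketch(x, rect=(4, 16), heads=8):
    """Horizontal rectangles for half the channels/heads, vertical for the rest."""
    C = x.shape[-1]
    out_h = window_attention(x[..., : C // 2], rect[0], rect[1], heads // 2)
    out_v = window_attention(x[..., C // 2:], rect[1], rect[0], heads // 2)
    return torch.cat([out_h, out_v], dim=-1)


if __name__ == "__main__":
    feat = torch.randn(1, 32, 32, 64)
    print(rwin_sa_sketch(feat).shape)  # torch.Size([1, 32, 32, 64])
```

Splitting the heads between the two orientations lets one layer cover a wider, cross-shaped region at roughly the cost of square windows of the same area, which is the "expanded attention area" the abstract refers to.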
Related papers
- HAT: Hybrid Attention Transformer for Image Restoration [61.74223315807691]
Transformer-based methods have shown impressive performance in image restoration tasks, such as image super-resolution and denoising.
We propose a new Hybrid Attention Transformer (HAT) to activate more input pixels for better restoration.
Our HAT achieves state-of-the-art performance both quantitatively and qualitatively.
arXiv Detail & Related papers (2023-09-11T05:17:55Z)
- T-former: An Efficient Transformer for Image Inpainting [50.43302925662507]
A class of attention-based network architectures, called Transformers, has shown strong performance in natural language processing.
In this paper, we design a novel attention whose complexity is linear in the resolution, derived via Taylor expansion, and, based on this attention, we build a network called $T$-former for image inpainting.
Experiments on several benchmark datasets demonstrate that our proposed method achieves state-of-the-art accuracy while maintaining a relatively low number of parameters and computational complexity.
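To make the "linear in the resolution" point concrete, here is a generic first-order Taylor linearization of softmax attention. It is only an assumed stand-in for the kind of construction the summary hints at, not $T$-former's actual formulation; the function name, normalization choices, and epsilon are illustrative.

```python
# Generic first-order Taylor linearization of attention: with L2-normalized
# q and k, exp(q.k) ~ 1 + q.k, so K^T V can be aggregated once and the cost
# becomes O(N d^2) instead of O(N^2 d). A sketch, not T-former's exact method.
import torch
import torch.nn.functional as F


def taylor_linear_attention(q, k, v, eps=1e-6):
    """q, k, v: (B, N, d). Returns (B, N, d)."""
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    n = k.shape[1]
    kv = k.transpose(1, 2) @ v                                        # (B, d, d)
    num = v.sum(dim=1, keepdim=True) + q @ kv                         # (B, N, d)
    den = n + (q * k.sum(dim=1, keepdim=True)).sum(-1, keepdim=True)  # (B, N, 1)
    return num / (den + eps)


if __name__ == "__main__":
    q, k, v = (torch.randn(2, 1024, 32) for _ in range(3))
    print(taylor_linear_attention(q, k, v).shape)  # torch.Size([2, 1024, 32])
```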
arXiv Detail & Related papers (2023-05-12T04:10:42Z)
- Vision Transformer with Quadrangle Attention [76.35955924137986]
We propose a novel quadrangle attention (QA) method that extends the window-based attention to a general quadrangle formulation.
Our method employs an end-to-end learnable quadrangle regression module that predicts a transformation matrix to transform default windows into target quadrangles.
We integrate QA into plain and hierarchical vision transformers to create a new architecture named QFormer, which requires only minor code modifications and incurs negligible extra computational cost.
arXiv Detail & Related papers (2023-03-27T11:13:50Z)
- Optimizing Vision Transformers for Medical Image Segmentation and Few-Shot Domain Adaptation [11.690799827071606]
We propose Convolutional Swin-Unet (CS-Unet) transformer blocks and optimise their settings in relation to patch embedding, projection, the feed-forward network, upsampling, and skip connections.
CS-Unet can be trained from scratch and inherits the advantages of convolutions in each feature-processing phase.
Experiments show that CS-Unet without pre-training surpasses other state-of-the-art counterparts by large margins on two medical CT and MRI datasets with fewer parameters.
arXiv Detail & Related papers (2022-10-14T19:18:52Z)
- Accurate Image Restoration with Attention Retractable Transformer [50.05204240159985]
We propose Attention Retractable Transformer (ART) for image restoration.
ART presents both dense and sparse attention modules in the network.
We conduct extensive experiments on image super-resolution, denoising, and JPEG compression artifact reduction tasks.
arXiv Detail & Related papers (2022-10-04T07:35:01Z)
- XCiT: Cross-Covariance Image Transformers [73.33400159139708]
We propose a "transposed" version of self-attention that operates across feature channels rather than tokens.
The resulting cross-covariance attention (XCA) has linear complexity in the number of tokens, and allows efficient processing of high-resolution images.
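A minimal sketch of this transposed attention follows: the attention map is computed between the d feature channels instead of the N tokens, so it is d x d and the cost grows linearly with N. The placement of the normalization, the fixed temperature, and the function name are simplifying assumptions rather than XCiT's exact implementation.

```python
# Minimal cross-covariance (channel-wise) attention sketch: a d x d attention
# map replaces the N x N token attention, so cost is linear in token count N.
import torch
import torch.nn.functional as F


def xca_sketch(q, k, v, tau=1.0):
    """q, k, v: (B, N, d). Returns (B, N, d)."""
    # Normalize each channel's response vector over the tokens (an assumption
    # on where the L2 normalization goes; XCiT also learns the temperature).
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1)
    attn = (k.transpose(1, 2) @ q) / tau   # (B, d, d) cross-covariance matrix
    attn = attn.softmax(dim=-1)
    return v @ attn                        # mix the channels of v: (B, N, d)


if __name__ == "__main__":
    q, k, v = (torch.randn(2, 4096, 64) for _ in range(3))  # many tokens, few channels
    print(xca_sketch(q, k, v).shape)  # torch.Size([2, 4096, 64])
```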
arXiv Detail & Related papers (2021-06-17T17:33:35Z)
- CAT: Cross Attention in Vision Transformer [39.862909079452294]
We propose a new attention mechanism in Transformer called Cross Attention.
It alternates attention within the image patch, instead of over the whole image, to capture local information.
We build a hierarchical network called Cross Attention Transformer (CAT) for other vision tasks.
arXiv Detail & Related papers (2021-06-10T14:38:32Z)
- CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification [17.709880544501758]
We propose a dual-branch transformer to combine image patches of different sizes to produce stronger image features.
Our approach processes small-patch and large-patch tokens with two separate branches of different computational complexity.
Our proposed cross-attention requires only linear, rather than quadratic, computational and memory complexity.
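The linear cost is easy to see in a sketch: only the class token of one branch queries the patch tokens of the other branch, so the attention map has a single row. Projections, per-branch embedding dimensions, and the fusion step are omitted; all names and shapes below are illustrative assumptions.

```python
# Toy class-token cross-attention: the query is a single token, so the
# attention map is 1 x N and cost/memory scale linearly with N.
import torch


def cls_cross_attention(cls_token, other_tokens):
    """cls_token: (B, 1, d); other_tokens: (B, N, d). Returns (B, 1, d)."""
    d = cls_token.shape[-1]
    attn = (cls_token @ other_tokens.transpose(1, 2)) * d ** -0.5  # (B, 1, N)
    attn = attn.softmax(dim=-1)
    return attn @ other_tokens                                     # (B, 1, d)


if __name__ == "__main__":
    branch_a_cls = torch.randn(2, 1, 192)      # class token from one branch
    branch_b_tokens = torch.randn(2, 196, 192) # patch tokens from the other branch
    print(cls_cross_attention(branch_a_cls, branch_b_tokens).shape)  # torch.Size([2, 1, 192])
```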
arXiv Detail & Related papers (2021-03-27T13:03:17Z)