Xformer: Hybrid X-Shaped Transformer for Image Denoising
- URL: http://arxiv.org/abs/2303.06440v2
- Date: Sun, 25 Feb 2024 03:29:01 GMT
- Title: Xformer: Hybrid X-Shaped Transformer for Image Denoising
- Authors: Jiale Zhang and Yulun Zhang and Jinjin Gu and Jiahua Dong and Linghe
Kong and Xiaokang Yang
- Abstract summary: We present a hybrid X-shaped vision Transformer, named Xformer, which performs strongly on image denoising tasks.
Xformer achieves state-of-the-art performance on synthetic and real-world image denoising tasks.
- Score: 114.37510775636811
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present a hybrid X-shaped vision Transformer, named
Xformer, which performs strongly on image denoising tasks. We explore
strengthening the global representation of tokens from different scopes. In
detail, we adopt two types of Transformer blocks. The spatial-wise Transformer
block performs fine-grained local patch interactions across tokens defined along
the spatial dimension. The channel-wise Transformer block performs direct global
context interactions across tokens defined along the channel dimension. Based on
this concurrent network structure, we design two branches to carry out these two
interaction styles. Within each branch, we employ an encoder-decoder
architecture to capture multi-scale features. In addition, we propose the
Bidirectional Connection Unit (BCU) to couple the learned representations from
the two branches while providing enhanced information fusion. These joint
designs enable Xformer to perform global information modeling in both the
spatial and channel dimensions. Extensive experiments show that Xformer, under
comparable model complexity, achieves state-of-the-art performance on synthetic
and real-world image denoising tasks. We also provide code and models at
https://github.com/gladzhang/Xformer.
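The two token definitions above map onto two different attention layouts. The sketch below is a minimal illustration of the idea only, not the authors' implementation (the official code is at the GitHub link above); the window size, the (B, C, H, W) layout, the missing projections and heads, and the additive stand-in for the BCU are all simplifying assumptions.

```python
# Minimal illustration (not the official Xformer code) of the two token layouts
# described in the abstract: spatial-wise attention inside local windows of
# spatial tokens, and channel-wise attention across channel tokens.
import torch
import torch.nn.functional as F


def spatial_window_attention(x: torch.Tensor, window: int = 8) -> torch.Tensor:
    """Self-attention among spatial tokens inside non-overlapping windows."""
    B, C, H, W = x.shape
    assert H % window == 0 and W % window == 0
    x = x.reshape(B, C, H // window, window, W // window, window)
    x = x.permute(0, 2, 4, 3, 5, 1).reshape(-1, window * window, C)
    attn = F.softmax(x @ x.transpose(-2, -1) / C ** 0.5, dim=-1)  # (N, w*w, w*w)
    out = (attn @ x).reshape(B, H // window, W // window, window, window, C)
    return out.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)


def channel_attention(x: torch.Tensor) -> torch.Tensor:
    """Self-attention among channel tokens: each token is one full feature map."""
    B, C, H, W = x.shape
    t = F.normalize(x.reshape(B, C, H * W), dim=-1)    # C tokens of length H*W
    attn = F.softmax(t @ t.transpose(-2, -1), dim=-1)  # (B, C, C) global context
    return (attn @ x.reshape(B, C, H * W)).reshape(B, C, H, W)


def bcu_stand_in(spatial_feat, channel_feat):
    """Toy stand-in for the Bidirectional Connection Unit: exchange information
    between the two branches (the real BCU is a learned fusion module)."""
    return spatial_feat + channel_feat, channel_feat + spatial_feat


# Example: run both branches on a feature map and couple them.
feat = torch.randn(1, 32, 64, 64)
spatial_out, channel_out = bcu_stand_in(
    spatial_window_attention(feat), channel_attention(feat)
)
```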
Related papers
- A Hybrid Transformer-Mamba Network for Single Image Deraining [70.64069487982916]
Existing deraining Transformers employ self-attention mechanisms with fixed-range windows or along channel dimensions.
We introduce a novel dual-branch hybrid Transformer-Mamba network, denoted as TransMamba, aimed at effectively capturing long-range rain-related dependencies.
arXiv Detail & Related papers (2024-08-31T10:03:19Z)
- Dual Aggregation Transformer for Image Super-Resolution [92.41781921611646]
We propose a novel Transformer model, Dual Aggregation Transformer, for image SR.
Our DAT aggregates features across spatial and channel dimensions in a dual inter-block and intra-block manner.
Our experiments show that our DAT surpasses current methods.
arXiv Detail & Related papers (2023-08-07T07:39:39Z)
- AMT: All-Pairs Multi-Field Transforms for Efficient Frame Interpolation [80.33846577924363]
We present All-Pairs Multi-Field Transforms (AMT), a new network architecture for video frame interpolation.
It is based on two essential designs. First, we build bidirectional correlation volumes for all pairs of pixels, and use the predicted bilateral flows to retrieve correlations.
Second, we derive multiple groups of fine-grained flow fields from one pair of updated coarse flows for performing backward warping on the input frames separately.
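The backward-warping step mentioned here can be illustrated with a generic dense-flow warp built on torch.nn.functional.grid_sample. This is a standard routine, not AMT's implementation; the (dx, dy) channel order and the border padding mode are assumptions.

```python
# Generic backward warping with a dense flow field: sample frame `img` at the
# locations displaced by `flow` (a standard operation, not AMT's code).
import torch
import torch.nn.functional as F


def backward_warp(img: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """img: (B, C, H, W); flow: (B, 2, H, W) in pixels, (dx, dy) order assumed."""
    B, _, H, W = img.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, device=img.device, dtype=img.dtype),
        torch.arange(W, device=img.device, dtype=img.dtype),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]          # sampling positions in pixels
    grid_y = ys.unsqueeze(0) + flow[:, 1]
    # Normalize to [-1, 1], as grid_sample expects.
    grid = torch.stack(
        (2.0 * grid_x / (W - 1) - 1.0, 2.0 * grid_y / (H - 1) - 1.0), dim=-1
    )                                              # (B, H, W, 2)
    return F.grid_sample(img, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)
```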
arXiv Detail & Related papers (2023-04-19T16:18:47Z)
- Multimodal Image Fusion based on Hybrid CNN-Transformer and Non-local Cross-modal Attention [12.167049432063132]
We present a hybrid model consisting of a convolutional encoder and a Transformer-based decoder to fuse multimodal images.
A branch fusion module is designed to adaptively fuse the features of the two branches.
arXiv Detail & Related papers (2022-10-18T13:30:52Z)
- HiFormer: Hierarchical Multi-scale Representations Using Transformers for Medical Image Segmentation [3.478921293603811]
HiFormer is a novel method that efficiently bridges a CNN and a transformer for medical image segmentation.
To secure a fine fusion of global and local features, we propose a Double-Level Fusion (DLF) module in the skip connection of the encoder-decoder structure.
arXiv Detail & Related papers (2022-07-18T11:30:06Z)
- CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the detailed spatial information captured by CNNs with the global context provided by Transformers for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
- XCiT: Cross-Covariance Image Transformers [73.33400159139708]
We propose a "transposed" version of self-attention that operates across feature channels rather than tokens.
The resulting cross-covariance attention (XCA) has linear complexity in the number of tokens, and allows efficient processing of high-resolution images.
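A minimal single-head sketch of this channel-wise ("transposed") attention is shown below. The released XCiT code uses multiple heads and a learned temperature, which are omitted here; the projection matrices wq/wk/wv are placeholders.

```python
# Minimal single-head sketch of cross-covariance attention (XCA): the attention
# matrix is (d x d) over feature channels, so the cost is O(N * d^2) in the
# number of tokens N rather than O(N^2).
import torch
import torch.nn.functional as F


def xca(x: torch.Tensor, wq: torch.Tensor, wk: torch.Tensor,
        wv: torch.Tensor) -> torch.Tensor:
    """x: (B, N, d) tokens; wq/wk/wv: (d, d) projection matrices."""
    q, k, v = x @ wq, x @ wk, x @ wv                   # each (B, N, d)
    q = F.normalize(q, dim=1)                          # unit norm along tokens
    k = F.normalize(k, dim=1)
    attn = F.softmax(k.transpose(-2, -1) @ q, dim=-1)  # (B, d, d)
    return v @ attn                                    # (B, N, d)


# Doubling the number of tokens doubles the cost; standard self-attention
# would quadruple it.
x = torch.randn(2, 196, 64)
w = [torch.randn(64, 64) * 0.02 for _ in range(3)]
out = xca(x, *w)
```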
arXiv Detail & Related papers (2021-06-17T17:33:35Z)
- TransVOS: Video Object Segmentation with Transformers [13.311777431243296]
We propose a vision transformer to fully exploit and model both the temporal and spatial relationships.
To slim the popular two-encoder pipeline, we design a single two-path feature extractor.
Experiments demonstrate the superiority of our TransVOS over state-of-the-art methods on both DAVIS and YouTube-VOS datasets.
arXiv Detail & Related papers (2021-06-01T15:56:10Z)
- CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification [17.709880544501758]
We propose a dual-branch transformer to combine image patches of different sizes to produce stronger image features.
Our approach processes small-patch and large-patch tokens with two separate branches of different computational complexity.
Our proposed cross-attention requires only linear computational and memory complexity, rather than the quadratic cost of standard self-attention.
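The linear cost comes from using the class token of one branch as the query over the other branch's patch tokens, so the attention map has a single query row. The sketch below is a simplified single-head illustration with placeholder projection matrices, not the released implementation (which uses multi-head attention and projections between the two branch dimensions).

```python
# Simplified single-head sketch of CrossViT-style token fusion: the CLS token
# of branch A queries the patch tokens of branch B, so the attention map is
# (1 x N) and the cost is linear in N.
import torch
import torch.nn.functional as F


def cls_cross_attention(cls_a: torch.Tensor, tokens_b: torch.Tensor,
                        wq: torch.Tensor, wk: torch.Tensor,
                        wv: torch.Tensor) -> torch.Tensor:
    """cls_a: (B, 1, d); tokens_b: (B, N, d); wq/wk/wv: (d, d) placeholders."""
    d = cls_a.shape[-1]
    q = cls_a @ wq                                                 # (B, 1, d)
    k, v = tokens_b @ wk, tokens_b @ wv                            # (B, N, d)
    attn = F.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)   # (B, 1, N)
    return cls_a + attn @ v                                        # updated CLS
```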
arXiv Detail & Related papers (2021-03-27T13:03:17Z)