RFR-WWANet: Weighted Window Attention-Based Recovery Feature Resolution Network for Unsupervised Image Registration
- URL: http://arxiv.org/abs/2305.04236v2
- Date: Mon, 22 May 2023 02:41:32 GMT
- Title: RFR-WWANet: Weighted Window Attention-Based Recovery Feature Resolution Network for Unsupervised Image Registration
- Authors: Mingrui Ma, Tao Wang, Lei Song, Weijie Wang, Guixia Liu
- Abstract summary: The Swin transformer has attracted attention in medical image analysis due to its computational efficiency and long-range modeling capability.
Registration models based on transformers merge multiple voxels into a single semantic token, which limits them to modeling and generating only coarse-grained spatial information.
We propose the Recovery Feature Resolution Network (RFRNet), which allows the transformer to contribute fine-grained spatial information.
- Score: 7.446209993071451
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Swin transformer has recently attracted attention in medical image
analysis due to its computational efficiency and long-range modeling capability.
Owing to these properties, the Swin transformer is well suited to establishing
relationships between distant corresponding voxels in complex abdominal image
registration tasks. However, transformer-based registration models combine
multiple voxels into a single semantic token, and this merging limits them to
modeling and generating only coarse-grained spatial information. To address this
issue, we propose the Recovery Feature Resolution Network (RFRNet), which allows
the transformer to contribute fine-grained spatial information and rich semantic
correspondences to higher resolution levels. Furthermore, shifted window
partitioning is inflexible: it cannot perceive semantic information over varying
distances or automatically bridge global connections between windows. Therefore,
we present Weighted Window Attention (WWA), which builds global interactions
between windows automatically. It is applied after the regular and cyclic-shift
window partitioning operations within the Swin transformer block. The proposed
unsupervised deformable image registration model, named RFR-WWANet, captures
long-range correlations and facilitates meaningful semantic relevance among
anatomical structures. Qualitative and quantitative results show that RFR-WWANet
achieves significant improvements over current state-of-the-art methods. Ablation
experiments demonstrate the effectiveness of the RFRNet and WWA designs. Our code
is available at \url{https://github.com/MingR-Ma/RFR-WWANet}.
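The abstract pins down only where WWA sits (after the regular and cyclic-shift window partitioning in a Swin block), not its internals. The following PyTorch sketch is one plausible reading: windows are attended to independently, then a small scoring MLP weights each window from its pooled descriptor so that windows interact globally. The class name, the sigmoid gating, and the mean-pooled descriptors are illustrative assumptions; the authors' actual design is in the linked repository.

```python
import torch
import torch.nn as nn

def window_partition(x, ws):
    # (B, H, W, C) -> (B * num_windows, ws * ws, C)
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

def window_reverse(win, ws, H, W):
    # (B * num_windows, ws * ws, C) -> (B, H, W, C)
    B = win.shape[0] // ((H // ws) * (W // ws))
    x = win.view(B, H // ws, W // ws, ws, ws, -1)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, -1)

class WeightedWindowAttention(nn.Module):
    """Hypothetical WWA: per-window attention followed by a learned
    per-window weight that lets windows interact globally."""
    def __init__(self, dim, window_size, num_heads, shift=False):
        super().__init__()
        self.ws, self.shift = window_size, shift
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.score = nn.Sequential(nn.Linear(dim, dim // 2), nn.GELU(),
                                   nn.Linear(dim // 2, 1))

    def forward(self, x):  # x: (B, H, W, C)
        B, H, W, C = x.shape
        if self.shift:  # cyclic shift, as in the Swin transformer block
            x = torch.roll(x, (-(self.ws // 2), -(self.ws // 2)), dims=(1, 2))
        win = window_partition(x, self.ws)
        out, _ = self.attn(win, win, win)            # attention within windows
        nW = (H // self.ws) * (W // self.ws)
        desc = out.mean(dim=1).view(B, nW, C)        # pooled window descriptors
        w = torch.sigmoid(self.score(desc))          # (B, nW, 1) window weights
        out = out.view(B, nW, -1, C) * w.unsqueeze(-1)
        x = window_reverse(out.reshape(-1, self.ws * self.ws, C), self.ws, H, W)
        if self.shift:  # undo the cyclic shift
            x = torch.roll(x, (self.ws // 2, self.ws // 2), dims=(1, 2))
        return x
```

Applied to a (2, 16, 16, 32) tensor with window_size=8 and num_heads=4, this returns a tensor of the same shape; in 3D registration the windows would be voxel cubes, omitted here for brevity.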
Related papers
- IPT-V2: Efficient Image Processing Transformer using Hierarchical Attentions [26.09373405194564]
We present an efficient image processing transformer architecture with hierarchical attentions, called IPT-V2.
We adopt focal context self-attention (FCSA) and global grid self-attention (GGSA) to obtain adequate token interactions in local and global receptive fields.
The proposed IPT-V2 achieves state-of-the-art results on various image processing tasks, including denoising, deblurring, and deraining, and obtains a much better trade-off between performance and computational complexity than previous methods.
arXiv Detail & Related papers (2024-03-31T10:01:20Z)
- CT-MVSNet: Efficient Multi-View Stereo with Cross-scale Transformer [8.962657021133925]
The cross-scale transformer (CT) processes feature representations at different stages without additional computation.
We introduce an adaptive matching-aware transformer (AMT) that employs different interactive attention combinations at multiple scales.
We also present a dual-feature guided aggregation (DFGA) that embeds coarse global semantic information into the finer cost-volume construction.
arXiv Detail & Related papers (2023-12-14T01:33:18Z)
- Deformable Cross-Attention Transformer for Medical Image Registration [11.498623409184225]
We propose a novel mechanism that computes windowed attention using deformable windows.
The proposed model was extensively evaluated on multi-modal, mono-modal, and atlas-to-patient registration tasks.
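This mechanism computes attention inside windows whose sampling locations are learned rather than fixed. As a hedged approximation, not the paper's exact operator, the sketch below predicts a bounded per-pixel offset field, resamples the feature map with grid_sample, and then applies ordinary window attention to the deformed map; the offset head and the tanh bounding are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformableWindowAttention(nn.Module):
    """Toy deformable-window attention: a conv head predicts (dx, dy)
    offsets, the feature map is resampled at the offset locations, and
    standard window attention runs on the deformed map."""
    def __init__(self, dim, window_size, heads):
        super().__init__()
        self.ws = window_size
        self.offset = nn.Conv2d(dim, 2, 3, padding=1)   # per-pixel (dx, dy)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                      # x: (B, C, H, W)
        B, C, H, W = x.shape
        off = self.offset(x).permute(0, 2, 3, 1)        # (B, H, W, 2)
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, H, device=x.device),
            torch.linspace(-1, 1, W, device=x.device), indexing="ij")
        base = torch.stack((xs, ys), dim=-1)            # normalized base grid
        grid = base + off.tanh() * (2.0 / self.ws)      # small, bounded deformation
        x = F.grid_sample(x, grid, align_corners=True)  # resample features
        # ordinary window attention on the deformed feature map
        ws = self.ws
        win = x.permute(0, 2, 3, 1).view(B, H // ws, ws, W // ws, ws, C)
        win = win.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)
        out, _ = self.attn(win, win, win)
        out = out.view(B, H // ws, W // ws, ws, ws, C)
        return out.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)
```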
arXiv Detail & Related papers (2023-03-10T19:22:01Z)
- Optimizing Vision Transformers for Medical Image Segmentation and Few-Shot Domain Adaptation [11.690799827071606]
We propose Convolutional Swin-Unet (CS-Unet) transformer blocks and optimize their settings with respect to patch embedding, projection, the feed-forward network, upsampling, and skip connections.
CS-Unet can be trained from scratch and inherits the strengths of convolutions in each feature-processing phase.
Experiments show that CS-Unet without pre-training surpasses other state-of-the-art counterparts by large margins on two medical CT and MRI datasets, with fewer parameters.
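Of the pieces CS-Unet re-engineers with convolutions, the sketch below shows one commonly used instance: a convolutional patch embedding, where stacked stride-2 convs replace the single linear patch projection. It illustrates the design direction rather than the paper's exact block.

```python
import torch
import torch.nn as nn

class ConvPatchEmbed(nn.Module):
    """Convolutional patch embedding: two stride-2 convs (total stride 4)
    replace the usual 4x4 linear patch projection, keeping a local
    inductive bias in the embedding stage."""
    def __init__(self, in_ch=1, dim=96):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(in_ch, dim // 2, 3, stride=2, padding=1),
            nn.GELU(),
            nn.Conv2d(dim // 2, dim, 3, stride=2, padding=1),
        )
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                 # x: (B, in_ch, H, W)
        x = self.proj(x)                  # (B, dim, H/4, W/4)
        x = x.flatten(2).transpose(1, 2)  # (B, H/4 * W/4, dim) token sequence
        return self.norm(x)
```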
arXiv Detail & Related papers (2022-10-14T19:18:52Z)
- Accurate Image Restoration with Attention Retractable Transformer [50.05204240159985]
We propose Attention Retractable Transformer (ART) for image restoration.
ART presents both dense and sparse attention modules in the network.
We conduct extensive experiments on image super-resolution, denoising, and JPEG compression artifact reduction tasks.
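ART pairs dense (window-local) attention with sparse attention over tokens sampled at a fixed interval, so the sparse branch sees the whole image at window-attention cost. The sketch below implements the sparse grouping under that reading; the dense branch is ordinary window attention and is omitted.

```python
import torch
import torch.nn as nn

class SparseAttention(nn.Module):
    """Sparse attention sketch: tokens sampled at a fixed interval s form
    each group, so a group's receptive field spans the whole image while
    the attention cost stays that of a small window."""
    def __init__(self, dim, interval, heads):
        super().__init__()
        self.s = interval
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                     # x: (B, H, W, C)
        B, H, W, C = x.shape
        s = self.s
        x = x.view(B, H // s, s, W // s, s, C)
        # group tokens that are s pixels apart (one group per s x s offset)
        x = x.permute(0, 2, 4, 1, 3, 5).reshape(B * s * s, -1, C)
        out, _ = self.attn(x, x, x)           # attention across the image
        out = out.view(B, s, s, H // s, W // s, C)
        return out.permute(0, 3, 1, 4, 2, 5).reshape(B, H, W, C)
```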
arXiv Detail & Related papers (2022-10-04T07:35:01Z)
- Cross-receptive Focused Inference Network for Lightweight Image Super-Resolution [64.25751738088015]
Transformer-based methods have shown impressive performance in single image super-resolution (SISR) tasks.
However, transformers' need to incorporate contextual information in order to extract features dynamically has been largely neglected.
We propose a lightweight Cross-receptive Focused Inference Network (CFIN), which consists of a cascade of CT blocks that mix CNN and Transformer components.
arXiv Detail & Related papers (2022-07-06T16:32:29Z)
- CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the detailed spatial information captured by CNNs with the global context provided by transformers for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
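For the sampling-plus-recovery split, a standard block-based formulation makes the idea concrete: a learned strided convolution acts as the per-block sampling matrix, and a 1x1 conv with PixelShuffle produces the initial reconstruction that a deeper CNN/transformer trunk would refine. This follows common compressive-sensing practice, not CSformer's exact architecture.

```python
import torch
import torch.nn as nn

class BlockCS(nn.Module):
    """Block-based compressive sampling sketch: a stride-B conv plays the
    role of the learned sampling matrix (one measurement vector per B x B
    block); a 1x1 conv + PixelShuffle gives the initial recovery."""
    def __init__(self, block=32, ratio=0.1):
        super().__init__()
        m = max(1, int(ratio * block * block))      # measurements per block
        self.sample = nn.Conv2d(1, m, block, stride=block, bias=False)
        self.init_rec = nn.Sequential(
            nn.Conv2d(m, block * block, 1),
            nn.PixelShuffle(block),                 # back to image resolution
        )

    def forward(self, x):      # x: (B, 1, H, W), H and W multiples of block
        y = self.sample(x)     # measurements: (B, m, H/block, W/block)
        return self.init_rec(y)  # coarse reconstruction: (B, 1, H, W)
```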
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
- PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered solving vision tasks with transformers by directly translating the image feature map into the object detection result.
The recent transformer-based image recognition model ViT also shows a consistent efficiency gain.
arXiv Detail & Related papers (2021-09-15T01:10:30Z)
- XCiT: Cross-Covariance Image Transformers [73.33400159139708]
We propose a "transposed" version of self-attention that operates across feature channels rather than tokens.
The resulting cross-covariance attention (XCA) has linear complexity in the number of tokens, and allows efficient processing of high-resolution images.
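Cross-covariance attention is specified precisely in the XCiT paper: queries and keys are L2-normalized along the token axis, and attention is taken over the d x d channel cross-covariance with a learnable temperature, making the cost linear in the number of tokens. A compact sketch following that formulation (XCiT's local patch interaction blocks are omitted):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class XCA(nn.Module):
    """Cross-covariance attention: the d x d attention map comes from the
    cross-covariance of L2-normalized keys and queries, so the cost is
    linear in the number of tokens N instead of quadratic."""
    def __init__(self, dim, heads):
        super().__init__()
        self.h = heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.temp = nn.Parameter(torch.ones(heads, 1, 1))  # learnable temperature
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                          # x: (B, N, C)
        B, N, C = x.shape
        q, k, v = self.qkv(x).reshape(B, N, 3, self.h, C // self.h) \
                             .permute(2, 0, 3, 4, 1)   # each: (B, h, d, N)
        q = F.normalize(q, dim=-1)                 # L2-normalize along tokens
        k = F.normalize(k, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.temp   # (B, h, d, d)
        attn = attn.softmax(dim=-1)
        out = (attn @ v).permute(0, 3, 1, 2).reshape(B, N, C)
        return self.proj(out)
```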
arXiv Detail & Related papers (2021-06-17T17:33:35Z)
- Rethinking Global Context in Crowd Counting [70.54184500538338]
A pure transformer is used to extract features with global information from overlapping image patches.
Inspired by classification, we add a context token to the input sequence to facilitate information exchange with the tokens corresponding to image patches.
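The context token works like a classification CLS token: a learnable vector is prepended to the patch-token sequence so self-attention exchanges global information with every patch. A minimal sketch, with layer sizes as placeholders:

```python
import torch
import torch.nn as nn

class ContextTokenEncoder(nn.Module):
    """Sketch of the context-token idea: a learnable token is prepended to
    the patch-token sequence (like a CLS token) so global context is
    exchanged with every patch through self-attention."""
    def __init__(self, dim=256, heads=8, depth=4):
        super().__init__()
        self.ctx = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, patches):            # patches: (B, N, dim)
        B = patches.shape[0]
        tok = self.ctx.expand(B, -1, -1)   # one context token per sample
        x = torch.cat((tok, patches), 1)   # (B, N + 1, dim)
        x = self.encoder(x)
        return x[:, 0], x[:, 1:]           # context summary, patch tokens
```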
arXiv Detail & Related papers (2021-05-23T12:44:27Z)