RFR-WWANet: Weighted Window Attention-Based Recovery Feature Resolution
Network for Unsupervised Image Registration
- URL: http://arxiv.org/abs/2305.04236v2
- Date: Mon, 22 May 2023 02:41:32 GMT
- Title: RFR-WWANet: Weighted Window Attention-Based Recovery Feature Resolution
Network for Unsupervised Image Registration
- Authors: Mingrui Ma, Tao Wang, Lei Song, Weijie Wang, Guixia Liu
- Abstract summary: The Swin transformer has attracted attention in medical image analysis due to its computational efficiency and long-range modeling capability.
Registration models based on transformers combine multiple voxels into a single semantic token.
This merging restricts transformers to modeling and generating only coarse-grained spatial information.
We propose the Recovery Feature Resolution Network (RFRNet), which allows the transformer to contribute fine-grained spatial information.
- Score: 7.446209993071451
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Swin transformer has recently attracted attention in medical image
analysis due to its computational efficiency and long-range modeling
capability. Owing to these properties, the Swin Transformer is suitable for
establishing more distant relationships between corresponding voxels in
different positions in complex abdominal image registration tasks. However,
registration models based on transformers combine multiple voxels into a
single semantic token. This merging restricts the transformers to modeling
and generating only coarse-grained spatial information. To address this
issue, we propose the
Recovery Feature Resolution Network (RFRNet), which allows the transformer to
contribute fine-grained spatial information and rich semantic correspondences
to higher resolution levels. Furthermore, shifted window partitioning
operations are inflexible: they can neither perceive semantic information
across varying distances nor automatically bridge global connections between
windows. Therefore, we present Weighted Window Attention (WWA) to build
global interactions between windows automatically. It is
implemented after the regular and cyclic shift window partitioning operations
within the Swin transformer block. The proposed unsupervised deformable image
registration model, named RFR-WWANet, detects long-range correlations and
facilitates meaningful semantic correspondence among anatomical structures. Qualitative
and quantitative results show that RFR-WWANet achieves significant improvements
over the current state-of-the-art methods. Ablation experiments demonstrate the
effectiveness of the RFRNet and WWA designs. Our code is available at
https://github.com/MingR-Ma/RFR-WWANet.
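The abstract describes WWA only at a high level; the authors' repository contains the actual implementation. As a rough, non-authoritative sketch of the stated idea (letting windows interact globally through learned weights after regular or cyclic-shift window partitioning), the PyTorch fragment below pools a descriptor per window, scores it, and mixes a weighted global context back into every token. The module name and every design choice here are illustrative assumptions, not the paper's method.

```python
# Hedged sketch only: one plausible reading of a "weighted window
# attention" step, NOT the RFR-WWANet implementation (see the authors'
# repository for the real code).
import torch
import torch.nn as nn

class WeightedWindowInteraction(nn.Module):
    """Mixes a learned, weighted global context into every window."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # scores each window's summary
        self.proj = nn.Linear(dim, dim)  # projects the pooled global context

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, num_windows, tokens_per_window, C), i.e. features right
        # after regular or cyclic-shift window partitioning.
        summary = x.mean(dim=2)                                    # (B, W, C)
        weights = torch.softmax(self.score(summary), dim=1)        # (B, W, 1)
        global_ctx = (weights * summary).sum(dim=1, keepdim=True)  # (B, 1, C)
        # Broadcast the weighted global context back into every token,
        # giving each window a view of all the others.
        return x + self.proj(global_ctx).unsqueeze(2)

wwa = WeightedWindowInteraction(dim=96)
out = wwa(torch.randn(2, 8, 27, 96))  # 2 volumes, 8 windows of 27 tokens each
print(out.shape)                      # torch.Size([2, 8, 27, 96])
```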
Related papers
- TransAdapter: Vision Transformer for Feature-Centric Unsupervised Domain Adaptation [0.3277163122167433]
Unsupervised Domain Adaptation (UDA) aims to utilize labeled data from a source domain to solve tasks in an unlabeled target domain.
Traditional CNN-based methods struggle to fully capture complex domain relationships.
We propose a novel UDA approach leveraging the Swin Transformer with three key modules.
arXiv Detail & Related papers (2024-12-05T11:11:39Z)
- Sharing Key Semantics in Transformer Makes Efficient Image Restoration [148.22790334216117]
The self-attention mechanism, a cornerstone of Vision Transformers (ViTs), tends to encompass all global cues.
Small segments of a degraded image, especially those that are closely aligned semantically, provide particularly relevant information to aid the restoration process.
In this paper, we propose boosting image restoration (IR) performance by sharing key semantics via a Transformer for IR (i.e., SemanIR).
arXiv Detail & Related papers (2024-05-30T12:45:34Z)
- IPT-V2: Efficient Image Processing Transformer using Hierarchical Attentions [26.09373405194564]
We present an efficient image processing transformer architecture with hierarchical attentions, called IPT-V2.
We adopt a focal context self-attention (FCSA) and a global grid self-attention (GGSA) to obtain adequate token interactions in local and global receptive fields.
Our proposed IPT-V2 achieves state-of-the-art results on various image processing tasks, covering denoising, deblurring, and deraining, and obtains a much better trade-off between performance and computational complexity than previous methods.
arXiv Detail & Related papers (2024-03-31T10:01:20Z)
- Deformable Cross-Attention Transformer for Medical Image Registration [11.498623409184225]
We propose a novel mechanism that computes windowed attention using deformable windows.
The proposed model was extensively evaluated on multi-modal, mono-modal, and atlas-to-patient registration tasks.
arXiv Detail & Related papers (2023-03-10T19:22:01Z)
- Optimizing Vision Transformers for Medical Image Segmentation and Few-Shot Domain Adaptation [11.690799827071606]
We propose Convolutional Swin-Unet (CS-Unet) transformer blocks and optimise their settings with respect to patch embedding, projection, the feed-forward network, upsampling, and skip connections.
CS-Unet can be trained from scratch and inherits the superiority of convolutions in each feature processing phase.
Experiments show that CS-Unet without pre-training surpasses other state-of-the-art counterparts by large margins on two medical CT and MRI datasets with fewer parameters.
arXiv Detail & Related papers (2022-10-14T19:18:52Z)
- Cross-receptive Focused Inference Network for Lightweight Image Super-Resolution [64.25751738088015]
Transformer-based methods have shown impressive performance in single image super-resolution (SISR) tasks.
However, the need for Transformers to incorporate contextual information in order to extract features dynamically is often neglected.
We propose a lightweight Cross-receptive Focused Inference Network (CFIN), a cascade of CT Blocks that mix CNN and Transformer components.
arXiv Detail & Related papers (2022-07-06T16:32:29Z)
- CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the detailed spatial information captured by CNNs with the global context provided by transformers for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
- PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered solving vision tasks with transformers; it directly translates the image feature map into the object detection result.
The recent transformer-based image recognition model ViT also shows a consistent efficiency gain.
arXiv Detail & Related papers (2021-09-15T01:10:30Z)
- XCiT: Cross-Covariance Image Transformers [73.33400159139708]
We propose a "transposed" version of self-attention that operates across feature channels rather than tokens.
The resulting cross-covariance attention (XCA) has linear complexity in the number of tokens and allows efficient processing of high-resolution images (see the sketch below).
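For concreteness, here is a minimal PyTorch sketch of cross-covariance attention consistent with this description; the token-wise L2 normalization and the learnable per-head temperature follow the XCiT paper, while the exact shapes and names are illustrative rather than the official implementation.

```python
# Minimal sketch of cross-covariance attention (XCA): the attention map is
# computed between feature channels, so its size is (head_dim x head_dim)
# and the cost grows linearly with the number of tokens N.
import torch
import torch.nn as nn
import torch.nn.functional as F

class XCA(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        # learnable per-head temperature, as in the XCiT paper
        self.temperature = nn.Parameter(torch.ones(num_heads, 1, 1))
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        head_dim = C // self.num_heads
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, head_dim)
        q, k, v = qkv.permute(2, 0, 3, 4, 1)  # each: (B, heads, head_dim, N)
        # L2-normalize along the token axis so channel-by-channel dot
        # products form a cross-covariance matrix of shape (d x d).
        q = F.normalize(q, dim=-1)
        k = F.normalize(k, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.temperature  # (B, heads, d, d)
        attn = attn.softmax(dim=-1)
        out = (attn @ v).permute(0, 3, 1, 2).reshape(B, N, C)
        return self.proj(out)

x = torch.randn(2, 196, 128)  # 2 images, 196 tokens, 128 channels
print(XCA(dim=128)(x).shape)  # torch.Size([2, 196, 128])
```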
arXiv Detail & Related papers (2021-06-17T17:33:35Z)
- Rethinking Global Context in Crowd Counting [70.54184500538338]
A pure transformer is used to extract features with global information from overlapping image patches.
Inspired by classification, we add a context token to the input sequence to facilitate information exchange with the tokens corresponding to image patches (sketched below).
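As a small illustration of that context-token idea, the hedged PyTorch sketch below prepends one learnable token to the patch sequence, in the spirit of ViT's class token; the class name, depth, and dimensions are assumptions, not the paper's configuration.

```python
# Hedged sketch: a learnable context token prepended to the patch tokens so
# that attention lets it exchange information with every image patch.
import torch
import torch.nn as nn

class ContextTokenEncoder(nn.Module):
    def __init__(self, dim: int = 256, depth: int = 2, heads: int = 8):
        super().__init__()
        self.context_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (B, N, dim) features from overlapping image patches
        ctx = self.context_token.expand(patch_tokens.size(0), -1, -1)
        tokens = torch.cat([ctx, patch_tokens], dim=1)  # (B, N + 1, dim)
        tokens = self.encoder(tokens)
        return tokens[:, 0]  # the context token now carries global info

model = ContextTokenEncoder()
summary = model(torch.randn(4, 100, 256))  # 4 images, 100 patch tokens
print(summary.shape)                       # torch.Size([4, 256])
```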
arXiv Detail & Related papers (2021-05-23T12:44:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.