ModeT: Learning Deformable Image Registration via Motion Decomposition
Transformer
- URL: http://arxiv.org/abs/2306.05688v1
- Date: Fri, 9 Jun 2023 06:00:05 GMT
- Title: ModeT: Learning Deformable Image Registration via Motion Decomposition
Transformer
- Authors: Haiqiao Wang, Dong Ni, and Yi Wang
- Abstract summary: We propose a novel motion decomposition Transformer (ModeT) to explicitly model multiple motion modalities.
Our method outperforms current state-of-the-art registration networks and Transformers.
- Score: 7.629385629884155
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer structures have been widely used in computer vision and
have recently made an impact in the area of medical image registration. However,
the use of Transformers in most registration networks is straightforward: these
networks often merely use the attention mechanism to boost feature learning, as
segmentation networks do, but are not sufficiently designed for the registration
task. In this paper, we propose a novel motion
decomposition Transformer (ModeT) to explicitly model multiple motion
modalities by fully exploiting the intrinsic capability of the Transformer
structure for deformation estimation. The proposed ModeT naturally transforms
the multi-head neighborhood attention relationship into the multi-coordinate
relationship to model multiple motion modes. Then the competitive weighting
module (CWM) fuses multiple deformation sub-fields to generate the resulting
deformation field. Extensive experiments on two public brain magnetic resonance
imaging (MRI) datasets show that our method outperforms current
state-of-the-art registration networks and Transformers, demonstrating the
potential of our ModeT for the challenging non-rigid deformation estimation
problem. The benchmarks and our code are publicly available at
https://github.com/ZAX130/SmileCode.
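To make the mechanism described above concrete, below is a minimal PyTorch sketch of the two ideas named in the abstract: per-head neighborhood attention converted into deformation sub-fields (one candidate motion mode per head), and a competitive weighting module (CWM) that fuses the sub-fields into a single flow. The 2D setting, module names, and hyper-parameters are assumptions made for illustration; it is not the authors' released implementation, which is available at https://github.com/ZAX130/SmileCode.

```python
# Illustrative sketch (not the authors' code) of motion decomposition attention:
# (1) multi-head neighborhood attention turned into per-head deformation
# sub-fields, and (2) a competitive weighting module (CWM) fusing them.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MotionDecompositionSketch2D(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4, kernel_size: int = 3):
        super().__init__()
        assert channels % num_heads == 0
        self.num_heads, self.head_dim = num_heads, channels // num_heads
        self.kernel_size = kernel_size
        self.q_proj = nn.Conv2d(channels, channels, 1)  # queries from fixed-image features
        self.k_proj = nn.Conv2d(channels, channels, 1)  # keys from moving-image features
        # Relative (y, x) coordinates of every neighborhood position, shape (k*k, 2).
        r = kernel_size // 2
        ys, xs = torch.meshgrid(torch.arange(-r, r + 1),
                                torch.arange(-r, r + 1), indexing="ij")
        self.register_buffer("rel_coords", torch.stack([ys, xs], -1).float().view(-1, 2))
        # Competitive weighting module: scores the per-head sub-fields at each position.
        self.cwm = nn.Conv2d(num_heads * 2, num_heads, 3, padding=1)

    def forward(self, feat_fixed: torch.Tensor, feat_moving: torch.Tensor) -> torch.Tensor:
        B, C, H, W = feat_fixed.shape
        k2 = self.kernel_size ** 2
        # Queries: one per spatial position and head.
        q = self.q_proj(feat_fixed).reshape(B, self.num_heads, self.head_dim, 1, H * W)
        # Keys: unfold the moving features into the k*k neighborhood of every position.
        k = F.unfold(self.k_proj(feat_moving), self.kernel_size, padding=self.kernel_size // 2)
        k = k.reshape(B, self.num_heads, self.head_dim, k2, H * W)
        # Neighborhood attention per head, softmax over the k*k candidate positions.
        attn = (q * k).sum(dim=2) / self.head_dim ** 0.5      # (B, heads, k*k, H*W)
        attn = attn.softmax(dim=2)
        # Attention -> coordinates: the expected relative displacement under each head
        # gives one deformation sub-field (one motion mode) per head.
        sub = torch.einsum("bhkn,kc->bhcn", attn, self.rel_coords)
        sub = sub.reshape(B, self.num_heads, 2, H, W)
        # CWM: competitive (softmax) weights decide each sub-field's contribution.
        weights = self.cwm(sub.flatten(1, 2)).softmax(dim=1)   # (B, heads, H, W)
        return (sub * weights.unsqueeze(2)).sum(dim=1)         # fused flow, (B, 2, H, W)


# Example usage on random feature maps (hypothetical shapes):
# flow = MotionDecompositionSketch2D(32)(torch.randn(1, 32, 48, 48), torch.randn(1, 32, 48, 48))
```

The key step in this reading of the abstract is interpreting each head's attention distribution over the local neighborhood as a probability over candidate displacements, so its expectation against the relative coordinates yields that head's deformation sub-field.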
Related papers
- ModeTv2: GPU-accelerated Motion Decomposition Transformer for Pairwise Optimization in Medical Image Registration [6.217733993535475]
Deformable image registration plays a crucial role in medical imaging, aiding in disease diagnosis and image-guided interventions.
Traditional iterative methods are slow, while deep learning (DL) accelerates solutions but faces usability and precision challenges.
This study introduces a pyramid network with the enhanced motion Transformer (ModeTv2) operator, showcasing superior pairwise optimization (PO) akin to traditional methods.
arXiv Detail & Related papers (2024-03-25T08:09:22Z)
- Self-Supervised Pre-Training for Table Structure Recognition Transformer [25.04573593082671]
We propose a self-supervised pre-training (SSP) method for table structure recognition transformers.
We discover that the performance gap between the linear projection transformer and the hybrid CNN-transformer can be mitigated by SSP of the visual encoder in the TSR model.
arXiv Detail & Related papers (2024-02-23T19:34:06Z)
- Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z)
- SeUNet-Trans: A Simple yet Effective UNet-Transformer Model for Medical Image Segmentation [0.0]
We propose a simple yet effective UNet-Transformer (seUNet-Trans) model for medical image segmentation.
In our approach, the UNet model is designed as a feature extractor to generate multiple feature maps from the input images.
By leveraging the UNet architecture and the self-attention mechanism, our model not only preserves both local and global context information but is also capable of capturing long-range dependencies between input elements.
arXiv Detail & Related papers (2023-10-16T01:13:38Z)
- TransNorm: Transformer Provides a Strong Spatial Normalization Mechanism for a Deep Segmentation Model [4.320393382724066]
Convolutional neural networks (CNNs) have been the prevailing technique in medical image processing.
We propose TransNorm, a novel deep segmentation framework which consolidates a Transformer module into both the encoder and the skip-connections of the standard U-Net.
arXiv Detail & Related papers (2022-07-27T09:54:10Z)
- Cross-receptive Focused Inference Network for Lightweight Image Super-Resolution [64.25751738088015]
Transformer-based methods have shown impressive performance in single image super-resolution (SISR) tasks.
However, the capability of Transformers to incorporate contextual information and extract features dynamically is neglected.
We propose a lightweight Cross-receptive Focused Inference Network (CFIN) that consists of a cascade of CT Blocks mixed with CNN and Transformer.
arXiv Detail & Related papers (2022-07-06T16:32:29Z)
- Symmetric Transformer-based Network for Unsupervised Image Registration [4.258536928793156]
We propose a convolution-based efficient multi-head self-attention (CEMSA) block, which reduces the parameters of the traditional Transformer.
Based on the proposed CEMSA, we present a novel Symmetric Transformer-based model (SymTrans).
Experimental results show that our proposed method achieves state-of-the-art performance in image registration.
arXiv Detail & Related papers (2022-04-28T15:45:09Z)
- ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias [76.16156833138038]
We propose a novel Vision Transformer Advanced by Exploring intrinsic inductive bias (IB) from convolutions, i.e., ViTAE.
ViTAE has several spatial pyramid reduction modules to downsample and embed the input image into tokens with rich multi-scale context.
In each transformer layer, ViTAE has a convolution block in parallel to the multi-head self-attention module, whose features are fused and fed into the feed-forward network.
arXiv Detail & Related papers (2021-06-07T05:31:06Z)
- Less is More: Pay Less Attention in Vision Transformers [61.05787583247392]
Less attention vIsion Transformer builds upon the fact that convolutions, fully-connected layers, and self-attentions have almost equivalent mathematical expressions for processing image patch sequences.
The proposed LIT achieves promising performance on image recognition tasks, including image classification, object detection and instance segmentation.
arXiv Detail & Related papers (2021-05-29T05:26:07Z)
- Visformer: The Vision-friendly Transformer [105.52122194322592]
We propose a new architecture named Visformer, which is abbreviated from 'Vision-friendly Transformer'.
With the same computational complexity, Visformer outperforms both the Transformer-based and convolution-based models in terms of ImageNet classification accuracy.
arXiv Detail & Related papers (2021-04-26T13:13:03Z)
- TransVG: End-to-End Visual Grounding with Transformers [102.11922622103613]
We present a transformer-based framework for visual grounding, namely TransVG, to address the task of grounding a language query to an image.
We show that the complex fusion modules can be replaced by a simple stack of transformer encoder layers with higher performance.
arXiv Detail & Related papers (2021-04-17T13:35:24Z)