Related papers: EfficientMorph: Parameter-Efficient Transformer-Based Architecture for 3D Image Registration

EfficientMorph: Parameter-Efficient Transformer-Based Architecture for 3D Image Registration

URL: http://arxiv.org/abs/2403.11026v1
Date: Sat, 16 Mar 2024 22:01:55 GMT
Title: EfficientMorph: Parameter-Efficient Transformer-Based Architecture for 3D Image Registration
Authors: Abu Zahid Bin Aziz, Mokshagna Sai Teja Karanam, Tushar Kataria, Shireen Y. Elhabian,
Abstract summary: We propose EfficientMorph, a transformer-based architecture for unsupervised 3D image registration. It optimize the balance between local and global attention through a plane-based attention mechanism. It reduces computational redundancy via cascaded group attention, and captures fine details without compromising computational efficiency.
Score: 1.741980945827445
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Transformers have emerged as the state-of-the-art architecture in medical image registration, outperforming convolutional neural networks (CNNs) by addressing their limited receptive fields and overcoming gradient instability in deeper models. Despite their success, transformer-based models require substantial resources for training, including data, memory, and computational power, which may restrict their applicability for end users with limited resources. In particular, existing transformer-based 3D image registration architectures face three critical gaps that challenge their efficiency and effectiveness. Firstly, while mitigating the quadratic complexity of full attention by focusing on local regions, window-based attention mechanisms often fail to adequately integrate local and global information. Secondly, feature similarities across attention heads that were recently found in multi-head attention architectures indicate a significant computational redundancy, suggesting that the capacity of the network could be better utilized to enhance performance. Lastly, the granularity of tokenization, a key factor in registration accuracy, presents a trade-off; smaller tokens improve detail capture at the cost of higher computational complexity, increased memory demands, and a risk of overfitting. Here, we propose EfficientMorph, a transformer-based architecture for unsupervised 3D image registration. It optimizes the balance between local and global attention through a plane-based attention mechanism, reduces computational redundancy via cascaded group attention, and captures fine details without compromising computational efficiency, thanks to a Hi-Res tokenization strategy complemented by merging operations. Notably, EfficientMorph sets a new benchmark for performance on the OASIS dataset with 16-27x fewer parameters.

Related papers

Any Image Restoration via Efficient Spatial-Frequency Degradation Adaptation [158.37640586809187]
Restoring any degraded image efficiently via just one model has become increasingly significant. Our approach, termed AnyIR, takes a unified path that leverages inherent similarity across various degradations. To fuse the degradation awareness and the contextualized attention, a spatial-frequency parallel fusion strategy is proposed.
arXiv Detail & Related papers (2025-04-19T09:54:46Z)
LandMarkSystem Technical Report [4.885906902650898]
3D reconstruction is vital for applications in autonomous driving, virtual reality, augmented reality, and the metaverse. Recent advancements such as Neural Radiance Fields(NeRF) and 3D Gaussian Splatting (3DGS) have transformed the field, yet traditional deep learning frameworks struggle to meet the increasing demands for scene quality and scale. This paper introduces LandMarkSystem, a novel computing framework designed to enhance multi-scale scene reconstruction and rendering.
arXiv Detail & Related papers (2025-03-27T10:55:36Z)
MaskAttn-UNet: A Mask Attention-Driven Framework for Universal Low-Resolution Image Segmentation [5.130440339897479]
MaskAttn-UNet is a novel segmentation framework that enhances the traditional U-Net architecture via a mask attention mechanism. Our model selectively emphasizes important regions while suppressing irrelevant backgrounds, thereby improving segmentation accuracy in cluttered and complex scenes. Our results show that MaskAttn-UNet achieves accuracy comparable to state-of-the-art methods at significantly lower computational cost than transformer-based models.
arXiv Detail & Related papers (2025-03-11T22:43:26Z)
SegResMamba: An Efficient Architecture for 3D Medical Image Segmentation [2.979183050755201]
We propose an efficient 3D segmentation model for medical imaging called SegResMamba. Our model uses less than half the memory during training compared to other state-of-the-art (SOTA) architectures.
arXiv Detail & Related papers (2025-03-10T18:40:28Z)
Any Image Restoration with Efficient Automatic Degradation Adaptation [132.81912195537433]
We propose a unified manner to achieve joint embedding by leveraging the inherent similarities across various degradations for efficient and comprehensive restoration. Our network sets new SOTA records while reducing model complexity by approximately -82% in trainable parameters and -85% in FLOPs.
arXiv Detail & Related papers (2024-07-18T10:26:53Z)
Hybrid Dynamic Pruning: A Pathway to Efficient Transformer Inference [1.0919012968294923]
We introduce a novel algorithm-architecture co-design approach that accelerates transformers using head sparsity, block sparsity and approximation opportunities to reduce computations in attention and reduce memory access. With the observation of the huge redundancy in attention scores and attention heads, we propose a novel integer-based row-balanced block pruning to prune unimportant blocks in the attention matrix at run time. Also propose integer-based head pruning to detect and prune unimportant heads at an early stage at run time.
arXiv Detail & Related papers (2024-07-17T11:15:16Z)
TRIP: Trainable Region-of-Interest Prediction for Hardware-Efficient Neuromorphic Processing on Event-based Vision [33.803108353747305]
Trainable Region-of-Interest Prediction (TRIP) is a framework for event-based vision processing on a neuromorphic processor. TRIP exploits sparse events' inherent low information density to reduce the overhead of ROI prediction. Our solution requires 46x less computation than the state-of-the-art while achieving higher accuracy.
arXiv Detail & Related papers (2024-06-25T12:04:51Z)
ReduceFormer: Attention with Tensor Reduction by Summation [4.985969607297595]
We introduce ReduceFormer, a family of models optimized for efficiency with the spirit of attention. ReduceFormer leverages only simple operations such as reduction and element-wise multiplication, leading to greatly simplified architecture and improved inference performance. The proposed model family is suitable for edge devices where compute resource and memory bandwidth are limited, as well as for cloud computing where high throughput is sought after.
arXiv Detail & Related papers (2024-06-11T17:28:09Z)
SegFormer3D: an Efficient Transformer for 3D Medical Image Segmentation [0.13654846342364302]
We present SegFormer3D, a hierarchical Transformer that calculates attention across multiscale volumetric features. SegFormer3D avoids complex decoders and uses an all-MLP decoder to aggregate local and global attention features. We benchmark SegFormer3D against the current SOTA models on three widely used datasets.
arXiv Detail & Related papers (2024-04-15T22:12:05Z)
ELA: Efficient Local Attention for Deep Convolutional Neural Networks [15.976475674061287]
This paper introduces an Efficient Local Attention (ELA) method that achieves substantial performance improvements with a simple structure. To overcome these challenges, we propose the incorporation of 1D convolution and Group Normalization feature enhancement techniques. ELA can be seamlessly integrated into deep CNN networks such as ResNet, MobileNet, and DeepLab.
arXiv Detail & Related papers (2024-03-02T08:06:18Z)
Leveraging Frequency Domain Learning in 3D Vessel Segmentation [50.54833091336862]
In this study, we leverage Fourier domain learning as a substitute for multi-scale convolutional kernels in 3D hierarchical segmentation models. We show that our novel network achieves remarkable dice performance (84.37% on ASACA500 and 80.32% on ImageCAS) in tubular vessel segmentation tasks.
arXiv Detail & Related papers (2024-01-11T19:07:58Z)
Towards Compact 3D Representations via Point Feature Enhancement Masked Autoencoders [52.66195794216989]
We propose Point Feature Enhancement Masked Autoencoders (Point-FEMAE) to learn compact 3D representations. Point-FEMAE consists of a global branch and a local branch to capture latent semantic features. Our method significantly improves the pre-training efficiency compared to cross-modal alternatives.
arXiv Detail & Related papers (2023-12-17T14:17:05Z)
Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components. CNNs are used to augment the local texture information of coarse priors. DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
Low-Resolution Self-Attention for Semantic Segmentation [96.81482872022237]
We introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost. Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution. We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure.
arXiv Detail & Related papers (2023-10-08T06:10:09Z)
UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation [93.88170217725805]
We propose a 3D medical image segmentation approach, named UNETR++, that offers both high-quality segmentation masks as well as efficiency in terms of parameters, compute cost, and inference speed. The core of our design is the introduction of a novel efficient paired attention (EPA) block that efficiently learns spatial and channel-wise discriminative features. Our evaluations on five benchmarks, Synapse, BTCV, ACDC, BRaTs, and Decathlon-Lung, reveal the effectiveness of our contributions in terms of both efficiency and accuracy.
arXiv Detail & Related papers (2022-12-08T18:59:57Z)
Magic ELF: Image Deraining Meets Association Learning and Transformer [63.761812092934576]
This paper aims to unify CNN and Transformer to take advantage of their learning merits for image deraining. A novel multi-input attention module (MAM) is proposed to associate rain removal and background recovery. Our proposed method (dubbed as ELF) outperforms the state-of-the-art approach (MPRNet) by 0.25 dB on average.
arXiv Detail & Related papers (2022-07-21T12:50:54Z)
Hierarchical Residual Attention Network for Single Image Super-Resolution [2.0571256241341924]
This paper introduces a new lightweight super-resolution model based on an efficient method for residual feature and attention aggregation. Our proposed architecture surpasses state-of-the-art performance in several datasets, while maintaining relatively low computation and memory footprint.
arXiv Detail & Related papers (2020-12-08T17:24:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.