BATFormer: Towards Boundary-Aware Lightweight Transformer for Efficient
Medical Image Segmentation
- URL: http://arxiv.org/abs/2206.14409v3
- Date: Wed, 19 Apr 2023 02:43:34 GMT
- Authors: Xian Lin, Li Yu, Kwang-Ting Cheng, and Zengqiang Yan
- Abstract summary: We propose a boundary-aware lightweight transformer (BATFormer) that can build cross-scale global interaction with lower computational complexity.
BATFormer achieves the best performance in Dice of 92.84%, 91.97%, 90.26%, and 96.30% for the average, right ventricle, myocardium, and left ventricle respectively.
- Score: 26.405243756778606
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Objective: Transformers, born to remedy the inadequate receptive fields of
CNNs, have drawn explosive attention recently. However, the daunting
computational complexity of global representation learning, together with rigid
window partitioning, hinders their deployment in medical image segmentation.
This work aims to address the above two issues in transformers for better
medical image segmentation. Methods: We propose a boundary-aware lightweight
transformer (BATFormer) that can build cross-scale global interaction with
lower computational complexity and generate windows flexibly under the guidance
of entropy. Specifically, to fully explore the benefits of transformers in
long-range dependency establishment, a cross-scale global transformer (CGT)
module is introduced to jointly utilize multiple small-scale feature maps for
richer global features with lower computational complexity. Given the
importance of shape modeling in medical image segmentation, a boundary-aware
local transformer (BLT) module is constructed. Different from rigid window
partitioning in vanilla transformers which would produce boundary distortion,
BLT adopts an adaptive window partitioning scheme under the guidance of entropy
for both computational complexity reduction and shape preservation. Results:
BATFormer achieves the best performance in Dice of 92.84%, 91.97%, 90.26%, and
96.30% for the average, right ventricle, myocardium, and left ventricle
respectively on the ACDC dataset and the best performance in Dice, IoU, and ACC
of 90.76%, 84.64%, and 96.76% respectively on the ISIC 2018 dataset. More
importantly, BATFormer requires the fewest model parameters and the lowest
computational complexity compared to the state-of-the-art approaches.
Conclusion and Significance: Our results demonstrate the necessity of
developing customized transformers for efficient and better medical image
segmentation.
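The entropy-guided window partitioning described for the BLT module can be sketched as follows. This is an illustrative NumPy mock-up under stated assumptions, not the authors' implementation: the `shannon_entropy` helper, the window size, and the keep ratio are all hypothetical choices made here to show the idea of spending local attention only on information-rich (boundary-like) windows.

```python
import numpy as np

def shannon_entropy(patch, bins=16):
    """Shannon entropy (bits) of a feature patch's value distribution."""
    hist, _ = np.histogram(patch, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def entropy_guided_windows(feature_map, window=8, keep_ratio=0.5):
    """Rank non-overlapping windows by entropy and keep the top fraction,
    so local attention is applied only to high-entropy (boundary-rich)
    regions instead of a rigid full-grid partition."""
    H, W = feature_map.shape
    coords, scores = [], []
    for y in range(0, H - window + 1, window):
        for x in range(0, W - window + 1, window):
            coords.append((y, x))
            scores.append(shannon_entropy(feature_map[y:y + window, x:x + window]))
    order = np.argsort(scores)[::-1]  # highest entropy first
    n_keep = max(1, int(len(coords) * keep_ratio))
    return [coords[i] for i in order[:n_keep]]

# Toy feature map: flat background with a noisy (boundary-like) centre.
rng = np.random.default_rng(0)
fmap = np.zeros((32, 32))
fmap[8:24, 8:24] = rng.normal(size=(16, 16))
selected = entropy_guided_windows(fmap, window=8, keep_ratio=0.25)
print(selected)
```

On this toy input, only the four windows covering the noisy centre survive the entropy ranking; in BATFormer the kept windows would then be fed to local self-attention, which is how the paper motivates both the computational-complexity reduction and the shape preservation.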
Related papers
- RefineFormer3D: Efficient 3D Medical Image Segmentation via Adaptive Multi-Scale Transformer with Cross Attention Fusion [6.372261626436676]
RefineFormer3D is a lightweight hierarchical transformer architecture that balances segmentation accuracy and computational efficiency for medical imaging.
The model achieves fast inference (8.35 ms per volume on GPU) with low memory requirements, supporting deployment in resource-constrained clinical environments.
arXiv Detail & Related papers (2026-02-18T09:58:59Z) - When Swin Transformer Meets KANs: An Improved Transformer Architecture for Medical Image Segmentation [10.656996937993199]
We introduce UKAST, a U-Net-like architecture that integrates rational-function-based Kolmogorov-Arnold Networks (KANs) into Swin Transformer encoders.
UKAST achieves state-of-the-art performance on four diverse 2D and 3D medical image segmentation benchmarks.
arXiv Detail & Related papers (2025-11-06T05:44:57Z) - FractMorph: A Fractional Fourier-Based Multi-Domain Transformer for Deformable Image Registration [0.6683923149620578]
We present FractMorph, a novel 3D dual-parallel transformer-based architecture that enhances cross-image feature matching.
A lightweight U-Net-style network then predicts a dense deformation field from the transformer-enriched features.
Results show FractMorph achieves state-of-the-art performance with an overall Dice Similarity Coefficient (DSC) of 86.45%, an average per-structure DSC of 75.15%, and a 95th-percentile Hausdorff distance (HD95) of 1.54 mm on our data split.
arXiv Detail & Related papers (2025-08-17T17:42:10Z) - PiT: Progressive Diffusion Transformer [50.46345527963736]
We propose a series of Pseudo Progressive Diffusion Transformers (PiT).
Our proposed PiT-L achieves a 54% FID improvement over DiT-XL/2 while using less computation.
arXiv Detail & Related papers (2025-05-19T15:02:33Z) - GLoG-CSUnet: Enhancing Vision Transformers with Adaptable Radiomic Features for Medical Image Segmentation [2.294915015129229]
Vision Transformers (ViTs) have shown promise in medical image semantic segmentation (MISS)
We introduce Gabor and Laplacian of Gaussian Convolutional Swin Network (GLoG-CSUnet)
GLoG-CSUnet is a novel architecture enhancing Transformer-based models by incorporating learnable radiomic features.
arXiv Detail & Related papers (2025-01-06T06:07:40Z) - TransUKAN:Computing-Efficient Hybrid KAN-Transformer for Enhanced Medical Image Segmentation [5.280523424712006]
U-Net is currently the most widely used architecture for medical image segmentation.
We have improved the KAN to reduce memory usage and computational load.
This approach enhances the model's capability to capture nonlinear relationships.
arXiv Detail & Related papers (2024-09-23T02:52:49Z) - SegStitch: Multidimensional Transformer for Robust and Efficient Medical Imaging Segmentation [15.811141677039224]
State-of-the-art methods, particularly those utilizing transformers, have been prominently adopted in 3D semantic segmentation.
However, plain vision transformers encounter challenges due to their neglect of local features and their high computational complexity.
We propose SegStitch, an innovative architecture that integrates transformers with denoising ODE blocks.
arXiv Detail & Related papers (2024-08-01T12:05:02Z) - DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition [62.95223898214866]
We explore effective Vision Transformers to pursue a preferable trade-off between the computational complexity and size of the attended receptive field.
With a pyramid architecture, we construct a Multi-Scale Dilated Transformer (DilateFormer) by stacking MSDA blocks at low-level stages and global multi-head self-attention blocks at high-level stages.
Our experiment results show that our DilateFormer achieves state-of-the-art performance on various vision tasks.
arXiv Detail & Related papers (2023-02-03T14:59:31Z) - Optimizing Vision Transformers for Medical Image Segmentation and
Few-Shot Domain Adaptation [11.690799827071606]
We propose Convolutional Swin-Unet (CS-Unet) transformer blocks and optimise their settings with respect to patch embedding, projection, the feed-forward network, upsampling, and skip connections.
CS-Unet can be trained from scratch and inherits the superiority of convolutions in each feature process phase.
Experiments show that CS-Unet without pre-training surpasses other state-of-the-art counterparts by large margins on two medical CT and MRI datasets with fewer parameters.
arXiv Detail & Related papers (2022-10-14T19:18:52Z) - The Lighter The Better: Rethinking Transformers in Medical Image
Segmentation Through Adaptive Pruning [26.405243756778606]
We propose to employ adaptive pruning to transformers for medical image segmentation and propose a lightweight network APFormer.
To the best of our knowledge, this is the first work on transformer pruning for medical image analysis tasks.
We prove, through ablation studies, that adaptive pruning can work as a plug-n-play module for performance improvement on other hybrid-/transformer-based methods.
arXiv Detail & Related papers (2022-06-29T05:49:36Z) - MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z) - PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered solving vision tasks with transformers; it directly translates the image feature map into the object detection result.
Recent transformer-based image recognition models show consistent efficiency gains.
arXiv Detail & Related papers (2021-09-15T01:10:30Z) - nnFormer: Interleaved Transformer for Volumetric Segmentation [50.10441845967601]
We introduce nnFormer, a powerful segmentation model with an interleaved architecture based on empirical combination of self-attention and convolution.
nnFormer achieves tremendous improvements over previous transformer-based methods on two commonly used datasets Synapse and ACDC.
arXiv Detail & Related papers (2021-09-07T17:08:24Z) - CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image
Segmentation [95.51455777713092]
Convolutional neural networks (CNNs) have been the de facto standard for 3D medical image segmentation.
We propose a novel framework that efficiently bridges a Convolutional neural network and a Transformer (CoTr) for accurate 3D medical image segmentation.
arXiv Detail & Related papers (2021-03-04T13:34:22Z) - Medical Transformer: Gated Axial-Attention for Medical Image
Segmentation [73.98974074534497]
We study the feasibility of using Transformer-based network architectures for medical image segmentation tasks.
We propose a Gated Axial-Attention model which extends the existing architectures by introducing an additional control mechanism in the self-attention module.
To train the model effectively on medical images, we propose a Local-Global training strategy (LoGo) which further improves the performance.
arXiv Detail & Related papers (2021-02-21T18:35:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.