Parameter-Efficient Transformer with Hybrid Axial-Attention for Medical Image Segmentation
- URL: http://arxiv.org/abs/2211.09533v1
- Date: Thu, 17 Nov 2022 13:54:55 GMT
- Title: Parameter-Efficient Transformer with Hybrid Axial-Attention for Medical Image Segmentation
- Authors: Yiyue Hu and Lei Zhang and Nan Mu and Lei Liu
- Abstract summary: We propose a parameter-efficient transformer that explores intrinsic inductive bias via position information for medical image segmentation.
To this end, we present a novel Hybrid Axial-Attention (HAA) that is equipped with spatial pixel-wise information and relative position information as inductive biases.
- Score: 10.441315305453504
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformers have achieved remarkable success in medical image
analysis owing to their powerful and flexible self-attention mechanism.
However, because they lack an intrinsic inductive bias for modeling visual
structural information, they generally require a large-scale pre-training
schedule, which limits their clinical application to small-scale medical
data that is expensive to acquire. To this end, we propose a
parameter-efficient transformer that explores intrinsic inductive bias via
position information for medical image segmentation. Specifically, we
empirically investigate how different position encoding strategies affect
the prediction quality of the region of interest (ROI) and observe that
ROIs are sensitive to the choice of position encoding strategy. Motivated
by this, we present a novel Hybrid Axial-Attention (HAA), a form of
positional self-attention equipped with spatial pixel-wise information and
relative position information as inductive biases. Moreover, we introduce
a gating mechanism that alleviates the training burden and enables
efficient feature selection on small-scale datasets. Experiments on the
BraTS and COVID-19 datasets demonstrate the superiority of our method over
the baseline and previous works. We also visualize the internal workflow
to provide interpretability and further validate our results.
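The abstract describes HAA only at a high level. As a rough illustration of the general idea, here is a minimal sketch of a gated axial attention with a learnable relative position bias; the class name, the tanh-scaled scalar gate, and the exact way the bias enters the attention logits are assumptions for illustration, not the authors' formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAxialAttention1D(nn.Module):
    """Self-attention along one spatial axis with a gated relative
    position bias (hypothetical reconstruction, not the paper's code)."""

    def __init__(self, dim: int, axis_len: int):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)
        # One learnable bias per relative offset in [-(L-1), L-1].
        self.rel_bias = nn.Parameter(torch.zeros(2 * axis_len - 1))
        # Scalar gate on the positional term: training starts close to
        # plain attention and learns how much positional bias to inject.
        self.gate = nn.Parameter(torch.zeros(1))
        idx = torch.arange(axis_len)
        self.register_buffer("rel_idx", idx[None, :] - idx[:, None] + axis_len - 1)
        self.scale = dim ** -0.5

    def forward(self, x):  # x: (batch, axis_len, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        logits = (q @ k.transpose(-2, -1)) * self.scale
        logits = logits + torch.tanh(self.gate) * self.rel_bias[self.rel_idx]
        return self.proj(F.softmax(logits, dim=-1) @ v)

# Axial usage: attend along rows, then along columns of a feature map.
x = torch.randn(2, 16, 16, 32)                    # (B, H, W, C)
row_attn = GatedAxialAttention1D(32, 16)
col_attn = GatedAxialAttention1D(32, 16)
B, H, W, C = x.shape
x = row_attn(x.reshape(B * H, W, C)).reshape(B, H, W, C)
x = x.transpose(1, 2).reshape(B * W, H, C)        # columns as sequences
x = col_attn(x).reshape(B, W, H, C).transpose(1, 2)
```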
Related papers
- Boosting Medical Image Segmentation Performance with Adaptive Convolution Layer [6.887244952811574]
We propose an adaptive layer placed ahead of leading deep-learning models such as UCTransNet.
Our approach enhances the network's ability to handle diverse anatomical structures and subtle image details.
It consistently outperforms traditional CNNs with fixed kernel sizes while using a similar number of parameters.
arXiv Detail & Related papers (2024-04-17T13:18:39Z)
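As a hedged illustration of the adaptive-layer idea above, the sketch below mixes parallel convolutions of several kernel sizes with learned softmax weights, so the effective receptive field can adapt during training; the class name and mixing scheme are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class AdaptiveConvLayer(nn.Module):
    """Parallel convolutions with different kernel sizes, mixed by
    learned softmax weights (hypothetical sketch)."""

    def __init__(self, in_ch: int, out_ch: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2) for k in kernel_sizes
        )
        self.mix = nn.Parameter(torch.zeros(len(kernel_sizes)))

    def forward(self, x):
        w = torch.softmax(self.mix, dim=0)
        # Weighted sum lets the layer adapt its effective kernel size.
        return sum(wi * b(x) for wi, b in zip(w, self.branches))

# Placed ahead of a segmentation backbone such as UCTransNet:
front = AdaptiveConvLayer(3, 3)
out = front(torch.randn(1, 3, 224, 224))  # same spatial size, adapted features
```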
- CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers [66.15847237150909]
We introduce a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images.
The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism.
We validated our model on a test dataset consisting of unseen synthetic data and images collected from silicone aorta phantoms.
arXiv Detail & Related papers (2024-03-21T15:13:36Z)
- Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning [65.54680361074882]
The Eye-gaze Guided Multi-modal Alignment (EGMA) framework harnesses eye-gaze data to better align medical visual and textual features.
We conduct downstream tasks of image classification and image-text retrieval on four medical datasets.
arXiv Detail & Related papers (2024-03-19T03:59:14Z)
- Explainable Transformer Prototypes for Medical Diagnoses [7.680878119988482]
The self-attention mechanism of transformers helps identify crucial regions during the classification process.
Our research introduces a unique attention block that underscores correlations between 'regions' rather than 'pixels'.
A combined quantitative and qualitative methodological approach was used to demonstrate the effectiveness of the proposed method on a large-scale NIH chest X-ray dataset.
arXiv Detail & Related papers (2024-03-11T17:46:21Z)
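To make the region-versus-pixel distinction above concrete, here is a minimal sketch in which pixel features are pooled into coarse region tokens before self-attention, so the attention weights relate regions rather than individual pixels; the pooling choice and sizes are illustrative assumptions, not the paper's block.

```python
import torch
import torch.nn as nn

class RegionAttention(nn.Module):
    """Self-attention over pooled region tokens (hypothetical sketch)."""

    def __init__(self, dim: int, region_size: int = 4, heads: int = 4):
        super().__init__()
        self.pool = nn.AvgPool2d(region_size)        # pixels -> region tokens
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feat):                         # feat: (B, C, H, W)
        regions = self.pool(feat).flatten(2).transpose(1, 2)  # (B, N, C)
        out, weights = self.attn(regions, regions, regions)
        return out, weights                          # region-to-region attention map

out, w = RegionAttention(dim=64)(torch.randn(1, 64, 32, 32))
```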
- SeUNet-Trans: A Simple yet Effective UNet-Transformer Model for Medical Image Segmentation [0.0]
We propose a simple yet effective UNet-Transformer (seUNet-Trans) model for medical image segmentation.
In our approach, the UNet model is designed as a feature extractor to generate multiple feature maps from the input images.
By leveraging the UNet architecture and the self-attention mechanism, our model not only preserves both local and global context information but also captures long-range dependencies between input elements.
arXiv Detail & Related papers (2023-10-16T01:13:38Z)
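A minimal sketch of the pipeline described above, assuming a toy convolutional encoder in place of the full UNet: local feature maps become tokens that a transformer encoder mixes globally. The exact bridge used in seUNet-Trans may differ.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(                     # stand-in for a UNet encoder
    nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
)
transformer = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)

img = torch.randn(1, 1, 64, 64)
fmap = encoder(img)                          # (1, 64, 16, 16) local features
tokens = fmap.flatten(2).transpose(1, 2)     # (1, 256, 64), one token per position
globally_mixed = transformer(tokens)         # long-range dependencies via attention
```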
- Attentive Symmetric Autoencoder for Brain MRI Segmentation [56.02577247523737]
We propose a novel Attentive Symmetric Auto-encoder based on Vision Transformer (ViT) for 3D brain MRI segmentation tasks.
In the pre-training stage, the proposed auto-encoder pays more attention to reconstructing informative patches, selected according to gradient metrics.
Experimental results show that our proposed attentive symmetric auto-encoder outperforms the state-of-the-art self-supervised learning methods and medical image segmentation models.
arXiv Detail & Related papers (2022-09-19T09:43:19Z)
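One plausible reading of "pays more attention to informative patches according to gradient metrics" is to weight each patch's reconstruction loss by an image-gradient score, as sketched below; the specific metric and weighting are assumptions, not the paper's exact method.

```python
import torch
import torch.nn.functional as F

def gradient_weighted_recon_loss(pred, target, patch=8):
    # Per-pixel squared reconstruction error.
    err = (pred - target) ** 2
    # Simple gradient magnitude of the target as an "informativeness" map.
    gx = target[..., :, 1:] - target[..., :, :-1]
    gy = target[..., 1:, :] - target[..., :-1, :]
    grad = F.pad(gx.abs(), (0, 1)) + F.pad(gy.abs(), (0, 0, 0, 1))
    # Average both maps over patches, then weight patch errors by gradient.
    err_p = F.avg_pool2d(err, patch)
    w_p = F.avg_pool2d(grad, patch)
    w_p = w_p / (w_p.mean() + 1e-8)          # normalize weights around 1
    return (w_p * err_p).mean()

loss = gradient_weighted_recon_loss(torch.rand(1, 1, 32, 32), torch.rand(1, 1, 32, 32))
```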
- Transformer Lesion Tracker [12.066026343488453]
We propose a transformer-based approach, termed Transformer Lesion Tracker (TLT).
We design a Cross Attention-based Transformer (CAT) to capture and combine both global and local information to enhance feature extraction.
We conduct experiments on a public dataset to show the superiority of our method; our model improves the average Euclidean center error by at least 14.3%.
arXiv Detail & Related papers (2022-06-13T15:35:24Z)
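As a hedged sketch of the cross-attention idea in CAT, the snippet below lets tokens from a follow-up scan query template tokens around a known lesion, combining information across images; the shapes and the query/key roles are illustrative assumptions.

```python
import torch
import torch.nn as nn

cross_attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)

template = torch.randn(1, 49, 64)   # tokens around the known lesion
search = torch.randn(1, 196, 64)    # tokens from the follow-up scan
# Query with the search tokens; keys/values come from the template, so
# each search location gathers evidence about the tracked lesion.
fused, attn = cross_attn(query=search, key=template, value=template)
```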
- Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [116.87918100031153]
We propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG).
CGT injects clinical relation triples into the visual features as prior knowledge to drive the decoding procedure.
Experiments on the large-scale FFA-IR benchmark demonstrate that the proposed CGT is able to outperform previous benchmark methods.
arXiv Detail & Related papers (2022-06-04T13:16:30Z)
- Interactive Medical Image Segmentation with Self-Adaptive Confidence Calibration [10.297081695050457]
This paper proposes an interactive segmentation framework, called interactive MEdical segmentation with self-adaptive Confidence CAlibration (MECCA).
The evaluation is established through a novel action-based confidence network, and corrective actions are obtained via multi-agent reinforcement learning (MARL).
Experimental results on various medical image datasets demonstrate the strong performance of the proposed algorithm.
arXiv Detail & Related papers (2021-11-15T12:38:56Z)
- Domain Adaptive Robotic Gesture Recognition with Unsupervised Kinematic-Visual Data Alignment [60.31418655784291]
We propose a novel unsupervised domain adaptation framework which can simultaneously transfer multi-modality knowledge, i.e., both kinematic and visual data, from simulator to real robot.
It remedies the domain gap with enhanced transferable features by using temporal cues in videos and inherent correlations across modalities for gesture recognition.
Results show that our approach recovers performance with large gains, up to 12.91% in accuracy (ACC) and 20.16% in F1-score, without using any annotations on the real robot.
arXiv Detail & Related papers (2021-03-06T09:10:03Z)
- Medical Transformer: Gated Axial-Attention for Medical Image Segmentation [73.98974074534497]
We study the feasibility of using Transformer-based network architectures for medical image segmentation tasks.
We propose a Gated Axial-Attention model which extends the existing architectures by introducing an additional control mechanism in the self-attention module.
To train the model effectively on medical images, we propose a Local-Global training strategy (LoGo) which further improves the performance.
arXiv Detail & Related papers (2021-02-21T18:35:14Z)
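The Local-Global (LoGo) strategy above can be pictured as two branches trained together: a global branch on a downsampled whole image for context, and a local branch on full-resolution patches for detail. The sketch below is an assumption-laden illustration of such a training step, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def logo_step(model_g, model_l, img, mask, patch=64):
    # Global branch: coarse view of the entire image.
    small = F.interpolate(img, scale_factor=0.5, mode="bilinear", align_corners=False)
    small_mask = F.interpolate(mask, scale_factor=0.5, mode="nearest")
    loss_g = F.binary_cross_entropy_with_logits(model_g(small), small_mask)
    # Local branch: full-resolution patches preserve fine structure.
    loss_l = 0.0
    for y in range(0, img.shape[-2], patch):
        for x in range(0, img.shape[-1], patch):
            p = img[..., y:y + patch, x:x + patch]
            m = mask[..., y:y + patch, x:x + patch]
            loss_l = loss_l + F.binary_cross_entropy_with_logits(model_l(p), m)
    return loss_g + loss_l

# e.g. with any segmentation nets producing one logit map per pixel:
net_g = net_l = torch.nn.Conv2d(1, 1, 3, padding=1)  # stand-in models
loss = logo_step(net_g, net_l, torch.rand(1, 1, 128, 128), torch.rand(1, 1, 128, 128))
```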
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.