SwinVFTR: A Novel Volumetric Feature-learning Transformer for 3D OCT
Fluid Segmentation
- URL: http://arxiv.org/abs/2303.09233v2
- Date: Fri, 17 Mar 2023 04:50:05 GMT
- Title: SwinVFTR: A Novel Volumetric Feature-learning Transformer for 3D OCT
Fluid Segmentation
- Authors: Sharif Amit Kamran, Khondker Fariha Hossain, Alireza Tavakkoli, Salah
A. Baker, Stewart Lee Zuckerbrod
- Abstract summary: We propose SwinVFTR, a new transformer-based architecture for precise fluid segmentation in 3D volumetric OCT images.
We first utilize channel-wise volumetric sampling for training on OCT volumes with varying depths (B-scans).
Next, the model uses a novel shifted window transformer block in the encoder to achieve better localization and segmentation of fluid regions.
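The channel-wise volumetric sampling step can be pictured as cropping (or padding) a fixed number of consecutive B-scans along the depth axis so that volumes of different depths yield same-shaped inputs. The sketch below is a minimal illustration under that assumption; the fixed target depth and zero-padding policy are illustrative, not the paper's exact scheme.

```python
import numpy as np

def sample_subvolume(volume, depth=48, rng=None):
    """Hypothetical channel-wise volumetric sampling: crop a random run of
    `depth` consecutive B-scans along the depth axis, or zero-pad volumes
    that are shallower than `depth`."""
    rng = rng or np.random.default_rng()
    d = volume.shape[0]
    if d <= depth:  # pad shallow volumes with zeros at the end
        pad = np.zeros((depth - d, *volume.shape[1:]), volume.dtype)
        return np.concatenate([volume, pad], axis=0)
    start = rng.integers(0, d - depth + 1)  # random crop position
    return volume[start:start + depth]
```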
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Accurately segmenting fluid in 3D volumetric optical coherence tomography
(OCT) images is a crucial yet challenging task for detecting eye diseases.
Traditional autoencoding-based segmentation approaches have limitations in
extracting fluid regions due to successive resolution loss in the encoding
phase and the inability to recover lost information in the decoding phase.
Although current transformer-based models for medical image segmentation
address this limitation, they are not designed to be applied out-of-the-box
to 3D OCT volumes, whose channel-axis size varies widely with the vendor
device and extraction technique. To address these issues, we
propose SwinVFTR, a new transformer-based architecture designed for precise
fluid segmentation in 3D volumetric OCT images. We first utilize a channel-wise
volumetric sampling for training on OCT volumes with varying depths (B-scans).
Next, the model uses a novel shifted window transformer block in the encoder to
achieve better localization and segmentation of fluid regions. Additionally, we
propose a new volumetric attention block for spatial and depth-wise attention,
which improves upon traditional residual skip connections. Consequently,
utilizing multi-class dice loss, the proposed architecture outperforms other
existing architectures on the three publicly available vendor-specific OCT
datasets, namely Spectralis, Cirrus, and Topcon, with mean dice scores of 0.72,
0.59, and 0.68, respectively. Additionally, SwinVFTR outperforms other
architectures in two additional relevant metrics, mean intersection-over-union
(Mean-IOU) and structural similarity measure (SSIM).
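For reference, a multi-class Dice loss averages the per-class soft Dice score over all classes; a minimal NumPy sketch, assuming one-hot targets (any smoothing constant and class weighting used in the paper may differ):

```python
import numpy as np

def multiclass_dice_loss(probs, targets, eps=1e-6):
    """Mean soft Dice loss over classes.

    probs:   (C, ...) array of per-class probabilities.
    targets: (C, ...) one-hot ground-truth array of the same shape.
    """
    axes = tuple(range(1, probs.ndim))  # sum over all spatial axes
    intersection = np.sum(probs * targets, axis=axes)
    denom = np.sum(probs, axis=axes) + np.sum(targets, axis=axes)
    dice = (2.0 * intersection + eps) / (denom + eps)  # per-class score
    return 1.0 - dice.mean()
```

A perfect prediction drives the loss to zero, while a prediction that puts all mass in the wrong classes drives it toward one.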
Related papers
- Enhancing Retinal Vascular Structure Segmentation in Images With a Novel
Design Two-Path Interactive Fusion Module Model [6.392575673488379]
We introduce Swin-Res-Net, a specialized module designed to enhance the precision of retinal vessel segmentation.
Swin-Res-Net utilizes the Swin transformer which uses shifted windows with displacement for partitioning.
Our proposed architecture produces outstanding results, either meeting or surpassing those of other published models.
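The shifted-window partitioning that Swin-style blocks rely on can be sketched as follows; this is a toy 2-D illustration on a single-channel map (Swin applies it to multi-channel feature maps, with attention computed inside each window):

```python
import numpy as np

def window_partition(x, win, shift=0):
    """Partition a 2-D map into non-overlapping win x win windows,
    optionally after a cyclic shift (as in shifted-window attention)."""
    if shift:
        x = np.roll(x, (-shift, -shift), axis=(0, 1))  # cyclic shift
    H, W = x.shape
    x = x.reshape(H // win, win, W // win, win)
    return x.transpose(0, 2, 1, 3).reshape(-1, win, win)
```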
arXiv Detail & Related papers (2024-03-03T01:36:11Z)
- RetiFluidNet: A Self-Adaptive and Multi-Attention Deep Convolutional
Network for Retinal OCT Fluid Segmentation [3.57686754209902]
Quantification of retinal fluids is necessary for OCT-guided treatment management.
A new convolutional neural architecture named RetiFluidNet is proposed for multi-class retinal fluid segmentation.
The model benefits from hierarchical representation learning of textural, contextual, and edge features.
arXiv Detail & Related papers (2022-09-26T07:18:00Z)
- Focused Decoding Enables 3D Anatomical Detection by Transformers [64.36530874341666]
We propose a novel Detection Transformer for 3D anatomical structure detection, dubbed Focused Decoder.
Focused Decoder leverages information from an anatomical region atlas to simultaneously deploy query anchors and restrict the cross-attention's field of view.
We evaluate our proposed approach on two publicly available CT datasets and demonstrate that Focused Decoder not only provides strong detection results and thus alleviates the need for a vast amount of annotated data but also exhibits exceptional and highly intuitive explainability of results via attention weights.
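Restricting cross-attention's field of view can be illustrated with a boolean mask per query; here `allowed` stands in for a hypothetical atlas-derived mask (the names and the identity projections are illustrative, not the paper's API):

```python
import numpy as np

def masked_cross_attention(Q, K, V, allowed):
    """Cross-attention where each query may only attend to a subset of
    keys, mimicking a restricted field of view."""
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    scores = np.where(allowed, scores, -np.inf)   # mask disallowed keys
    scores -= scores.max(axis=1, keepdims=True)   # stable softmax
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)
    return w @ V
```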
arXiv Detail & Related papers (2022-07-21T22:17:21Z)
- MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z)
- Dynamic Linear Transformer for 3D Biomedical Image Segmentation [2.440109381823186]
Transformer-based neural networks have shown promising performance on many biomedical image segmentation tasks.
The main challenge for 3D transformer-based segmentation methods is the quadratic complexity introduced by the self-attention mechanism.
We propose a novel transformer architecture for 3D medical image segmentation using an encoder-decoder style architecture with linear complexity.
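A common route to linear-complexity attention is replacing the softmax with a positive feature map, so the N x N attention matrix is never materialized; a minimal sketch under that assumption (the paper's exact formulation may differ):

```python
import numpy as np

def linear_attention(Q, K, V):
    """O(N) attention: with a positive feature map phi, the output
    phi(Q) @ (phi(K)^T V) / (phi(Q) @ sum_j phi(K_j)) avoids forming
    the N x N attention matrix."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1
    Qf, Kf = phi(Q), phi(K)
    kv = Kf.T @ V                 # (d, d_v): key-value summary
    z = Qf @ Kf.sum(axis=0)       # (N,): per-query normalizer
    return (Qf @ kv) / z[:, None]
```

When all keys are identical the weights are uniform, so each output row reduces to the mean of the value rows, matching softmax attention in that degenerate case.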
arXiv Detail & Related papers (2022-06-01T21:15:01Z)
- The KFIoU Loss for Rotated Object Detection [115.334070064346]
In this paper, we argue that one effective alternative is to devise an approximate loss that can achieve trend-level alignment with the SkewIoU loss.
Specifically, we model the objects as Gaussian distribution and adopt Kalman filter to inherently mimic the mechanism of SkewIoU.
The resulting new loss, called KFIoU, is easier to implement and works better than the exact SkewIoU.
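The Gaussian modeling step can be sketched directly: a rotated box (cx, cy, w, h, theta) maps to a Gaussian with mean at the box centre and covariance built from the rotated half-extents, a common convention in Gaussian-based rotated-IoU losses; the Kalman-filter step then operates on these Gaussians.

```python
import numpy as np

def rbox_to_gaussian(cx, cy, w, h, theta):
    """Model a rotated box as a 2-D Gaussian: mean at the centre,
    covariance R diag(w^2/4, h^2/4) R^T for rotation angle theta."""
    mu = np.array([cx, cy])
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])         # rotation matrix
    S = np.diag([w**2 / 4.0, h**2 / 4.0])   # squared half-extents
    return mu, R @ S @ R.T
```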
arXiv Detail & Related papers (2022-01-29T10:54:57Z)
- AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation [19.53151547706724]
Recent transformer-based models have drawn attention to exploring these techniques in medical image segmentation.
We propose Axial Fusion Transformer UNet (AFTer-UNet), which takes both advantages of convolutional layers' capability of extracting detailed features and transformers' strength on long sequence modeling.
It has fewer parameters and takes less GPU memory to train than the previous transformer-based models.
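Axial attention, the cost-saving idea behind this kind of fusion, attends along one axis at a time instead of over all H*W positions at once; a toy NumPy sketch with identity projections (real models learn Q/K/V projections per axis):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def axial_attention(x):
    """Toy axial self-attention on an (H, W, d) map: attend along each
    row, then along each column."""
    d = x.shape[-1]
    # attend along width (each row independently)
    w = softmax(x @ x.transpose(0, 2, 1) / np.sqrt(d))
    x = w @ x
    # attend along height (each column independently)
    xt = x.transpose(1, 0, 2)
    w = softmax(xt @ xt.transpose(0, 2, 1) / np.sqrt(d))
    return (w @ xt).transpose(1, 0, 2)
```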
arXiv Detail & Related papers (2021-10-20T06:47:28Z)
- Atrous Residual Interconnected Encoder to Attention Decoder Framework
for Vertebrae Segmentation via 3D Volumetric CT Images [1.8146155083014204]
This paper proposes a novel algorithm for automated vertebrae segmentation via 3D volumetric spine CT images.
The proposed model is based on the structure of encoder to decoder, using layer normalization to optimize mini-batch training performance.
The experimental results show that our model achieves competitive performance compared with other state-of-the-art medical semantic segmentation methods.
arXiv Detail & Related papers (2021-04-08T12:09:16Z)
- CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image
Segmentation [95.51455777713092]
Convolutional neural networks (CNNs) have long been the de facto standard for 3D medical image segmentation.
We propose a novel framework that efficiently bridges a Convolutional neural network and a Transformer (CoTr) for accurate 3D medical image segmentation.
arXiv Detail & Related papers (2021-03-04T13:34:22Z)
- Weakly-supervised Learning For Catheter Segmentation in 3D Frustum
Ultrasound [74.22397862400177]
We propose a novel frustum ultrasound-based catheter segmentation method.
The proposed method achieved the state-of-the-art performance with an efficiency of 0.25 second per volume.
arXiv Detail & Related papers (2020-10-19T13:56:22Z)
- 4D Spatio-Temporal Convolutional Networks for Object Position Estimation
in OCT Volumes [69.62333053044712]
3D convolutional neural networks (CNNs) have shown promising performance for pose estimation of a marker object using single OCT images.
We extend 3D CNNs to 4D-temporal CNNs to evaluate the impact of additional temporal information for marker object tracking.
arXiv Detail & Related papers (2020-07-02T12:02:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.