SwinVFTR: A Novel Volumetric Feature-learning Transformer for 3D OCT Fluid Segmentation
- URL: http://arxiv.org/abs/2303.09233v3
- Date: Thu, 02 Jan 2025 22:11:57 GMT
- Title: SwinVFTR: A Novel Volumetric Feature-learning Transformer for 3D OCT Fluid Segmentation
- Authors: Khondker Fariha Hossain, Sharif Amit Kamran, Alireza Tavakkoli, George Bebis, Sal Baker,
- Abstract summary: We propose SwinVFTR, a transformer architecture for precise fluid segmentation in 3D OCT images. SwinVFTR employs channel-wise volumetric sampling and a shifted window transformer block to improve fluid localization. It outperforms existing models on Spectralis, Cirrus, and Topcon OCT datasets.
- Score: 0.23967405016776386
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Accurately segmenting fluid in 3D optical coherence tomography (OCT) images is critical for detecting eye diseases but remains challenging. Traditional autoencoder-based methods struggle with resolution loss and information recovery. While transformer-based models improve segmentation, they aren't optimized for 3D OCT volumes, which vary by vendor and extraction technique. To address this, we propose SwinVFTR, a transformer architecture for precise fluid segmentation in 3D OCT images. SwinVFTR employs channel-wise volumetric sampling and a shifted window transformer block to improve fluid localization. Moreover, a novel volumetric attention block enhances spatial and depth-wise attention. Trained using multi-class dice loss, SwinVFTR outperforms existing models on Spectralis, Cirrus, and Topcon OCT datasets, achieving mean dice scores of 0.72, 0.59, and 0.68, respectively, along with superior performance in mean intersection-over-union (IOU) and structural similarity (SSIM) metrics.
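The multi-class Dice loss mentioned in the abstract is a standard segmentation objective; a minimal NumPy sketch is shown below. This is not the authors' implementation, and the function name `multiclass_dice_loss` and its channel-first array layout are assumptions for illustration.

```python
import numpy as np

def multiclass_dice_loss(probs, target_onehot, eps=1e-6):
    """Soft multi-class Dice loss (generic sketch).

    probs:         (C, D, H, W) per-class softmax probabilities.
    target_onehot: (C, D, H, W) one-hot ground truth.
    Returns 1 - mean Dice over classes, so 0 is a perfect match.
    """
    spatial_axes = tuple(range(1, probs.ndim))  # sum over D, H, W
    intersection = np.sum(probs * target_onehot, axis=spatial_axes)
    denom = np.sum(probs, axis=spatial_axes) + np.sum(target_onehot, axis=spatial_axes)
    dice_per_class = (2.0 * intersection + eps) / (denom + eps)
    return 1.0 - dice_per_class.mean()
```

A perfect prediction drives the loss to zero, while predicting the wrong class everywhere drives it toward one.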
Related papers
- TransUNext: towards a more advanced U-shaped framework for automatic vessel segmentation in the fundus image [19.16680702780529]
We propose a more advanced U-shaped architecture for a hybrid Transformer and CNN: TransUNext.
The Global Multi-Scale Fusion (GMSF) module is further introduced to upgrade skip-connections, fuse high-level semantic and low-level detailed information, and eliminate high- and low-level semantic differences.
arXiv Detail & Related papers (2024-11-05T01:44:22Z) - Diff3Dformer: Leveraging Slice Sequence Diffusion for Enhanced 3D CT Classification with Transformer Networks [5.806035963947936]
We propose a Diffusion-based 3D Vision Transformer (Diff3Dformer) to aggregate repetitive information within 3D CT scans.
Our method exhibits improved performance on two different scales of small datasets of 3D lung CT scans.
arXiv Detail & Related papers (2024-06-24T23:23:18Z) - CVT-xRF: Contrastive In-Voxel Transformer for 3D Consistent Radiance Fields from Sparse Inputs [65.80187860906115]
We propose a novel approach to improve NeRF's performance with sparse inputs.
We first adopt a voxel-based ray sampling strategy to ensure that the sampled rays intersect with a certain voxel in 3D space.
We then randomly sample additional points within the voxel and apply a Transformer to infer the properties of other points on each ray, which are then incorporated into the volume rendering.
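The random in-voxel sampling step above can be sketched as follows. The helper `sample_points_in_voxel` and its ray parameterization (entry/exit distances along the ray) are hypothetical, not the paper's code.

```python
import numpy as np

def sample_points_in_voxel(origin, direction, t_enter, t_exit, n=8, rng=None):
    """Sample extra 3D points on the segment of a ray that lies
    inside a voxel, given the ray's entry/exit distances.

    origin, direction: (3,) ray origin and (unnormalized) direction.
    t_enter, t_exit:   scalar distances where the ray enters/leaves the voxel.
    Returns: (n, 3) sampled points origin + t * direction.
    """
    rng = rng or np.random.default_rng(0)
    t = rng.uniform(t_enter, t_exit, size=n)        # random depths in the voxel
    return origin[None, :] + t[:, None] * direction[None, :]
```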
arXiv Detail & Related papers (2024-03-25T15:56:17Z) - Enhancing Retinal Vascular Structure Segmentation in Images With a Novel Design Two-Path Interactive Fusion Module Model [6.392575673488379]
We introduce Swin-Res-Net, a specialized module designed to enhance the precision of retinal vessel segmentation.
Swin-Res-Net utilizes the Swin transformer which uses shifted windows with displacement for partitioning.
Our proposed architecture produces outstanding results, either meeting or surpassing those of other published models.
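The shifted-window partitioning that Swin-style models (including Swin-Res-Net above) rely on can be sketched in a few lines. This 2D NumPy version is illustrative only: a cyclic roll is applied before splitting the feature map into non-overlapping windows, so that attention windows straddle the previous partition's boundaries.

```python
import numpy as np

def shifted_window_partition(x, window=4, shift=2):
    """Cyclically shift a feature map, then split it into
    non-overlapping windows (the Swin trick, 2D sketch).

    x: (H, W, C) feature map; H and W must be divisible by `window`.
    Returns: (num_windows, window, window, C)
    """
    if shift:
        # roll up-left so windows span the old window boundaries
        x = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    H, W, C = x.shape
    x = x.reshape(H // window, window, W // window, window, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window, window, C)
```

With `shift=0` this reduces to plain window partitioning; the reverse roll would be applied after attention to restore the original layout.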
arXiv Detail & Related papers (2024-03-03T01:36:11Z) - CIS-UNet: Multi-Class Segmentation of the Aorta in Computed Tomography Angiography via Context-Aware Shifted Window Self-Attention [10.335899694123711]
We introduce Context Infused Swin-UNet (CIS-UNet), a deep learning model for aortic segmentation.
CIS-UNet adopts a hierarchical encoder-decoder structure comprising a CNN encoder, symmetric decoder, skip connections, and a novel Context-aware Shifted Window Self-Attention (CSW-SA) as the bottleneck block.
We trained our model on computed tomography (CT) scans from 44 patients and tested it on 15 patients. CIS-UNet outperformed the state-of-the-art SwinUNetR segmentation model, achieving a superior mean Dice coefficient of 0.713.
arXiv Detail & Related papers (2024-01-23T19:17:20Z) - RetiFluidNet: A Self-Adaptive and Multi-Attention Deep Convolutional Network for Retinal OCT Fluid Segmentation [3.57686754209902]
Quantification of retinal fluids is necessary for OCT-guided treatment management.
A new convolutional neural architecture named RetiFluidNet is proposed for multi-class retinal fluid segmentation.
The model benefits from hierarchical representation learning of textural, contextual, and edge features.
arXiv Detail & Related papers (2022-09-26T07:18:00Z) - View-Disentangled Transformer for Brain Lesion Detection [50.4918615815066]
We propose a novel view-disentangled transformer to enhance the extraction of MRI features for more accurate tumour detection.
First, the proposed transformer harvests long-range correlation among different positions in a 3D brain scan.
Second, the transformer models a stack of slice features as multiple 2D views and enhances these features view-by-view.
Third, we deploy the proposed transformer module in a transformer backbone, which can effectively detect the 2D regions surrounding brain lesions.
arXiv Detail & Related papers (2022-09-20T11:58:23Z) - Focused Decoding Enables 3D Anatomical Detection by Transformers [64.36530874341666]
We propose a novel Detection Transformer for 3D anatomical structure detection, dubbed Focused Decoder.
Focused Decoder leverages information from an anatomical region atlas to simultaneously deploy query anchors and restrict the cross-attention's field of view.
We evaluate our proposed approach on two publicly available CT datasets and demonstrate that Focused Decoder not only provides strong detection results and thus alleviates the need for a vast amount of annotated data but also exhibits exceptional and highly intuitive explainability of results via attention weights.
arXiv Detail & Related papers (2022-07-21T22:17:21Z) - MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z) - Unsupervised Contrastive Learning based Transformer for Lung Nodule Detection [6.693379403133435]
Early detection of lung nodules with computed tomography (CT) is critical for the longer survival of lung cancer patients and better quality of life.
Computer-aided detection/diagnosis (CAD) is proven valuable as a second or concurrent reader in this context.
However, accurate detection of lung nodules remains a challenge for such CAD systems and even radiologists due to variability in the size, location, and appearance of lung nodules.
Motivated by recent computer vision techniques, here we present a self-supervised region-based 3D transformer model to identify lung nodules.
arXiv Detail & Related papers (2022-04-30T01:19:00Z) - The KFIoU Loss for Rotated Object Detection [115.334070064346]
In this paper, we argue that one effective alternative is to devise an approximate loss that can achieve trend-level alignment with the SkewIoU loss.
Specifically, we model the objects as Gaussian distributions and adopt a Kalman filter to inherently mimic the mechanism of SkewIoU.
The resulting new loss called KFIoU is easier to implement and works better compared with exact SkewIoU.
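The Gaussian modeling step above can be sketched with the common convention of taking the box centre as the mean and R diag(w²/4, h²/4) Rᵀ as the covariance. The helper `rotated_box_to_gaussian` is illustrative, not the paper's code.

```python
import numpy as np

def rotated_box_to_gaussian(cx, cy, w, h, theta):
    """Model a rotated box (cx, cy, w, h, angle) as a 2D Gaussian:
    mean = box centre, covariance = R diag(w^2/4, h^2/4) R^T.
    """
    mu = np.array([cx, cy])
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    S = np.diag([w ** 2 / 4.0, h ** 2 / 4.0])
    sigma = R @ S @ R.T
    return mu, sigma
```

Because the covariance varies smoothly with the angle, losses built on this Gaussian view avoid the non-differentiable corner cases of exact rotated-IoU computation.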
arXiv Detail & Related papers (2022-01-29T10:54:57Z) - AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation [19.53151547706724]
Transformer-based models have drawn attention to exploring these techniques in medical image segmentation.
We propose Axial Fusion Transformer UNet (AFTer-UNet), which takes both advantages of convolutional layers' capability of extracting detailed features and transformers' strength on long sequence modeling.
It has fewer parameters and takes less GPU memory to train than the previous transformer-based models.
arXiv Detail & Related papers (2021-10-20T06:47:28Z) - CyTran: A Cycle-Consistent Transformer with Multi-Level Consistency for Non-Contrast to Contrast CT Translation [56.622832383316215]
We propose a novel approach to translate unpaired contrast computed tomography (CT) scans to non-contrast CT scans.
Our approach is based on cycle-consistent generative adversarial convolutional transformers, for short, CyTran.
Our empirical results show that CyTran outperforms all competing methods.
arXiv Detail & Related papers (2021-10-12T23:25:03Z) - LIFE: A Generalizable Autodidactic Pipeline for 3D OCT-A Vessel Segmentation [5.457168581192045]
Recent deep learning algorithms produced promising vascular segmentation results.
However, 3D retinal vessel segmentation remains difficult due to the lack of manually annotated training data.
We propose a learning-based method that is only supervised by a self-synthesized modality.
arXiv Detail & Related papers (2021-07-09T07:51:33Z) - Atrous Residual Interconnected Encoder to Attention Decoder Framework for Vertebrae Segmentation via 3D Volumetric CT Images [1.8146155083014204]
This paper proposes a novel algorithm for automated vertebrae segmentation via 3D volumetric spine CT images.
The proposed model is based on the structure of encoder to decoder, using layer normalization to optimize mini-batch training performance.
The experimental results show that our model achieves competitive performance compared with other state-of-the-art medical semantic segmentation methods.
arXiv Detail & Related papers (2021-04-08T12:09:16Z) - CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation [95.51455777713092]
Convolutional neural networks (CNNs) have been the de facto standard for nowadays 3D medical image segmentation.
We propose a novel framework that efficiently bridges a Convolutional neural network and a Transformer (CoTr) for accurate 3D medical image segmentation.
arXiv Detail & Related papers (2021-03-04T13:34:22Z) - Revisiting 3D Context Modeling with Supervised Pre-training for Universal Lesion Detection in CT Slices [48.85784310158493]
We propose a Modified Pseudo-3D Feature Pyramid Network (MP3D FPN) to efficiently extract 3D context enhanced 2D features for universal lesion detection in CT slices.
With the novel pre-training method, the proposed MP3D FPN achieves state-of-the-art detection performance on the DeepLesion dataset.
The proposed 3D pre-trained weights can potentially be used to boost the performance of other 3D medical image analysis tasks.
arXiv Detail & Related papers (2020-12-16T07:11:16Z) - Weakly-supervised Learning For Catheter Segmentation in 3D Frustum Ultrasound [74.22397862400177]
We propose a novel Frustum ultrasound based catheter segmentation method.
The proposed method achieved the state-of-the-art performance with an efficiency of 0.25 second per volume.
arXiv Detail & Related papers (2020-10-19T13:56:22Z) - 4D Spatio-Temporal Convolutional Networks for Object Position Estimation in OCT Volumes [69.62333053044712]
3D convolutional neural networks (CNNs) have shown promising performance for pose estimation of a marker object using single OCT images.
We extend 3D CNNs to 4D-temporal CNNs to evaluate the impact of additional temporal information for marker object tracking.
arXiv Detail & Related papers (2020-07-02T12:02:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.