Towards Optimal Patch Size in Vision Transformers for Tumor Segmentation
- URL: http://arxiv.org/abs/2308.16598v1
- Date: Thu, 31 Aug 2023 09:57:27 GMT
- Title: Towards Optimal Patch Size in Vision Transformers for Tumor Segmentation
- Authors: Ramtin Mojtahedi, Mohammad Hamghalam, Richard K. G. Do, and Amber L.
Simpson
- Abstract summary: Detection of tumors in metastatic colorectal cancer (mCRC) plays an essential role in the early diagnosis and treatment of liver cancer.
Deep learning models backboned by fully convolutional neural networks (FCNNs) have become the dominant approach for segmenting 3D computerized tomography (CT) scans.
Vision transformers have been introduced to overcome the locality of FCNNs' receptive fields.
This paper proposes a technique to select the vision transformer's optimal input multi-resolution image patch size based on the average volume size of metastasis lesions.
- Score: 2.4540404783565433
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Detection of tumors in metastatic colorectal cancer (mCRC) plays an essential
role in the early diagnosis and treatment of liver cancer. Deep learning models
backboned by fully convolutional neural networks (FCNNs) have become the
dominant approach for segmenting 3D computerized tomography (CT) scans. However,
since their convolution layers suffer from limited kernel size, they are not
able to capture long-range dependencies and global context. To tackle this
restriction, vision transformers have been introduced to overcome the locality
of FCNNs' receptive fields. Although transformers can capture long-range
features, their segmentation performance decreases with varying tumor sizes
due to the model's sensitivity to the input patch size. While finding an optimal
patch size
improves the performance of vision transformer-based models on segmentation
tasks, it is a time-consuming and challenging procedure. This paper proposes a
technique to select the vision transformer's optimal input multi-resolution
image patch size based on the average volume size of metastasis lesions. We
further validated our suggested framework using a transfer-learning technique,
demonstrating that the highest Dice similarity coefficient (DSC) performance
was obtained by pre-training on training data with a larger tumor volume using
the suggested ideal patch size and then training with a smaller one. We
experimentally evaluate this idea through pre-training our model on a
multi-resolution public dataset. Our model showed consistent and improved
results when applied to our private multi-resolution mCRC dataset with a
smaller average tumor volume. This study lays the groundwork for optimizing
semantic segmentation of small objects using vision transformers. The
implementation source code is available
at: https://github.com/Ramtin-Mojtahedi/OVTPS.
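The abstract says the optimal patch size is derived from the average metastasis lesion volume but does not give the rule here. Below is a minimal, hypothetical Python sketch of that idea, assuming a cube-root heuristic and a fixed set of candidate patch edges; the authors' actual procedure is in the OVTPS repository linked above.

```python
import numpy as np

# Hypothetical sketch: derive a ViT input patch size from the average
# metastasis lesion volume, as the abstract suggests. The cube-root
# heuristic and the candidate sizes are assumptions for illustration;
# the paper's actual rule lives in the OVTPS repository.
def suggest_patch_size(lesion_volumes_mm3, voxel_spacing_mm=(1.0, 1.0, 1.0),
                       candidates=(16, 32, 64, 128)):
    """Pick the candidate patch edge closest to the mean lesion extent."""
    mean_volume = float(np.mean(lesion_volumes_mm3))
    # Edge of a cube with the same volume as the average lesion.
    edge_mm = mean_volume ** (1.0 / 3.0)
    # Convert physical size to voxels; take the coarsest axis to be safe.
    edge_vox = max(edge_mm / s for s in voxel_spacing_mm)
    # Snap to the nearest size the transformer backbone supports.
    return min(candidates, key=lambda c: abs(c - edge_vox))

# Example: lesions averaging ~30 mm across map to 32-voxel patches at 1 mm spacing.
print(suggest_patch_size([27_000.0, 33_000.0, 24_000.0]))  # 32
```

Under the paper's transfer-learning scheme, such a rule would be applied twice: once to pick the patch size for pre-training on the larger-tumor public dataset, and again when fine-tuning on the smaller-tumor mCRC data.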
Related papers
- MBDRes-U-Net: Multi-Scale Lightweight Brain Tumor Segmentation Network [0.0]
This study proposes the MBDRes-U-Net model, which uses the three-dimensional (3D) U-Net framework and integrates multibranch residual blocks and fused attention.
The computational burden of the model is reduced by the branch strategy, which effectively uses the rich local features in multimodal images.
arXiv Detail & Related papers (2024-11-04T09:03:43Z)
- Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z)
- SeUNet-Trans: A Simple yet Effective UNet-Transformer Model for Medical Image Segmentation [0.0]
We propose a simple yet effective UNet-Transformer (seUNet-Trans) model for medical image segmentation.
In our approach, the UNet model is designed as a feature extractor to generate multiple feature maps from the input images.
By leveraging the UNet architecture and the self-attention mechanism, our model not only preserves both local and global context information but is also capable of capturing long-range dependencies between input elements.
arXiv Detail & Related papers (2023-10-16T01:13:38Z)
- 3DSAM-adapter: Holistic adaptation of SAM from 2D to 3D for promptable tumor segmentation [52.699139151447945]
We propose a novel adaptation method for transferring the segment anything model (SAM) from 2D to 3D for promptable medical image segmentation.
Our model can outperform domain state-of-the-art medical image segmentation models on 3 out of 4 tasks, specifically by 8.25%, 29.87%, and 10.11% for kidney tumor, pancreas tumor, and colon cancer segmentation, respectively, and achieves similar performance for liver tumor segmentation.
arXiv Detail & Related papers (2023-06-23T12:09:52Z)
- SwinCross: Cross-modal Swin Transformer for Head-and-Neck Tumor Segmentation in PET/CT Images [6.936329289469511]
The Cross-Modal Swin Transformer (SwinCross) incorporates a cross-modal attention (CMA) module for cross-modal feature extraction at multiple resolutions.
The proposed method is experimentally shown to outperform state-of-the-art transformer-based methods.
arXiv Detail & Related papers (2023-02-08T03:36:57Z)
- Learning from partially labeled data for multi-organ and tumor segmentation [102.55303521877933]
We propose a Transformer based dynamic on-demand network (TransDoDNet) that learns to segment organs and tumors on multiple datasets.
A dynamic head enables the network to accomplish multiple segmentation tasks flexibly.
We create a large-scale partially labeled Multi-Organ and Tumor benchmark, termed MOTS, and demonstrate the superior performance of our TransDoDNet over other competitors.
arXiv Detail & Related papers (2022-11-13T13:03:09Z)
- MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z)
- Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images [7.334185314342017]
We propose a novel segmentation model termed Swin UNEt TRansformers (Swin UNETR).
The model extracts features at five different resolutions by utilizing shifted windows for computing self-attention.
We participated in the BraTS 2021 segmentation challenge, and our proposed model ranks among the top-performing approaches in the validation phase.
arXiv Detail & Related papers (2022-01-04T18:01:34Z)
- Automatic size and pose homogenization with spatial transformer network to improve and accelerate pediatric segmentation [51.916106055115755]
We propose a new CNN architecture that is pose- and scale-invariant thanks to the use of a Spatial Transformer Network (STN).
Our architecture is composed of three sequential modules that are estimated together during training.
We test the proposed method on kidney and renal tumor segmentation in abdominal pediatric CT scans (a minimal STN sketch follows this list).
arXiv Detail & Related papers (2021-07-06T14:50:03Z)
- CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation [95.51455777713092]
Convolutional neural networks (CNNs) have been the de facto standard for 3D medical image segmentation.
We propose a novel framework that efficiently bridges a Convolutional neural network and a Transformer (CoTr) for accurate 3D medical image segmentation.
arXiv Detail & Related papers (2021-03-04T13:34:22Z)
- Spherical coordinates transformation pre-processing in Deep Convolution Neural Networks for brain tumor segmentation in MRI [0.0]
Deep Convolutional Neural Networks (DCNN) have recently shown very promising results.
DCNN models need large annotated datasets to achieve good performance.
In this work, a 3D spherical coordinate transform is hypothesized to improve DCNN models' accuracy (a minimal sketch of such a transform follows this list).
arXiv Detail & Related papers (2020-08-17T05:11:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
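For the spatial-transformer entry above, the following is a minimal PyTorch sketch of the STN idea: a small localization network regresses an affine matrix, and grid_sample warps the input into a normalized pose before segmentation. The 2D setting and layer sizes are illustrative assumptions, not the authors' three-module architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal 2D spatial transformer sketch. The localization network and its
# layer sizes are illustrative assumptions, not the paper's architecture.
class SpatialTransformer(nn.Module):
    def __init__(self, in_channels=1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 8, kernel_size=7), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.fc = nn.Linear(8 * 4 * 4, 6)  # regress a 2x3 affine matrix
        # Initialize to the identity transform so training starts unwarped.
        nn.init.zeros_(self.fc.weight)
        with torch.no_grad():
            self.fc.bias.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))

    def forward(self, x):
        theta = self.fc(self.features(x)).view(-1, 2, 3)  # per-image affine
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

x = torch.randn(2, 1, 64, 64)
print(SpatialTransformer()(x).shape)  # torch.Size([2, 1, 64, 64])
```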
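For the spherical-coordinates entry, here is a minimal NumPy sketch of the pre-processing idea: resample a 3D volume from Cartesian (x, y, z) to spherical (r, theta, phi) coordinates around a chosen center. The grid shape, center, and nearest-neighbor interpolation are assumptions for illustration, not the paper's exact transform.

```python
import numpy as np

# Hypothetical sketch of spherical-coordinates pre-processing: resample a
# 3D volume onto an (r, theta, phi) grid centered on a chosen voxel.
# Grid sizes, center, and nearest-neighbor lookup are illustrative choices.
def to_spherical(volume, center, r_bins=64, theta_bins=64, phi_bins=64):
    r = np.linspace(0, min(volume.shape) / 2, r_bins)
    theta = np.linspace(0, np.pi, theta_bins)               # polar angle
    phi = np.linspace(0, 2 * np.pi, phi_bins, endpoint=False)
    R, T, P = np.meshgrid(r, theta, phi, indexing="ij")
    # Spherical -> Cartesian sampling coordinates.
    x = center[0] + R * np.sin(T) * np.cos(P)
    y = center[1] + R * np.sin(T) * np.sin(P)
    z = center[2] + R * np.cos(T)
    # Nearest-neighbor lookup, clipped to stay inside the volume.
    ix, iy, iz = (np.clip(np.rint(c).astype(int), 0, s - 1)
                  for c, s in zip((x, y, z), volume.shape))
    return volume[ix, iy, iz]

vol = np.random.rand(96, 96, 96)
print(to_spherical(vol, center=(48, 48, 48)).shape)  # (64, 64, 64)
```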