UNesT: Local Spatial Representation Learning with Hierarchical
Transformer for Efficient Medical Segmentation
- URL: http://arxiv.org/abs/2209.14378v2
- Date: Fri, 8 Sep 2023 01:57:45 GMT
- Title: UNesT: Local Spatial Representation Learning with Hierarchical
Transformer for Efficient Medical Segmentation
- Authors: Xin Yu, Qi Yang, Yinchi Zhou, Leon Y. Cai, Riqiang Gao, Ho Hin Lee,
Thomas Li, Shunxing Bao, Zhoubing Xu, Thomas A. Lasko, Richard G. Abramson,
Zizhao Zhang, Yuankai Huo, Bennett A. Landman, Yucheng Tang
- Abstract summary: We show that UNesT consistently achieves state-of-the-art performance and evaluate its generalizability and data efficiency.
- Score: 29.287521185541298
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Transformer-based models, capable of learning better global dependencies,
have recently demonstrated exceptional representation learning capabilities in
computer vision and medical image analysis. Transformers reformat the image
into separate patches and realize global communication via the self-attention
mechanism. However, positional information between patches is hard to preserve
in such 1D sequences, and losing it can lead to sub-optimal performance when
dealing with large numbers of heterogeneous tissues of various sizes in 3D
medical image segmentation. Additionally, current methods are neither robust
nor efficient for heavy-duty medical segmentation tasks such as predicting a large
number of tissue classes or modeling globally inter-connected tissue
structures. To address these challenges, and inspired by the nested hierarchical
structure of vision transformers, we propose UNesT, a novel 3D medical image
segmentation method employing a simplified and faster-converging transformer
encoder design that achieves local communication among spatially adjacent patch
sequences by aggregating them hierarchically. We extensively
validate our method on multiple challenging datasets, consisting of multiple
modalities, anatomies, and a wide range of tissue classes, including 133
structures in the brain, 14 organs in the abdomen, 4 hierarchical components in
the kidneys, inter-connected kidney tumors and brain tumors. We show that UNesT
consistently achieves state-of-the-art performance and evaluate its
generalizability and data efficiency. In particular, the model performs whole-brain
segmentation covering the complete set of ROIs, 133 tissue classes, with a single
network, outperforming the prior state-of-the-art method SLANT27, an ensemble of
27 networks.
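The hierarchical local-attention idea described in the abstract can be made concrete with a small sketch. The PyTorch snippet below is not the authors' released code; the block size, embedding dimension, and the 2x2x2 merge rule are illustrative assumptions. It shows self-attention restricted to small blocks of spatially adjacent patch tokens, followed by aggregation of neighbouring blocks so that later stages communicate over a wider region.

```python
# Minimal sketch (not the authors' released code): block-wise local attention over
# spatially adjacent 3D patch sequences, followed by hierarchical aggregation.
# Shapes, depths, and the "merge 2x2x2 blocks" rule are illustrative assumptions.
import torch
import torch.nn as nn

class BlockAttention(nn.Module):
    """Self-attention restricted to small groups (blocks) of neighbouring patch tokens."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                      # x: (B, num_blocks, tokens_per_block, dim)
        b, nb, t, d = x.shape
        x = x.reshape(b * nb, t, d)            # attention is computed within each block only
        h = self.norm(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x.reshape(b, nb, t, d)

def merge_blocks(x):
    """Aggregate 8 spatially adjacent blocks (2x2x2) into one larger block."""
    b, nb, t, d = x.shape
    return x.reshape(b, nb // 8, 8 * t, d)     # assumes nb is a multiple of 8

# Toy usage: 64 blocks of 27 tokens each, aggregated hierarchically in two stages.
tokens = torch.randn(2, 64, 27, 96)
stage1 = BlockAttention(96)(tokens)                 # local communication inside each block
stage2 = BlockAttention(96)(merge_blocks(stage1))   # wider context after aggregation
print(stage2.shape)                                 # torch.Size([2, 8, 216, 96])
```

A full encoder would interleave such attention stages with feed-forward layers and learned downsampling between stages; the sketch keeps only the block-restricted attention and aggregation that give the local-to-global structure.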
Related papers
- From Tissue Plane to Organ World: A Benchmark Dataset for Multimodal Biomedical Image Registration using Deep Co-Attention Networks [17.718448707146017]
Histology-to-organ registration poses an extra challenge, as any given histologic section can capture only a small portion of a human organ.
We create the ATOM benchmark dataset, sourced from diverse institutions, with the primary objective of transforming this challenge into a machine learning problem.
The performance of our RegisMCAN model demonstrates the potential of deep learning to accurately predict where, within the overall 3D volume, a subregion extracted from an organ image was obtained.
arXiv Detail & Related papers (2024-06-06T14:21:15Z)
- Leveraging Frequency Domain Learning in 3D Vessel Segmentation [50.54833091336862]
In this study, we leverage Fourier domain learning as a substitute for multi-scale convolutional kernels in 3D hierarchical segmentation models.
We show that our novel network achieves remarkable Dice performance (84.37% on ASACA500 and 80.32% on ImageCAS) in tubular vessel segmentation tasks.
arXiv Detail & Related papers (2024-01-11T19:07:58Z)
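As a rough illustration of the frequency-domain substitution summarized above, the sketch below filters a 3D feature map with learnable weights applied to its FFT and transforms the result back. The layer shape and parameterisation are assumptions for illustration, not the paper's exact module.

```python
# Hypothetical sketch of Fourier-domain learning replacing a multi-scale 3D convolution:
# features are filtered by learnable weights in the frequency domain and inverted back.
import torch
import torch.nn as nn

class SpectralFilter3D(nn.Module):
    def __init__(self, channels, d, h, w):
        super().__init__()
        # one learnable complex weight per channel and rFFT frequency bin (stored as real pairs)
        self.weight = nn.Parameter(torch.randn(channels, d, h, w // 2 + 1, 2) * 0.02)

    def forward(self, x):                              # x: (B, C, D, H, W)
        freq = torch.fft.rfftn(x, dim=(-3, -2, -1))    # global receptive field in one step
        freq = freq * torch.view_as_complex(self.weight)
        return torch.fft.irfftn(freq, s=x.shape[-3:], dim=(-3, -2, -1))

feat = torch.randn(1, 16, 32, 32, 32)
print(SpectralFilter3D(16, 32, 32, 32)(feat).shape)    # torch.Size([1, 16, 32, 32, 32])
```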
- BRAU-Net++: U-Shaped Hybrid CNN-Transformer Network for Medical Image Segmentation [11.986549780782724]
We propose a hybrid yet effective CNN-Transformer network, named BRAU-Net++, for accurate medical image segmentation.
Specifically, BRAU-Net++ uses bi-level routing attention as the core building block to design our u-shaped encoder-decoder structure.
Our proposed approach surpasses other state-of-the-art methods including its baseline: BRAU-Net.
arXiv Detail & Related papers (2024-01-01T10:49:09Z)
- Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z)
- SeUNet-Trans: A Simple yet Effective UNet-Transformer Model for Medical Image Segmentation [0.0]
We propose a simple yet effective UNet-Transformer (seUNet-Trans) model for medical image segmentation.
In our approach, the UNet model is designed as a feature extractor to generate multiple feature maps from the input images.
By leveraging the UNet architecture and the self-attention mechanism, our model not only preserves both local and global context information but is also capable of capturing long-range dependencies between input elements.
arXiv Detail & Related papers (2023-10-16T01:13:38Z)
- Learning from partially labeled data for multi-organ and tumor segmentation [102.55303521877933]
We propose a Transformer based dynamic on-demand network (TransDoDNet) that learns to segment organs and tumors on multiple datasets.
A dynamic head enables the network to accomplish multiple segmentation tasks flexibly.
We create a large-scale partially labeled Multi-Organ and Tumor benchmark, termed MOTS, and demonstrate the superior performance of our TransDoDNet over other competitors.
arXiv Detail & Related papers (2022-11-13T13:03:09Z)
- Two-Stream Graph Convolutional Network for Intra-oral Scanner Image Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (i.e., TSGCN) to handle inter-view confusion between different raw attributes.
Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
arXiv Detail & Related papers (2022-04-19T10:41:09Z)
- Multi-organ Segmentation Network with Adversarial Performance Validator [10.775440368500416]
This paper introduces an adversarial performance validation network into a 2D-to-3D segmentation framework.
The proposed network converts coarse 2D results into high-quality 3D segmentation masks in a coarse-to-fine manner, allowing joint optimization to improve segmentation accuracy.
Experiments on the NIH pancreas segmentation dataset demonstrate the proposed network achieves state-of-the-art accuracy on small organ segmentation and outperforms the previous best.
arXiv Detail & Related papers (2022-04-16T18:00:29Z)
- Efficient Medical Image Segmentation Based on Knowledge Distillation [30.857487609003197]
We propose an efficient architecture by distilling knowledge from well-trained medical image segmentation networks to train another lightweight network.
We also devise a novel distillation module tailored for medical image segmentation to transfer semantic region information from teacher to student network.
We demonstrate that a lightweight network distilled by our method has non-negligible value in scenarios that require relatively high operating speed and low storage usage.
arXiv Detail & Related papers (2021-08-23T07:41:10Z)
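The distillation setup can be illustrated with a generic voxel-wise soft-label loss, sketched below. The paper's dedicated module for transferring semantic region information is not reproduced here, and the temperature and weighting values are arbitrary.

```python
# Generic voxel-wise distillation for segmentation, sketched for illustration only.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """student_logits, teacher_logits: (B, C, D, H, W); labels: (B, D, H, W) class indices."""
    # soften both predictions with temperature T and match them with KL divergence
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    ce = F.cross_entropy(student_logits, labels)   # usual supervised loss on ground truth
    return alpha * kd + (1 - alpha) * ce

student = torch.randn(1, 5, 8, 8, 8, requires_grad=True)
teacher = torch.randn(1, 5, 8, 8, 8)
labels = torch.randint(0, 5, (1, 8, 8, 8))
print(distillation_loss(student, teacher, labels))
```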
- Automatic size and pose homogenization with spatial transformer network to improve and accelerate pediatric segmentation [51.916106055115755]
We propose a new CNN architecture that is pose- and scale-invariant thanks to the use of a Spatial Transformer Network (STN).
Our architecture is composed of three sequential modules that are estimated together during training.
We test the proposed method on kidney and renal tumor segmentation in abdominal pediatric CT scans.
arXiv Detail & Related papers (2021-07-06T14:50:03Z)
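The spatial-transformer component behind this size-and-pose homogenization can be sketched as a small localisation network that regresses an affine transform and resamples the input. Layer sizes are illustrative, and the paper's full three-module pipeline is not reproduced.

```python
# Rough sketch of the spatial-transformer idea: a small network predicts an affine
# transform that re-poses/re-scales the input before segmentation (2D for brevity).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AffineSTN(nn.Module):
    def __init__(self):
        super().__init__()
        self.loc = nn.Sequential(
            nn.Conv2d(1, 8, 7, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(8 * 4 * 4, 6),
        )
        # start from the identity transform
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))

    def forward(self, x):                                   # x: (B, 1, H, W)
        theta = self.loc(x).view(-1, 2, 3)                  # predicted affine parameters
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)  # homogenised image

img = torch.randn(2, 1, 64, 64)
print(AffineSTN()(img).shape)                               # torch.Size([2, 1, 64, 64])
```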
- Medical Transformer: Gated Axial-Attention for Medical Image Segmentation [73.98974074534497]
We study the feasibility of using Transformer-based network architectures for medical image segmentation tasks.
We propose a Gated Axial-Attention model which extends the existing architectures by introducing an additional control mechanism in the self-attention module.
To train the model effectively on medical images, we propose a Local-Global training strategy (LoGo) which further improves the performance.
arXiv Detail & Related papers (2021-02-21T18:35:14Z)
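A simplified, hypothetical sketch of gated axial attention follows: full 2D self-attention is decomposed into a width-axis pass and a height-axis pass, and a learned gate scales each axial contribution (in the paper, the control mechanism acts inside the self-attention module, which this sketch only approximates with output gates).

```python
# Simplified, hypothetical gated axial attention: attend along width, then height,
# each pass scaled by a learnable gate. Not the paper's exact formulation.
import torch
import torch.nn as nn

class GatedAxialAttention(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.row_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.row_gate = nn.Parameter(torch.zeros(1))   # gates start closed and are learned
        self.col_gate = nn.Parameter(torch.zeros(1))

    def forward(self, x):                              # x: (B, H, W, C)
        b, h, w, c = x.shape
        rows = x.reshape(b * h, w, c)                  # attend along the width axis
        x = x + self.row_gate * self.row_attn(rows, rows, rows,
                                              need_weights=False)[0].reshape(b, h, w, c)
        cols = x.permute(0, 2, 1, 3).reshape(b * w, h, c)   # attend along the height axis
        out = self.col_attn(cols, cols, cols,
                            need_weights=False)[0].reshape(b, w, h, c).permute(0, 2, 1, 3)
        return x + self.col_gate * out

feats = torch.randn(2, 16, 16, 64)
print(GatedAxialAttention(64)(feats).shape)            # torch.Size([2, 16, 16, 64])
```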
This list is automatically generated from the titles and abstracts of the papers on this site.