FastSAM3D: An Efficient Segment Anything Model for 3D Volumetric Medical Images
- URL: http://arxiv.org/abs/2403.09827v1
- Date: Thu, 14 Mar 2024 19:29:44 GMT
- Title: FastSAM3D: An Efficient Segment Anything Model for 3D Volumetric Medical Images
- Authors: Yiqing Shen, Jingxing Li, Xinyuan Shao, Blanca Inigo Romillo, Ankush Jindal, David Dreizin, Mathias Unberath
- Abstract summary: We present FastSAM3D, which accelerates SAM inference to 8 milliseconds per 128×128×128 3D volumetric image on an NVIDIA A100 GPU.
FastSAM3D achieves a remarkable speedup of 527.38x compared to 2D SAMs and 8.75x compared to 3D SAMs on the same volumes without significant performance decline.
- Score: 7.2993352400518035
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Segment anything models (SAMs) are gaining attention for their zero-shot generalization capability in segmenting objects of unseen classes and in unseen domains when properly prompted. Interactivity is a key strength of SAMs, allowing users to iteratively provide prompts that specify objects of interest to refine outputs. However, to realize the interactive use of SAMs for 3D medical imaging tasks, rapid inference times are necessary. High memory requirements and long processing delays remain constraints that hinder the adoption of SAMs for this purpose. Specifically, while 2D SAMs applied to 3D volumes contend with repetitive computation to process all slices independently, 3D SAMs suffer from an exponential increase in model parameters and FLOPS. To address these challenges, we present FastSAM3D which accelerates SAM inference to 8 milliseconds per 128×128×128 3D volumetric image on an NVIDIA A100 GPU. This speedup is accomplished through 1) a novel layer-wise progressive distillation scheme that enables knowledge transfer from a complex 12-layer ViT-B to a lightweight 6-layer ViT-Tiny variant encoder without training from scratch; and 2) a novel 3D sparse flash attention to replace vanilla attention operators, substantially reducing memory needs and improving parallelization. Experiments on three diverse datasets reveal that FastSAM3D achieves a remarkable speedup of 527.38x compared to 2D SAMs and 8.75x compared to 3D SAMs on the same volumes without significant performance decline. Thus, FastSAM3D opens the door for low-cost truly interactive SAM-based 3D medical imaging segmentation with commonly used GPU hardware. Code is available at https://github.com/arcadelab/FastSAM3D.
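The layer-wise progressive distillation described in the abstract (aligning each student layer with a deeper teacher layer rather than training from scratch) can be illustrated with a toy, gradient-free NumPy sketch. The layer count and dimensions mirror the 12-layer-teacher/6-layer-student setup, but the random linear layers, tanh activations, and least-squares fit are illustrative assumptions standing in for the paper's actual ViT encoders and gradient-based training:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16    # feature dimension (toy stand-in for the ViT hidden size)
n = 256   # number of token vectors used as "training data"

# Toy "teacher" encoder: 12 random linear layers with tanh activations.
teacher = [rng.normal(scale=0.3, size=(d, d)) for _ in range(12)]

def forward(layers, x):
    """Run x through the layer stack, returning every activation."""
    acts = []
    for W in layers:
        x = np.tanh(x @ W)
        acts.append(x)
    return acts

x = rng.normal(size=(n, d))
teacher_acts = forward(teacher, x)

# Layer-wise progressive distillation: student layer i is fit so its
# output matches the teacher's activation at twice the depth, given the
# student's own previous activation as input. Here the fit is a
# closed-form least-squares solve on the pre-activation, a stand-in
# for the gradient-based feature-matching loss used in practice.
student = []
s_act = x
for i in range(6):
    target = teacher_acts[2 * i + 1]  # teacher layer 2(i+1)
    pre = np.arctanh(np.clip(target, -0.999, 0.999))
    W, *_ = np.linalg.lstsq(s_act, pre, rcond=None)
    student.append(W)
    s_act = np.tanh(s_act @ W)

# Mismatch between the 6-layer student and the 12-layer teacher output.
err = float(np.mean((s_act - teacher_acts[-1]) ** 2))
print(f"final feature MSE: {err:.4f}")
```

The key point the sketch captures is that each student layer receives direct supervision from an intermediate teacher representation, so the shallow encoder inherits the deep encoder's features progressively instead of relearning them end to end.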
Related papers
- EdgeTAM: On-Device Track Anything Model [65.10032957471824]
Segment Anything Model (SAM) 2 further extends its capability from image to video inputs through a memory bank mechanism.
We aim at making SAM 2 much more efficient so that it even runs on mobile devices while maintaining a comparable performance.
We propose EdgeTAM, which leverages a novel 2D Spatial Perceiver to reduce the computational cost.
arXiv Detail & Related papers (2025-01-13T12:11:07Z)
- Memorizing SAM: 3D Medical Segment Anything Model with Memorizing Transformer
We propose Memorizing SAM, a novel 3D SAM architecture incorporating a memory Transformer as a plug-in.
Unlike conventional memorizing Transformers that save the internal representation during training or inference, our Memorizing SAM utilizes the existing highly accurate internal representations as the memory source.
We evaluate the performance of Memorizing SAM on 33 categories from the TotalSegmentator dataset, which indicates that Memorizing SAM can outperform the state-of-the-art 3D SAM variant, i.e., FastSAM3D, with an average Dice increase of 11.36% at the cost of only a 4.38-millisecond increase in inference time.
arXiv Detail & Related papers (2024-12-18T14:51:25Z)
- Lightweight Method for Interactive 3D Medical Image Segmentation with Multi-Round Result Fusion
Segment Anything Model (SAM) has drawn widespread attention due to its zero-shot generalization capabilities in interactive segmentation.
We propose Lightweight Interactive Network for 3D Medical Image (LIM-Net) as a novel approach demonstrating the potential of compact CNN-based models.
LIM-Net initiates segmentation by generating a 2D prompt mask from user hints.
It exhibits stronger generalization to unseen data compared to SAM-based models, with competitive accuracy while requiring fewer interactions.
arXiv Detail & Related papers (2024-12-11T11:52:16Z)
- SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners [87.76470518069338]
We introduce SAM2Point, a preliminary exploration adapting Segment Anything Model 2 (SAM 2) for promptable 3D segmentation.
Our framework supports various prompt types, including 3D points, boxes, and masks, and can generalize across diverse scenarios, such as 3D objects, indoor scenes, sparse outdoor environments, and raw LiDAR.
To the best of our knowledge, we present the most faithful implementation of SAM in 3D, which may serve as a starting point for future research in promptable 3D segmentation.
arXiv Detail & Related papers (2024-08-29T17:59:45Z)
- EmbodiedSAM: Online Segment Any 3D Thing in Real Time [61.2321497708998]
Embodied tasks require the agent to fully understand 3D scenes simultaneously with its exploration.
An online, real-time, fine-grained, and highly generalizable 3D perception model is urgently needed.
arXiv Detail & Related papers (2024-08-21T17:57:06Z) - Towards a Comprehensive, Efficient and Promptable Anatomic Structure Segmentation Model using 3D Whole-body CT Scans [23.573958232965104]
Segment anything model (SAM) demonstrates strong generalization ability on natural image segmentation.
We propose a comprehensive and scalable 3D SAM model for whole-body CT segmentation, named CT-SAM3D.
CT-SAM3D is trained using a curated dataset of 1204 CT scans containing 107 whole-body anatomies.
arXiv Detail & Related papers (2024-03-22T09:40:52Z) - SAM-Lightening: A Lightweight Segment Anything Model with Dilated Flash Attention to Achieve 30 times Acceleration [6.515075311704396]
Segment Anything Model (SAM) has garnered significant attention in segmentation tasks due to its zero-shot generalization ability.
We introduce SAM-Lightening, a variant of SAM that features a re-engineered attention mechanism, termed Dilated Flash Attention.
Experiments on COCO and LVIS reveal that SAM-Lightening significantly outperforms the state-of-the-art methods in both run-time efficiency and segmentation accuracy.
arXiv Detail & Related papers (2024-03-14T09:07:34Z) - TinySAM: Pushing the Envelope for Efficient Segment Anything Model [73.06322749886483]
We propose a framework to obtain a tiny segment anything model (TinySAM) while maintaining the strong zero-shot performance.
With all these proposed methods, our TinySAM achieves an orders-of-magnitude reduction in computation and pushes the envelope for the efficient segment anything task.
arXiv Detail & Related papers (2023-12-21T12:26:11Z) - RepViT-SAM: Towards Real-Time Segmenting Anything [71.94042743317937]
Segment Anything Model (SAM) has shown impressive zero-shot transfer performance for various computer vision tasks.
MobileSAM proposes to replace the heavyweight image encoder in SAM with TinyViT by employing distillation.
RepViT-SAM can enjoy significantly better zero-shot transfer capability than MobileSAM, along with nearly $10\times$ faster inference speed.
arXiv Detail & Related papers (2023-12-10T04:42:56Z) - SAMPro3D: Locating SAM Prompts in 3D for Zero-Shot Instance Segmentation [24.733049281032272]
We introduce SAMPro3D for zero-shot instance segmentation of 3D scenes.
Our approach segments 3D instances by applying the pretrained Segment Anything Model (SAM) to 2D frames.
Our method achieves comparable or better performance compared to previous zero-shot or fully supervised approaches.
arXiv Detail & Related papers (2023-11-29T15:11:03Z)
- TomoSAM: a 3D Slicer extension using SAM for tomography segmentation [62.997667081978825]
TomoSAM has been developed to integrate the cutting-edge Segment Anything Model (SAM) into 3D Slicer.
SAM is a promptable deep learning model that is able to identify objects and create image masks in a zero-shot manner.
The synergy between these tools aids in the segmentation of complex 3D datasets from tomography or other imaging techniques.
arXiv Detail & Related papers (2023-06-14T16:13:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.