FastSAM3D: An Efficient Segment Anything Model for 3D Volumetric Medical Images
- URL: http://arxiv.org/abs/2403.09827v1
- Date: Thu, 14 Mar 2024 19:29:44 GMT
- Title: FastSAM3D: An Efficient Segment Anything Model for 3D Volumetric Medical Images
- Authors: Yiqing Shen, Jingxing Li, Xinyuan Shao, Blanca Inigo Romillo, Ankush Jindal, David Dreizin, Mathias Unberath
- Abstract summary: We present FastSAM3D, which accelerates SAM inference to 8 milliseconds per 128×128×128 3D volumetric image on an NVIDIA A100 GPU.
FastSAM3D achieves a remarkable speedup of 527.38x compared to 2D SAMs and 8.75x compared to 3D SAMs on the same volumes without significant performance decline.
- Score: 7.2993352400518035
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Segment anything models (SAMs) are gaining attention for their zero-shot generalization capability in segmenting objects of unseen classes and in unseen domains when properly prompted. Interactivity is a key strength of SAMs, allowing users to iteratively provide prompts that specify objects of interest to refine outputs. However, to realize the interactive use of SAMs for 3D medical imaging tasks, rapid inference times are necessary. High memory requirements and long processing delays remain constraints that hinder the adoption of SAMs for this purpose. Specifically, while 2D SAMs applied to 3D volumes contend with repetitive computation to process all slices independently, 3D SAMs suffer from an exponential increase in model parameters and FLOPS. To address these challenges, we present FastSAM3D which accelerates SAM inference to 8 milliseconds per 128×128×128 3D volumetric image on an NVIDIA A100 GPU. This speedup is accomplished through 1) a novel layer-wise progressive distillation scheme that enables knowledge transfer from a complex 12-layer ViT-B to a lightweight 6-layer ViT-Tiny variant encoder without training from scratch; and 2) a novel 3D sparse flash attention to replace vanilla attention operators, substantially reducing memory needs and improving parallelization. Experiments on three diverse datasets reveal that FastSAM3D achieves a remarkable speedup of 527.38x compared to 2D SAMs and 8.75x compared to 3D SAMs on the same volumes without significant performance decline. Thus, FastSAM3D opens the door for low-cost truly interactive SAM-based 3D medical imaging segmentation with commonly used GPU hardware. Code is available at https://github.com/arcadelab/FastSAM3D.
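The layer-wise progressive distillation described in the abstract (aligning each student layer with a deeper teacher layer rather than training from scratch) can be illustrated with a toy, gradient-free NumPy sketch. The layer count and dimensions mirror the 12-layer-teacher/6-layer-student setup, but the random linear layers, tanh activations, and least-squares fit are illustrative assumptions standing in for the paper's actual ViT encoders and gradient-based training:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16    # feature dimension (toy stand-in for the ViT hidden size)
n = 256   # number of token vectors used as "training data"

# Toy "teacher" encoder: 12 random linear layers with tanh activations.
teacher = [rng.normal(scale=0.3, size=(d, d)) for _ in range(12)]

def forward(layers, x):
    """Run x through the layer stack, returning every activation."""
    acts = []
    for W in layers:
        x = np.tanh(x @ W)
        acts.append(x)
    return acts

x = rng.normal(size=(n, d))
teacher_acts = forward(teacher, x)

# Layer-wise progressive distillation: student layer i is fit so its
# output matches the teacher's activation at twice the depth, given the
# student's own previous activation as input. Here the fit is a
# closed-form least-squares solve on the pre-activation, a stand-in
# for the gradient-based feature-matching loss used in practice.
student = []
s_act = x
for i in range(6):
    target = teacher_acts[2 * i + 1]  # teacher layer 2(i+1)
    pre = np.arctanh(np.clip(target, -0.999, 0.999))
    W, *_ = np.linalg.lstsq(s_act, pre, rcond=None)
    student.append(W)
    s_act = np.tanh(s_act @ W)

# Mismatch between the 6-layer student and the 12-layer teacher output.
err = float(np.mean((s_act - teacher_acts[-1]) ** 2))
print(f"final feature MSE: {err:.4f}")
```

The key point the sketch captures is that each student layer receives direct supervision from an intermediate teacher representation, so the shallow encoder inherits the deep encoder's features progressively instead of relearning them end to end.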
Related papers
- EdgeTAM: On-Device Track Anything Model [65.10032957471824]
Segment Anything Model (SAM) 2 further extends its capability from image to video inputs through a memory bank mechanism.
We aim at making SAM 2 much more efficient so that it even runs on mobile devices while maintaining a comparable performance.
We propose EdgeTAM, which leverages a novel 2D Spatial Perceiver to reduce the computational cost.
arXiv Detail & Related papers (2025-01-13T12:11:07Z)
- Memorizing SAM: 3D Medical Segment Anything Model with Memorizing Transformer
We propose Memorizing SAM, a novel 3D SAM architecture incorporating a memory Transformer as a plug-in.
Unlike conventional memorizing Transformers that save the internal representation during training or inference, our Memorizing SAM utilizes the existing highly accurate internal representations as the memory source.
We evaluate the performance of Memorizing SAM on 33 categories from the TotalSegmentator dataset, which indicates that Memorizing SAM can outperform the state-of-the-art 3D SAM variant, i.e., FastSAM3D, with an average Dice increase of 11.36% at the cost of only a 4.38-millisecond increase in inference time.
arXiv Detail & Related papers (2024-12-18T14:51:25Z)
- Lightweight Method for Interactive 3D Medical Image Segmentation with Multi-Round Result Fusion
Segment Anything Model (SAM) has drawn widespread attention due to its zero-shot generalization capabilities in interactive segmentation.
We propose Lightweight Interactive Network for 3D Medical Image (LIM-Net) as a novel approach demonstrating the potential of compact CNN-based models.
LIM-Net initiates segmentation by generating a 2D prompt mask from user hints.
It exhibits stronger generalization to unseen data compared to SAM-based models, with competitive accuracy while requiring fewer interactions.
arXiv Detail & Related papers (2024-12-11T11:52:16Z)
- SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners [87.76470518069338]
We introduce SAM2Point, a preliminary exploration adapting Segment Anything Model 2 (SAM 2) for promptable 3D segmentation.
Our framework supports various prompt types, including 3D points, boxes, and masks, and can generalize across diverse scenarios, such as 3D objects, indoor scenes, sparse outdoor environments, and raw LiDAR.
To the best of our knowledge, we present the most faithful implementation of SAM in 3D, which may serve as a starting point for future research in promptable 3D segmentation.
arXiv Detail & Related papers (2024-08-29T17:59:45Z)
- EmbodiedSAM: Online Segment Any 3D Thing in Real Time [61.2321497708998]
Embodied tasks require the agent to fully understand 3D scenes simultaneously with its exploration.
An online, real-time, fine-grained, and highly generalizable 3D perception model is urgently needed.
arXiv Detail & Related papers (2024-08-21T17:57:06Z) - Towards a Comprehensive, Efficient and Promptable Anatomic Structure Segmentation Model using 3D Whole-body CT Scans [23.573958232965104]
Segment anything model (SAM) demonstrates strong generalization ability on natural image segmentation.
We propose a comprehensive and scalable 3D SAM model for whole-body CT segmentation, named CT-SAM3D.
CT-SAM3D is trained using a curated dataset of 1204 CT scans containing 107 whole-body anatomies.
arXiv Detail & Related papers (2024-03-22T09:40:52Z) - SAM-Lightening: A Lightweight Segment Anything Model with Dilated Flash Attention to Achieve 30 times Acceleration [6.515075311704396]
Segment Anything Model (SAM) has garnered significant attention in segmentation tasks due to its zero-shot generalization ability.
We introduce SAM-Lightening, a variant of SAM that features a re-engineered attention mechanism, termed Dilated Flash Attention.
Experiments on COCO and LVIS reveal that SAM-Lightening significantly outperforms the state-of-the-art methods in both run-time efficiency and segmentation accuracy.
arXiv Detail & Related papers (2024-03-14T09:07:34Z) - TinySAM: Pushing the Envelope for Efficient Segment Anything Model [73.06322749886483]
We propose a framework to obtain a tiny segment anything model (TinySAM) while maintaining the strong zero-shot performance.
With all these proposed methods, our TinySAM achieves an orders-of-magnitude reduction in computation and pushes the envelope for the efficient segment anything task.
arXiv Detail & Related papers (2023-12-21T12:26:11Z) - RepViT-SAM: Towards Real-Time Segmenting Anything [71.94042743317937]
Segment Anything Model (SAM) has shown impressive zero-shot transfer performance for various computer vision tasks.
MobileSAM proposes to replace the heavyweight image encoder in SAM with TinyViT by employing distillation.
RepViT-SAM can enjoy significantly better zero-shot transfer capability than MobileSAM, along with nearly $10\times$ faster inference speed.
arXiv Detail & Related papers (2023-12-10T04:42:56Z) - SAMPro3D: Locating SAM Prompts in 3D for Zero-Shot Instance Segmentation [24.733049281032272]
We introduce SAMPro3D for zero-shot instance segmentation of 3D scenes.
Our approach segments 3D instances by applying the pretrained Segment Anything Model (SAM) to 2D frames.
Our method achieves comparable or better performance compared to previous zero-shot or fully supervised approaches.
arXiv Detail & Related papers (2023-11-29T15:11:03Z)
- TomoSAM: a 3D Slicer extension using SAM for tomography segmentation [62.997667081978825]
TomoSAM has been developed to integrate the cutting-edge Segment Anything Model (SAM) into 3D Slicer.
SAM is a promptable deep learning model that is able to identify objects and create image masks in a zero-shot manner.
The synergy between these tools aids in the segmentation of complex 3D datasets from tomography or other imaging techniques.
arXiv Detail & Related papers (2023-06-14T16:13:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.