SpinalSAM-R1: A Vision-Language Multimodal Interactive System for Spine CT Segmentation
- URL: http://arxiv.org/abs/2511.00095v1
- Date: Thu, 30 Oct 2025 10:14:42 GMT
- Title: SpinalSAM-R1: A Vision-Language Multimodal Interactive System for Spine CT Segmentation
- Authors: Jiaming Liu, Dingwei Fan, Junyong Zhao, Chunlin Li, Haipeng Si, Liang Sun
- Abstract summary: We propose SpinalSAM-R1, a vision-language interactive system that integrates a fine-tuned SAM with DeepSeek-R1 for spine CT image segmentation. Specifically, our SpinalSAM-R1 introduces an anatomy-guided attention mechanism to improve spine segmentation performance. The system supports 11 clinical operations with 94.3% parsing accuracy and sub-800 ms response times.
- Score: 14.699926241003395
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The anatomical structure segmentation of the spine and adjacent structures from computed tomography (CT) images is a key step for spinal disease diagnosis and treatment. However, the segmentation of CT images is impeded by low contrast and complex vertebral boundaries. Although advanced models such as the Segment Anything Model (SAM) have shown promise in various segmentation tasks, their performance in spinal CT imaging is limited by high annotation requirements and poor domain adaptability. To address these limitations, we propose SpinalSAM-R1, a multimodal vision-language interactive system that integrates a fine-tuned SAM with DeepSeek-R1, for spine CT image segmentation. Specifically, our SpinalSAM-R1 introduces an anatomy-guided attention mechanism to improve spine segmentation performance, and a semantics-driven interaction protocol powered by DeepSeek-R1, enabling natural language-guided refinement. The SpinalSAM-R1 is fine-tuned using Low-Rank Adaptation (LoRA) for efficient adaptation. We validate our SpinalSAM-R1 on the spine anatomical structure with CT images. Experimental results suggest that our method achieves superior segmentation performance. Meanwhile, we develop a PyQt5-based interactive software, which supports point, box, and text-based prompts. The system supports 11 clinical operations with 94.3% parsing accuracy and sub-800 ms response times. The software is released on https://github.com/6jm233333/spinalsam-r1.
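The abstract states that the model is fine-tuned with Low-Rank Adaptation (LoRA). As a rough illustration of the general idea only, the NumPy sketch below wraps a frozen weight matrix with a trainable low-rank update. The class name `LoRALinear`, the rank, the scaling factor `alpha`, and the initialisation are assumptions chosen for illustration; they are not taken from the paper or from the released software.

```python
import numpy as np


class LoRALinear:
    """Hypothetical linear layer: frozen weight W plus a low-rank update.

    The effective weight is W + (alpha / rank) * B @ A, where only A and B
    would be trained. This is a generic LoRA sketch, not the paper's code.
    """

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight  # frozen pretrained weight, never updated
        # A starts small and random, B starts at zero, so the adapted layer
        # initially behaves exactly like the frozen layer.
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))
        self.B = np.zeros((out_dim, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # x has shape (batch, in_dim); the result has shape (batch, out_dim).
        frozen = x @ self.weight.T
        update = self.scale * (x @ self.A.T) @ self.B.T
        return frozen + update

    def trainable_params(self):
        # Only the two low-rank factors count as trainable.
        return self.A.size + self.B.size


if __name__ == "__main__":
    W = np.random.default_rng(1).normal(size=(256, 256))
    layer = LoRALinear(W, rank=4)
    x = np.ones((2, 256))
    # With B at zero, the output matches the frozen layer.
    assert np.allclose(layer.forward(x), x @ W.T)
    print(layer.trainable_params(), "trainable vs", W.size, "frozen")
```

In this toy setting the two factors hold 2,048 values against 65,536 in the frozen matrix, which is the kind of saving that makes low-rank adaptation attractive for large encoders.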
Related papers
- SA$^{2}$Net: Scale-Adaptive Structure-Affinity Transformation for Spine Segmentation from Ultrasound Volume Projection Imaging [21.660042213751794]
We propose a novel structure-aware network (SA$^{2}$Net) for effective spine segmentation. First, we propose a scale-adaptive complementary strategy to learn the cross-dimensional long-distance correlation features for spinal images. Second, we transform semantic features with class-specific affinity and combine it with a Transformer decoder for structure-aware reasoning.
arXiv Detail & Related papers (2025-10-30T14:58:16Z) - TotalSegmentator MRI: Robust Sequence-independent Segmentation of Multiple Anatomic Structures in MRI [59.86827659781022]
An nnU-Net model (TotalSegmentator) was trained on MRI to segment 80 anatomic structures. Dice scores were calculated between the predicted segmentations and expert reference standard segmentations to evaluate model performance. The open-source, easy-to-use model allows for automatic, robust segmentation of 80 structures.
arXiv Detail & Related papers (2024-05-29T20:15:54Z) - SymTC: A Symbiotic Transformer-CNN Net for Instance Segmentation of Lumbar Spine MRI [1.4249943098958722]
Intervertebral disc disease, a prevalent ailment, frequently leads to intermittent or persistent low back pain.
Deep neural network (DNN) models may assist clinicians with more efficient image segmentation of individual instances.
We propose SymTC, an innovative lumbar spine MR image segmentation model that combines the strengths of Transformer and CNN.
arXiv Detail & Related papers (2024-01-17T22:34:20Z) - Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis.
We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image segmentation (DEC-Seg).
arXiv Detail & Related papers (2023-12-26T12:56:31Z) - MA-SAM: Modality-agnostic SAM Adaptation for 3D Medical Image Segmentation [58.53672866662472]
We introduce a modality-agnostic SAM adaptation framework, named as MA-SAM.
Our method roots in the parameter-efficient fine-tuning strategy to update only a small portion of weight increments.
By injecting a series of 3D adapters into the transformer blocks of the image encoder, our method enables the pre-trained 2D backbone to extract third-dimensional information from input data.
arXiv Detail & Related papers (2023-09-16T02:41:53Z) - 3DSAM-adapter: Holistic adaptation of SAM from 2D to 3D for promptable tumor segmentation [52.699139151447945]
We propose a novel adaptation method for transferring the segment anything model (SAM) from 2D to 3D for promptable medical image segmentation.
Our model can outperform domain state-of-the-art medical image segmentation models on 3 out of 4 tasks, specifically by 8.25%, 29.87%, and 10.11% for kidney tumor, pancreas tumor, colon cancer segmentation, and achieve similar performance for liver tumor segmentation.
arXiv Detail & Related papers (2023-06-23T12:09:52Z) - Enabling Augmented Segmentation and Registration in Ultrasound-Guided Spinal Surgery via Realistic Ultrasound Synthesis from Diagnostic CT Volume [19.177141722698188]
The scarcity of intra-operative clinical US data is an insurmountable bottleneck in training a neural network.
We propose an in silico bone US simulation framework that synthesizes realistic US images from diagnostic CT volumes.
We train a lightweight vision transformer model that can achieve accurate and on-the-fly bone segmentation for spinal sonography.
arXiv Detail & Related papers (2023-01-05T07:28:06Z) - Reliable Joint Segmentation of Retinal Edema Lesions in OCT Images [55.83984261827332]
In this paper, we propose a novel reliable multi-scale wavelet-enhanced transformer network.
We develop a novel segmentation backbone that integrates a wavelet-enhanced feature extractor network and a multi-scale transformer module.
Our proposed method achieves better segmentation accuracy with a high degree of reliability as compared to other state-of-the-art segmentation approaches.
arXiv Detail & Related papers (2022-12-01T07:32:56Z) - Context-Aware Transformers For Spinal Cancer Detection and Radiological Grading [70.04389979779195]
This paper proposes a novel transformer-based model architecture for medical imaging problems involving analysis of vertebrae.
It considers two applications of such models in MR images: (a) detection of spinal metastases and the related conditions of vertebral fractures and metastatic cord compression.
We show that by considering the context of vertebral bodies in the image, SCT improves the accuracy for several gradings compared to previously published models.
arXiv Detail & Related papers (2022-06-27T10:31:03Z) - A Novel Mask R-CNN Model to Segment Heterogeneous Brain Tumors through Image Subtraction [0.0]
We propose using a method performed by radiologists called image subtraction and applying it to machine learning models to provide a better segmentation.
Using Mask R-CNN, with its ResNet backbone pre-trained on the RSNA pneumonia detection challenge dataset, we can train a model on the BraTS 2020 Brain Tumor dataset.
We can see how well the method of image subtraction works by comparing it to models without image subtraction through DICE coefficient (F1 score), recall, and precision on the untouched test set.
arXiv Detail & Related papers (2022-04-04T01:45:11Z) - SpineOne: A One-Stage Detection Framework for Degenerative Discs and Vertebrae [54.751251046196494]
We propose a one-stage detection framework termed SpineOne to simultaneously localize and classify degenerative discs and vertebrae from MRI slices.
SpineOne is built upon the following three key techniques: 1) a new design of the keypoint heatmap to facilitate simultaneous keypoint localization and classification; 2) the use of attention modules to better differentiate the representations between discs and vertebrae; and 3) a novel gradient-guided objective association mechanism to associate multiple learning objectives at the later training stage.
arXiv Detail & Related papers (2021-10-28T12:59:06Z) - CTSpine1K: A Large-Scale Dataset for Spinal Vertebrae Segmentation in Computed Tomography [27.27657839726696]
We introduce a large-scale spine CT dataset, called CTSpine1K, curated from multiple sources for vertebra segmentation.
This dataset contains 1,005 CT volumes with over 11,100 labeled vertebrae belonging to different spinal conditions.
arXiv Detail & Related papers (2021-05-31T05:34:27Z)
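Several of the entries above evaluate segmentation with the Dice coefficient (equivalent to the F1 score on binary masks), precision, and recall. The sketch below shows one common way to compute these from a predicted and a reference binary mask. The function name `overlap_metrics` and the convention of returning 1.0 when a denominator is empty are assumptions made for illustration, not details taken from any of the listed papers.

```python
import numpy as np


def overlap_metrics(pred, target):
    """Return (dice, precision, recall) for two binary masks of equal shape.

    Assumed convention: a metric whose denominator is zero is reported as 1.0.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)

    tp = np.logical_and(pred, target).sum()   # predicted and truly foreground
    fp = np.logical_and(pred, ~target).sum()  # predicted but not foreground
    fn = np.logical_and(~pred, target).sum()  # foreground that was missed

    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return float(dice), float(precision), float(recall)


if __name__ == "__main__":
    pred = [[1, 1, 0],
            [0, 1, 0]]
    target = [[1, 0, 0],
              [0, 1, 1]]
    # Here tp = 2, fp = 1, fn = 1, so all three metrics equal 2/3.
    print(overlap_metrics(pred, target))
```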