Think as Cardiac Sonographers: Marrying SAM with Left Ventricular Indicators Measurements According to Clinical Guidelines
- URL: http://arxiv.org/abs/2508.08566v1
- Date: Tue, 12 Aug 2025 02:09:36 GMT
- Title: Think as Cardiac Sonographers: Marrying SAM with Left Ventricular Indicators Measurements According to Clinical Guidelines
- Authors: Tuo Liu, Qinghan Yang, Yu Zhang, Rongjun Ge, Yang Chen, Guangquan Zhou
- Abstract summary: Left ventricular (LV) indicator measurements following clinical echocardiography guidelines are important for diagnosing cardiovascular disease. It is necessary to introduce vision foundational models (VFM) with abundant knowledge. We propose a novel framework named AutoSAME, combining the powerful visual understanding of SAM with segmentation and landmark localization tasks simultaneously.
- Score: 10.334018181732022
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Left ventricular (LV) indicator measurements following clinical echocardiography guidelines are important for diagnosing cardiovascular disease. Although existing algorithms have explored automated LV quantification, they can struggle to capture generic visual representations due to the normally small training datasets. Therefore, it is necessary to introduce vision foundational models (VFM) with abundant knowledge. However, VFMs represented by the segment anything model (SAM) are usually suitable for segmentation but incapable of identifying key anatomical points, which are critical in LV indicator measurements. In this paper, we propose a novel framework named AutoSAME, combining the powerful visual understanding of SAM with segmentation and landmark localization tasks simultaneously. Consequently, the framework mimics the operation of cardiac sonographers, achieving LV indicator measurements consistent with clinical guidelines. We further present filtered cross-branch attention (FCBA) in AutoSAME, which leverages relatively comprehensive features in the segmentation to enhance the heatmap regression (HR) of key points from the frequency domain perspective, optimizing the visual representation learned by the latter. Moreover, we propose spatial-guided prompt alignment (SGPA) to automatically generate prompt embeddings guided by spatial properties of LV, thereby improving the accuracy of dense predictions by prior spatial knowledge. The extensive experiments on an echocardiography dataset demonstrate the efficiency of each design and the superiority of our AutoSAME in LV segmentation, landmark localization, and indicator measurements. The code will be available at https://github.com/QC-LIU-1997/AutoSAME.
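The abstract mentions heatmap regression of key points followed by indicator measurement. As a rough illustration only, and not the AutoSAME implementation, the generic idea can be sketched as follows: a landmark is encoded as a Gaussian heatmap, decoded back by taking the peak, and a linear measurement is the scaled distance between two decoded landmarks. All function names, the landmark coordinates, and the pixel spacing below are hypothetical.

```python
import numpy as np

def gaussian_heatmap(shape, center, sigma=2.0):
    """Render a 2D Gaussian heatmap peaked at center=(row, col)."""
    rows, cols = np.indices(shape)
    d2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def decode_landmark(heatmap):
    """Return the (row, col) position of the heatmap maximum."""
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(row), int(col)

def linear_measurement_mm(p1, p2, pixel_spacing_mm):
    """Euclidean distance between two landmarks, scaled from pixels to mm."""
    diff = np.asarray(p1, dtype=float) - np.asarray(p2, dtype=float)
    return float(np.linalg.norm(diff * pixel_spacing_mm))

# Hypothetical landmark positions (row, col) on a 64 x 96 image.
point_a = (20, 30)
point_b = (20, 70)

# In a trained model these heatmaps would be network outputs;
# here they are synthesized directly from the known positions.
decoded_a = decode_landmark(gaussian_heatmap((64, 96), point_a))
decoded_b = decode_landmark(gaussian_heatmap((64, 96), point_b))

# Assumed isotropic pixel spacing of 0.5 mm per pixel.
length_mm = linear_measurement_mm(decoded_a, decoded_b, pixel_spacing_mm=0.5)
print(length_mm)  # 20.0
```

Peak decoding via argmax is the simplest choice; sub-pixel refinements exist but are outside what the abstract states.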
Related papers
- RAU: Reference-based Anatomical Understanding with Vision Language Models [26.06602931463068]
We introduce RAU, a framework for reference-based anatomical understanding with vision-language models (VLMs). We first show that a VLM learns to identify anatomical regions through relative spatial reasoning between reference and target images. Next, we demonstrate that the VLM-derived spatial cues can be seamlessly integrated with the fine-grained segmentation capability of SAM2.
arXiv Detail & Related papers (2025-09-26T14:32:03Z) - SAMIR, an efficient registration framework via robust feature learning from SAM [40.09295562721889]
This paper introduces SAMIR, an efficient medical image registration framework. SAM is pretrained on large-scale natural image datasets and can learn robust, general-purpose visual representations. We show that SAMIR significantly outperforms state-of-the-art methods on benchmark datasets for both intra-subject cardiac image registration and inter-subject abdomen CT image registration.
arXiv Detail & Related papers (2025-09-17T01:56:35Z) - EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance [79.66329903007869]
We present EchoWorld, a motion-aware world modeling framework for probe guidance. It encodes anatomical knowledge and motion-induced visual dynamics. It is trained on more than one million ultrasound images from over 200 routine scans.
arXiv Detail & Related papers (2025-04-17T16:19:05Z) - KaLDeX: Kalman Filter based Linear Deformable Cross Attention for Retina Vessel Segmentation [46.57880203321858]
We propose a novel network (KaLDeX) for vascular segmentation leveraging a Kalman filter based linear deformable cross attention (LDCA) module.
Our approach is based on two key components: Kalman filter (KF) based linear deformable convolution (LD) and cross-attention (CA) modules.
The proposed method is evaluated on retinal fundus image datasets (DRIVE, CHASE_DB1, and STARE) as well as the 3 mm and 6 mm subsets of the OCTA-500 dataset.
arXiv Detail & Related papers (2024-10-28T16:00:42Z) - Continuous max-flow augmentation of self-supervised few-shot learning on SPECT left ventricles [0.0]
This paper aims to give a recipe for diagnostic centers as well as for clinics to automatically segment the myocardium based on small and low-quality labels on reconstructed SPECT.
A combination of Continuous Max-Flow (CMF) with prior shape information is developed to augment the 3D U-Net self-supervised learning (SSL) approach on various geometries of SPECT apparatus.
arXiv Detail & Related papers (2024-05-09T03:19:19Z) - Semantic-aware Temporal Channel-wise Attention for Cardiac Function Assessment [69.02116920364311]
Existing video-based methods pay little attention to the left ventricular region or to the left ventricular changes caused by motion.
We propose a semi-supervised auxiliary learning paradigm with a left ventricular segmentation task, which contributes to the representation learning for the left ventricular region.
Our approach achieves state-of-the-art performance on the Stanford dataset with an improvement of 0.22 MAE, 0.26 RMSE, and 1.9% $R^2$.
arXiv Detail & Related papers (2023-10-09T05:57:01Z) - SimLVSeg: Simplifying Left Ventricular Segmentation in 2D+Time Echocardiograms with Self- and Weakly-Supervised Learning [0.8672882547905405]
We develop SimLVSeg, a video-based network for consistent left ventricular (LV) segmentation from sparsely annotated echocardiogram videos.
SimLVSeg consists of self-supervised pre-training with temporal masking, followed by weakly supervised learning tailored for LV segmentation from sparse annotations.
We demonstrate how SimLVSeg outperforms the state-of-the-art solutions by achieving a 93.32% dice score on the largest 2D+time echocardiography dataset.
arXiv Detail & Related papers (2023-09-30T18:13:41Z) - Light-weight spatio-temporal graphs for segmentation and ejection fraction prediction in cardiac ultrasound [5.597394612661975]
We propose an automated method called EchoGraphs for predicting ejection fraction and segmenting the left ventricle.
Models for direct coordinate regression based on Graph Convolutional Networks (GCNs) are used to detect the keypoints.
Compared to semantic segmentation, GCNs show accurate segmentation and improvements in robustness and inference runtime.
arXiv Detail & Related papers (2022-07-06T10:03:44Z) - Real-time landmark detection for precise endoscopic submucosal dissection via shape-aware relation network [51.44506007844284]
We propose a shape-aware relation network for accurate and real-time landmark detection in endoscopic submucosal dissection surgery.
We first devise an algorithm to automatically generate relation keypoint heatmaps, which intuitively represent the prior knowledge of spatial relations among landmarks.
We then develop two complementary regularization schemes to progressively incorporate the prior knowledge into the training process.
arXiv Detail & Related papers (2021-11-08T07:57:30Z) - Training Automatic View Planner for Cardiac MR Imaging via Self-Supervision by Spatial Relationship between Views [28.27778627797572]
This work presents a clinic-compatible and annotation-free system for automatic cardiac magnetic resonance imaging view planning.
The system mines the spatial relationship -- more specifically, locates and exploits the intersecting lines -- between the source and target views, and trains deep networks to regress heatmaps defined by these intersecting lines.
A multi-view planning strategy is proposed to aggregate information from the predicted heatmaps for all the source views of a target view, for a globally optimal prescription.
arXiv Detail & Related papers (2021-09-24T02:25:22Z) - Multi-Task Neural Networks with Spatial Activation for Retinal Vessel Segmentation and Artery/Vein Classification [49.64863177155927]
We propose a multi-task deep neural network with a spatial activation mechanism to segment full retinal vessels, arteries, and veins simultaneously.
The proposed network achieves pixel-wise accuracy of 95.70% for vessel segmentation, and A/V classification accuracy of 94.50%, which is the state-of-the-art performance for both tasks.
arXiv Detail & Related papers (2020-07-18T05:46:47Z) - A Global Benchmark of Algorithms for Segmenting Late Gadolinium-Enhanced Cardiac Magnetic Resonance Imaging [90.29017019187282]
"2018 Left Atrium Challenge" using 154 3D LGE-MRIs, currently the world's largest cardiac LGE-MRI dataset.
Analysis of the submitted algorithms using technical and biological metrics was performed.
Results show the top method achieved a dice score of 93.2% and a mean surface-to-surface distance of 0.7 mm.
arXiv Detail & Related papers (2020-04-26T08:49:17Z) - Multi-Lead ECG Classification via an Information-Based Attention Convolutional Neural Network [1.1720399305661802]
One-dimensional convolutional neural networks (CNN) have proven to be effective in pervasive classification tasks.
We implement a residual connection and design a structure that can learn the weights from the information contained in different channels of the input feature map.
An indicator named mean square deviation is introduced to monitor the performance of a particular model segment in the classification task.
arXiv Detail & Related papers (2020-03-25T02:28:04Z)