Point Tracking as a Temporal Cue for Robust Myocardial Segmentation in Echocardiography Videos
- URL: http://arxiv.org/abs/2601.09207v1
- Date: Wed, 14 Jan 2026 06:23:36 GMT
- Title: Point Tracking as a Temporal Cue for Robust Myocardial Segmentation in Echocardiography Videos
- Authors: Bahar Khodabakhshian, Nima Hashemi, Armin Saadat, Zahra Gholami, In-Chang Hwang, Samira Sojoudi, Christina Luong, Purang Abolmaesumi, Teresa Tsang
- Abstract summary: Myocardium segmentation in echocardiography videos is a challenging task due to low contrast, noise, and anatomical variability. Traditional deep learning models either process frames independently, ignoring temporal information, or rely on memory-based feature propagation. We propose Point-Seg, a transformer-based segmentation framework that integrates point tracking as a temporal cue.
- Score: 2.7509305461575875
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Purpose: Myocardium segmentation in echocardiography videos is a challenging task due to low contrast, noise, and anatomical variability. Traditional deep learning models either process frames independently, ignoring temporal information, or rely on memory-based feature propagation, which accumulates error over time. Methods: We propose Point-Seg, a transformer-based segmentation framework that integrates point tracking as a temporal cue to ensure stable and consistent segmentation of the myocardium across frames. Our method leverages a point-tracking module, trained on a synthetic echocardiography dataset, to track key anatomical landmarks across video sequences. These tracked trajectories provide an explicit motion-aware signal that guides segmentation, reducing drift and eliminating the need for memory-based feature accumulation. Additionally, we incorporate a temporal smoothing loss to further enhance temporal consistency across frames. Results: We evaluate our approach on both public and private echocardiography datasets. Experimental results demonstrate that Point-Seg achieves Dice accuracy statistically similar to state-of-the-art segmentation models on high-quality echo data, while achieving better segmentation accuracy and improved temporal stability on lower-quality echo. Furthermore, unlike other segmentation methods, Point-Seg provides pixel-level myocardial motion information, which is essential for downstream tasks such as myocardial strain measurement and regional wall motion abnormality detection. Conclusion: Point-Seg demonstrates that point tracking can serve as an effective temporal cue for consistent video segmentation, offering a reliable and generalizable approach for myocardium segmentation in echocardiography videos. The code is available at https://github.com/DeepRCL/PointSeg.
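The abstract names a temporal smoothing loss but does not give its form; below is a minimal sketch of one plausible formulation, penalizing squared differences between consecutive predicted masks. The function name and tensor layout are assumptions for illustration, not the paper's actual implementation.

```python
import torch

def temporal_smoothing_loss(masks: torch.Tensor) -> torch.Tensor:
    """Penalize frame-to-frame changes in predicted myocardium masks.

    masks: (T, H, W) per-frame segmentation probabilities in [0, 1].
           (Layout assumed for illustration.)
    """
    # Difference between each frame and its predecessor: (T-1, H, W).
    frame_diff = masks[1:] - masks[:-1]
    # Mean squared difference; small values mean temporally stable masks.
    return (frame_diff ** 2).mean()
```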
Related papers
- GDKVM: Echocardiography Video Segmentation via Spatiotemporal Key-Value Memory with Gated Delta Rule [28.526034344479935]
We introduce GDKVM, a novel architecture for echocardiography video segmentation. The model employs Linear Key-Value Association (LKVA) to effectively model inter-frame correlations, and introduces the Gated Delta Rule (GDR) to efficiently store intermediate memory states (a generic sketch of such an update follows this entry). We validated GDKVM on two mainstream echocardiography video datasets (CAMUS and EchoNet-Dynamic) and compared it with various state-of-the-art methods. Experimental results show that GDKVM outperforms existing approaches in terms of segmentation accuracy and robustness, while ensuring real-time performance.
arXiv Detail & Related papers (2025-12-11T03:19:50Z)
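GDKVM's Gated Delta Rule is only named above; as a hedged illustration, the sketch below implements a generic gated delta-rule update for a key-value memory matrix (the common form used in gated delta networks). All shapes and names are assumptions, not GDKVM's actual code.

```python
import torch

def gated_delta_update(S: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                       alpha: float, beta: float) -> torch.Tensor:
    """One gated delta-rule update of a key-value memory.

    S:     (d_v, d_k) memory state; S @ k retrieves the value stored under k
    k:     (d_k,) unit-norm key for the current frame
    v:     (d_v,) value to associate with k
    alpha: decay gate in [0, 1] applied to the old memory
    beta:  write strength in [0, 1]
    """
    retrieved = S @ k                      # value currently stored under k
    erase = torch.outer(retrieved, k)      # rank-1 component to remove
    write = torch.outer(v, k)              # rank-1 association to add
    return alpha * (S - beta * erase) + beta * write
```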
- A DyL-Unet framework based on dynamic learning for Temporally Consistent Echocardiographic Segmentation [0.328418927821443]
We propose DyL-UNet, a dynamic learning-based temporally consistent U-Net segmentation architecture. The framework constructs an Echo-Dynamics Graph (EDG) through dynamic learning to extract dynamic information from videos. Experiments on the CAMUS and EchoNet-Dynamic datasets demonstrate that DyL-UNet maintains segmentation accuracy comparable to existing methods.
arXiv Detail & Related papers (2025-09-23T14:17:01Z)
- Hierarchical Spatio-temporal Segmentation Network for Ejection Fraction Estimation in Echocardiography Videos [20.353975738483417]
Our model aims to improve Ejection Fraction estimation accuracy by synergizing local detail modeling with global temporal modeling. The hierarchical design avoids issues such as error accumulation when relying solely on single frames, or loss of local detail when using multi-frame data.
arXiv Detail & Related papers (2025-08-26T05:04:49Z)
- MATE: Motion-Augmented Temporal Consistency for Event-based Point Tracking [58.719310295870024]
This paper presents an event-based framework for tracking any point. To resolve ambiguities caused by event sparsity, a motion-guidance module incorporates kinematic vectors into the local matching process. The method improves the $\mathrm{Survival}_{50}$ metric by 17.9% over the event-only tracking-any-point baseline (see the metric sketch after this entry).
arXiv Detail & Related papers (2024-12-02T09:13:29Z)
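For context on the $\mathrm{Survival}_{50}$ metric: in point-tracking benchmarks it is typically the average fraction of a video for which a predicted track stays within 50 pixels of the ground truth before first failing. That definition is an assumption here, and the sketch below is only a generic illustration.

```python
import numpy as np

def survival_50(pred: np.ndarray, gt: np.ndarray, thresh: float = 50.0) -> float:
    """Average fraction of frames a track survives before its error
    first exceeds `thresh` pixels (assumed definition, for illustration).

    pred, gt: (N, T, 2) predicted and ground-truth point trajectories.
    """
    err = np.linalg.norm(pred - gt, axis=-1)   # (N, T) per-frame pixel error
    failed = err > thresh                      # (N, T) failure flags
    T = failed.shape[1]
    # Index of the first failing frame per track; T if it never fails.
    first_fail = np.where(failed.any(axis=1), failed.argmax(axis=1), T)
    return float(np.mean(first_fail / T))
```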
- Bidirectional Recurrence for Cardiac Motion Tracking with Gaussian Process Latent Coding [9.263168872795843]
GPTrack is a novel unsupervised framework crafted to explore the temporal and spatial dynamics of cardiac motion.
It enhances motion tracking by employing a sequential Gaussian Process in the latent space and encoding statistics of the spatial information at each time stamp (a generic GP illustration follows this entry).
Our GPTrack significantly improves the precision of motion tracking in both 3D and 4D medical images while maintaining computational efficiency.
arXiv Detail & Related papers (2024-10-28T05:33:48Z)
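GPTrack's sequential Gaussian Process is not specified above; purely as a generic illustration of GP regression over a latent trajectory (not GPTrack's actual model), here is a minimal sketch with an RBF kernel:

```python
import numpy as np

def gp_posterior_mean(t_train, z_train, t_query, length=1.0, noise=1e-2):
    """Posterior mean of a GP over a 1-D latent trajectory (illustrative only).

    t_train: (N,) time stamps with observed latent values z_train (N,)
    t_query: (M,) time stamps at which to predict the latent value
    """
    def rbf(a, b):
        # Squared-exponential kernel between two sets of time stamps.
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

    K = rbf(t_train, t_train) + noise * np.eye(len(t_train))
    K_star = rbf(t_query, t_train)                # (M, N) cross-covariance
    return K_star @ np.linalg.solve(K, z_train)   # (M,) posterior mean
```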
- Semantic-aware Temporal Channel-wise Attention for Cardiac Function Assessment [69.02116920364311]
Existing video-based methods do not pay much attention to the left ventricular region, nor to the left ventricular changes caused by motion.
We propose a semi-supervised auxiliary learning paradigm with a left ventricular segmentation task, which contributes to the representation learning for the left ventricular region.
Our approach achieves state-of-the-art performance on the Stanford dataset, with improvements of 0.22 MAE, 0.26 RMSE, and 1.9% $R^2$ (metric definitions are sketched after this entry).
arXiv Detail & Related papers (2023-10-09T05:57:01Z)
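The MAE, RMSE, and $R^2$ figures above are standard regression metrics for ejection-fraction estimation; for reference, a self-contained sketch of their textbook definitions:

```python
import numpy as np

def regression_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """MAE, RMSE, and R^2 between true and predicted ejection fractions."""
    resid = y_true - y_pred
    mae = np.abs(resid).mean()                      # mean absolute error
    rmse = np.sqrt((resid ** 2).mean())             # root mean squared error
    ss_res = (resid ** 2).sum()                     # residual sum of squares
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()  # total sum of squares
    return {"MAE": mae, "RMSE": rmse, "R2": 1.0 - ss_res / ss_tot}
```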
- A Spatial-Temporal Deformable Attention based Framework for Breast Lesion Detection in Videos [107.96514633713034]
We propose a spatial-temporal deformable attention based framework, named STNet.
Our STNet introduces a spatial-temporal deformable attention module to perform local spatial-temporal feature fusion.
Experiments on the public breast lesion ultrasound video dataset show that our STNet obtains a state-of-the-art detection performance.
arXiv Detail & Related papers (2023-09-09T07:00:10Z)
- Implicit Motion Handling for Video Camouflaged Object Detection [60.98467179649398]
We propose a new video camouflaged object detection (VCOD) framework.
It can exploit both short-term and long-term temporal consistency to detect camouflaged objects from video frames.
arXiv Detail & Related papers (2022-03-14T17:55:41Z)
- Echocardiography Segmentation with Enforced Temporal Consistency [10.652677452009627]
We propose a framework to learn the 2D+time long-axis cardiac shape.
The identification and correction of cardiac inconsistencies rely on a constrained autoencoder trained to learn a physiologically interpretable embedding of cardiac shapes.
arXiv Detail & Related papers (2021-12-03T16:09:32Z)
- Residual Moment Loss for Medical Image Segmentation [56.72261489147506]
Location information has been shown to help deep learning models capture the manifold structure of target objects.
Most existing methods encode location information implicitly, leaving it to the network to learn.
We propose a novel loss function, the residual moment (RM) loss, to explicitly embed the location information of segmentation targets (an illustrative sketch follows this entry).
arXiv Detail & Related papers (2021-06-27T09:31:49Z)
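The residual moment (RM) loss is only described above as explicitly embedding location information; the sketch below illustrates the general idea under a simplifying assumption, matching first-order spatial moments (mask centroids) of prediction and ground truth. It is not the paper's exact loss.

```python
import torch

def centroid_moment_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Distance between spatial centroids (first-order moments) of two masks.

    pred, target: (H, W) soft masks with values in [0, 1].
    """
    h, w = pred.shape
    ys = torch.arange(h, dtype=pred.dtype).view(h, 1)  # row coordinates
    xs = torch.arange(w, dtype=pred.dtype).view(1, w)  # column coordinates

    def centroid(mask: torch.Tensor) -> torch.Tensor:
        area = mask.sum() + 1e-8                       # zeroth moment
        return torch.stack([(mask * ys).sum() / area,  # mean row
                            (mask * xs).sum() / area]) # mean column

    return torch.norm(centroid(pred) - centroid(target))
```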
- Weakly-supervised Learning For Catheter Segmentation in 3D Frustum Ultrasound [74.22397862400177]
We propose a novel Frustum ultrasound based catheter segmentation method.
The proposed method achieved state-of-the-art performance with an efficiency of 0.25 seconds per volume.
arXiv Detail & Related papers (2020-10-19T13:56:22Z)