When Tracking Fails: Analyzing Failure Modes of SAM2 for Point-Based Tracking in Surgical Videos
- URL: http://arxiv.org/abs/2510.02100v1
- Date: Thu, 02 Oct 2025 15:06:49 GMT
- Title: When Tracking Fails: Analyzing Failure Modes of SAM2 for Point-Based Tracking in Surgical Videos
- Authors: Woowon Jang, Jiwon Im, Juseung Choi, Niki Rashidian, Wesley De Neve, Utku Ozbulak
- Abstract summary: We systematically analyze the failure modes of point-based tracking in laparoscopic cholecystectomy videos. Our results show that point-based tracking is competitive for surgical tools but consistently underperforms for anatomical targets.
- Score: 0.43981305860983705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video object segmentation (VOS) models such as SAM2 offer promising zero-shot tracking capabilities for surgical videos using minimal user input. Among the available input types, point-based tracking offers an efficient and low-cost alternative, yet its reliability and failure cases in complex surgical environments are not well understood. In this work, we systematically analyze the failure modes of point-based tracking in laparoscopic cholecystectomy videos. Focusing on three surgical targets, the gallbladder, grasper, and L-hook electrocautery, we compare the performance of point-based tracking with segmentation mask initialization. Our results show that point-based tracking is competitive for surgical tools but consistently underperforms for anatomical targets, where tissue similarity and ambiguous boundaries lead to failure. Through qualitative analysis, we reveal key factors influencing tracking outcomes and provide several actionable recommendations for selecting and placing tracking points to improve performance in surgical video analysis.
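The abstract compares point-based initialization with mask initialization but does not name the metric used. As a minimal sketch, assuming per-frame intersection over union (IoU) against ground-truth masks is the comparison metric, the snippet below scores two hypothetical runs on a toy clip. The function names and the toy masks are illustrative assumptions, not taken from the paper.

```python
from typing import List

Mask = List[List[int]]  # binary mask: 1 = target pixel, 0 = background


def mask_iou(pred: Mask, truth: Mask) -> float:
    """Intersection over union of two equally sized binary masks."""
    inter = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly (e.g. the target is out of view).
    return 1.0 if union == 0 else inter / union


def mean_iou(preds: List[Mask], truths: List[Mask]) -> float:
    """Average per-frame IoU over a clip."""
    scores = [mask_iou(p, t) for p, t in zip(preds, truths)]
    return sum(scores) / len(scores)


# Toy 2-frame clip with made-up outputs for the two initialization types.
truth = [[[1, 1], [0, 0]], [[1, 1], [1, 0]]]
from_mask = [[[1, 1], [0, 0]], [[1, 1], [0, 0]]]
from_point = [[[1, 0], [0, 0]], [[1, 1], [1, 1]]]

print(mean_iou(from_mask, truth), mean_iou(from_point, truth))
```

Averaging per frame, rather than pooling all pixels across the clip, keeps a short stretch of lost tracking visible in the score; whether the paper aggregates this way is not stated in the abstract.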
Related papers
- HiLWS: A Human-in-the-Loop Weak Supervision Framework for Curating Clinical and Home Video Data for Neurological Assessment [3.920493604448087]
We present HiLWS, a cascaded human-in-the-loop weak supervision framework for curating and annotating hand motor task videos. HiLWS employs a novel cascaded approach that first applies weak supervision to aggregate expert-provided annotations into probabilistic labels. The complete pipeline includes quality filtering, optimized pose estimation, and task-specific segment extraction.
arXiv Detail & Related papers (2025-09-09T22:30:25Z) - ReSurgSAM2: Referring Segment Anything in Surgical Video via Credible Long-term Tracking [15.83425997240828]
ReSurgSAM2 is a two-stage referring surgical segmentation framework. It uses cross-modal spatial-temporal Mamba to generate precise detection and segmentation results. It incorporates a diversity-driven memory mechanism that maintains a credible and diverse memory bank, ensuring consistent long-term tracking.
arXiv Detail & Related papers (2025-05-13T13:56:10Z) - One Patient's Annotation is Another One's Initialization: Towards Zero-Shot Surgical Video Segmentation with Cross-Patient Initialization [1.0536099636804035]
Video object segmentation is an emerging technology that is well-suited for real-time surgical video segmentation. However, its adoption is limited by the need for manual intervention to select the tracked object. In this work, we tackle this challenge with an innovative solution: using previously annotated frames from other patients as the tracking frames. We find that this unconventional approach can match or even surpass the performance of using patients' own tracking frames.
arXiv Detail & Related papers (2025-03-04T03:11:03Z) - Revisiting the Evaluation Bias Introduced by Frame Sampling Strategies in Surgical Video Segmentation Using SAM2 [1.0536099636804035]
We investigate how inconsistencies in annotation density and frame rate sampling influence the evaluation of zero-shot segmentation models. We find that lower frame rates can appear to outperform higher ones due to a smoothing effect that conceals temporal inconsistencies. When assessed under real-time streaming conditions, higher frame rates yield superior segmentation stability.
arXiv Detail & Related papers (2025-02-28T10:42:09Z) - Tracking Everything in Robotic-Assisted Surgery [39.62251870446397]
We present an annotated surgical tracking dataset for benchmarking tracking methods for surgical scenarios. We evaluate state-of-the-art (SOTA) TAP-based algorithms on this dataset and reveal their limitations in challenging surgical scenarios. We propose a new tracking method, namely SurgMotion, to solve the challenges and further improve the tracking performance.
arXiv Detail & Related papers (2024-09-29T23:06:57Z) - Real-time guidewire tracking and segmentation in intraoperative x-ray [52.51797358201872]
We propose a two-stage deep learning framework for real-time guidewire segmentation and tracking.
In the first stage, a Yolov5 detector is trained, using the original X-ray images as well as synthetic ones, to output the bounding boxes of possible target guidewires.
In the second stage, a novel and efficient network is proposed to segment the guidewire in each detected bounding box.
arXiv Detail & Related papers (2024-04-12T20:39:19Z) - RTracker: Recoverable Tracking via PN Tree Structured Memory [71.05904715104411]
We propose a recoverable tracking framework, RTracker, that uses a tree-structured memory to dynamically associate a tracker and a detector to enable self-recovery.
Specifically, we propose a Positive-Negative Tree-structured memory to chronologically store and maintain positive and negative target samples.
Our core idea is to use the support samples of positive and negative target categories to establish a relative distance-based criterion for a reliable assessment of target loss.
arXiv Detail & Related papers (2024-03-28T08:54:40Z) - Segment Anything Meets Point Tracking [116.44931239508578]
This paper presents a novel method for point-centric interactive video segmentation, empowered by SAM and long-term point tracking.
We highlight the merits of point-based tracking through direct evaluation on the zero-shot open-world Unidentified Video Objects (UVO) benchmark.
Our experiments on popular video object segmentation and multi-object segmentation tracking benchmarks, including DAVIS, YouTube-VOS, and BDD100K, suggest that a point-based segmentation tracker yields better zero-shot performance and efficient interactions.
arXiv Detail & Related papers (2023-07-03T17:58:01Z) - Next-generation Surgical Navigation: Marker-less Multi-view 6DoF Pose Estimation of Surgical Instruments [64.59698930334012]
First, we present a multi-camera capture setup consisting of static and head-mounted cameras. Second, we publish a multi-view RGB-D video dataset of ex-vivo spine surgeries, captured in a surgical wet lab and a real operating theatre. Third, we evaluate three state-of-the-art single-view and multi-view methods for the task of 6DoF pose estimation of surgical instruments.
arXiv Detail & Related papers (2023-05-05T13:42:19Z) - CholecTriplet2021: A benchmark challenge for surgical action triplet
recognition [66.51610049869393]
This paper presents CholecTriplet 2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos.
We present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge.
A total of 4 baseline methods and 19 new deep learning algorithms are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%.
arXiv Detail & Related papers (2022-04-10T18:51:55Z) - One-shot action recognition towards novel assistive therapies [63.23654147345168]
This work is motivated by the automated analysis of medical therapies that involve action imitation games.
The presented approach incorporates a pre-processing step that standardizes heterogeneous motion data conditions.
We evaluate the approach on a real use-case of automated video analysis for therapy support with autistic people.
arXiv Detail & Related papers (2021-02-17T19:41:37Z) - Robust Medical Instrument Segmentation Challenge 2019 [56.148440125599905]
Intraoperative tracking of laparoscopic instruments is often a prerequisite for computer and robotic-assisted interventions.
Our challenge was based on a surgical data set comprising 10,040 annotated images acquired from a total of 30 surgical procedures.
The results confirm the initial hypothesis, namely that algorithm performance degrades with an increasing domain gap.
arXiv Detail & Related papers (2020-03-23T14:35:08Z)