SurgAtt-Tracker: Online Surgical Attention Tracking via Temporal Proposal Reranking and Motion-Aware Refinement
- URL: http://arxiv.org/abs/2602.20636v1
- Date: Tue, 24 Feb 2026 07:30:51 GMT
- Title: SurgAtt-Tracker: Online Surgical Attention Tracking via Temporal Proposal Reranking and Motion-Aware Refinement
- Authors: Rulin Zhou, Guankun Wang, An Wang, Yujie Ma, Lixin Ouyang, Bolin Cui, Junyan Li, Chaowei Zhu, Mingyang Li, Ming Chen, Xiaopin Zhong, Peng Lu, Jiankun Wang, Xianming Liu, Hongliang Ren,
- Abstract summary: SurgAtt-Tracker is a holistic framework that robustly tracks surgical attention.
Experiments on multiple surgical datasets demonstrate that SurgAtt-Tracker consistently achieves state-of-the-art performance.
- Score: 45.37105164372227
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Accurate and stable field-of-view (FoV) guidance is critical for safe and efficient minimally invasive surgery, yet existing approaches often conflate visual attention estimation with downstream camera control or rely on direct object-centric assumptions. In this work, we formulate surgical attention tracking as a spatio-temporal learning problem and model surgeon focus as a dense attention heatmap, enabling continuous and interpretable frame-wise FoV guidance. We propose SurgAtt-Tracker, a holistic framework that robustly tracks surgical attention by exploiting temporal coherence through proposal-level reranking and motion-aware refinement, rather than direct regression. To support systematic training and evaluation, we introduce SurgAtt-1.16M, a large-scale benchmark with a clinically grounded annotation protocol that enables comprehensive heatmap-based attention analysis across procedures and institutions. Extensive experiments on multiple surgical datasets demonstrate that SurgAtt-Tracker consistently achieves state-of-the-art performance and strong robustness under occlusion, multi-instrument interference, and cross-domain settings. Beyond attention tracking, our approach provides a frame-wise FoV guidance signal that can directly support downstream robotic FoV planning and automatic camera control.
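The proposal-level reranking and motion-aware refinement described in the abstract can be sketched minimally as follows. This is an illustration only, not the paper's architecture: the function names, the cosine-similarity coherence score, and the roll-based motion shift are our own stand-ins for the learned reranking and refinement heads.

```python
import numpy as np

def rerank_proposals(proposals, prev_heatmap, base_scores, alpha=0.5):
    """Rerank candidate attention heatmaps by temporal coherence.

    Each proposal is scored as a blend of its own confidence and its
    cosine similarity to the previous frame's heatmap, so a temporally
    consistent proposal can beat a spurious high-confidence one.
    """
    prev = prev_heatmap.ravel()
    prev = prev / (np.linalg.norm(prev) + 1e-8)
    coherence = np.array([
        float(p.ravel() / (np.linalg.norm(p.ravel()) + 1e-8) @ prev)
        for p in proposals
    ])
    scores = alpha * np.asarray(base_scores) + (1 - alpha) * coherence
    return proposals[int(np.argmax(scores))]

def motion_refine(heatmap, flow_dy_dx):
    """Shift the selected heatmap along the estimated scene motion
    (a crude stand-in for a learned motion-aware refinement head)."""
    dy, dx = flow_dy_dx
    return np.roll(np.roll(heatmap, dy, axis=0), dx, axis=1)
```

With equal-strength proposals, the one overlapping the previous frame's attention wins the reranking even when its raw confidence is lower, which is the temporal-coherence intuition the abstract appeals to.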
Related papers
- Strategy-Supervised Autonomous Laparoscopic Camera Control via Event-Driven Graph Mining [15.995867664955348]
We present a strategy-grounded framework that couples high-level vision-language inference with low-level closed-loop control.
Offline, raw surgical videos are parsed into camera-relevant temporal events and structured as attributed event graphs.
Online, a fine-tuned Vision-Language Model (VLM) processes the live laparoscopic view to predict the dominant strategy and discrete image-based motion commands.
arXiv Detail & Related papers (2026-02-24T02:56:39Z)
- Detecting Object Tracking Failure via Sequential Hypothesis Testing [80.7891291021747]
Real-time online object tracking in videos constitutes a core task in computer vision.
We propose interpreting object tracking as a sequential hypothesis test, wherein evidence for or against tracking failures is gradually accumulated over time.
We propose both supervised and unsupervised variants by leveraging either ground-truth or solely internal tracking information.
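The sequential-hypothesis-test view of failure detection can be illustrated with a classic Wald-style sequential probability ratio test. The per-frame log-likelihood ratios and the thresholds below are placeholders; the paper's supervised and unsupervised variants derive their evidence differently.

```python
def sprt_failure_monitor(llrs, a=2.2, b=-2.2):
    """Sequential probability ratio test over per-frame evidence.

    llrs: per-frame log-likelihood ratios log p(obs | failed) / p(obs | tracking).
    Evidence accumulates until it crosses the 'failed' threshold a or the
    'still tracking' threshold b; until then the monitor keeps observing.
    Returns the decision and the frame index at which it was reached.
    """
    s = 0.0
    for t, llr in enumerate(llrs):
        s += llr
        if s >= a:
            return "failed", t
        if s <= b:
            return "tracking", t
    return "undecided", len(llrs) - 1
```

The appeal of the sequential formulation is that weak per-frame evidence is never forced into a hard decision: the monitor defers until accumulated evidence clears a threshold.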
arXiv Detail & Related papers (2026-02-13T14:57:15Z)
- AR Surgical Navigation with Surface Tracing: Comparing In-Situ Visualization with Tool-Tracking Guidance for Neurosurgical Applications [0.0]
This study presents a novel methodology for utilizing AR guidance to register anatomical targets and provide real-time instrument navigation.
The system registers target positions to the patient through a novel surface tracing method and uses real-time infrared tool tracking to aid in catheter placement.
arXiv Detail & Related papers (2025-08-14T11:46:30Z)
- Taming Modern Point Tracking for Speckle Tracking Echocardiography via Impartial Motion [0.686108371431346]
This work investigates the potential of state-of-the-art point tracking methods for ultrasound, with a focus on echocardiography.
By analyzing cardiac motion throughout the heart cycle in real B-mode ultrasound videos, we identify that a directional motion bias is affecting the existing training strategies.
We incorporate a set of tailored augmentations to reduce the bias and enhance tracking generalization and robustness through impartial cardiac motion.
arXiv Detail & Related papers (2025-07-14T10:18:26Z)
- EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance [79.66329903007869]
We present EchoWorld, a motion-aware world modeling framework for probe guidance.
It encodes anatomical knowledge and motion-induced visual dynamics.
It is trained on more than one million ultrasound images from over 200 routine scans.
arXiv Detail & Related papers (2025-04-17T16:19:05Z)
- Open-World Drone Active Tracking with Goal-Centered Rewards [62.21394499788672]
Drone Visual Active Tracking aims to autonomously follow a target object by controlling the motion system based on visual observations.
We propose DAT, the first open-world drone active air-to-ground tracking benchmark.
We also propose GC-VAT, which aims to improve drone tracking performance in complex scenarios.
arXiv Detail & Related papers (2024-12-01T09:37:46Z)
- Tracking Everything in Robotic-Assisted Surgery [39.62251870446397]
We present an annotated surgical tracking dataset for benchmarking tracking methods for surgical scenarios.
We evaluate state-of-the-art (SOTA) TAP-based algorithms on this dataset and reveal their limitations in challenging surgical scenarios.
We propose a new tracking method, namely SurgMotion, to solve the challenges and further improve the tracking performance.
arXiv Detail & Related papers (2024-09-29T23:06:57Z)
- Self-Supervised Learning for Interventional Image Analytics: Towards Robust Device Trackers [6.262161803642583]
We propose a novel approach to learn procedural features from a very large data cohort of over 16 million interventional X-ray frames.
Our approach is based on a masked image modeling technique that leverages frame-based reconstruction to learn fine inter-frame temporal correspondences.
Experiments show that our method achieves 66.31% reduction in maximum tracking error against reference solutions.
arXiv Detail & Related papers (2024-05-02T10:18:22Z)
- Real-time guidewire tracking and segmentation in intraoperative x-ray [52.51797358201872]
We propose a two-stage deep learning framework for real-time guidewire segmentation and tracking.
In the first stage, a Yolov5 detector is trained, using the original X-ray images as well as synthetic ones, to output the bounding boxes of possible target guidewires.
In the second stage, a novel and efficient network is proposed to segment the guidewire in each detected bounding box.
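The two-stage pipeline above can be sketched with simple stand-ins: intensity thresholding replaces the trained Yolov5 detector, and in-box thresholding replaces the proposed segmentation network, but the detect-then-segment control flow is the same.

```python
import numpy as np

def detect_boxes(image, thr=0.5, pad=2):
    """Stage 1 (stand-in for the Yolov5 detector): return one coarse
    padded bounding box around bright pixels. A real detector would
    return several scored candidate boxes per frame."""
    ys, xs = np.nonzero(image > thr)
    if ys.size == 0:
        return []
    y0 = max(ys.min() - pad, 0)
    y1 = min(ys.max() + pad, image.shape[0] - 1)
    x0 = max(xs.min() - pad, 0)
    x1 = min(xs.max() + pad, image.shape[1] - 1)
    return [(y0, x0, y1, x1)]

def segment_in_box(image, box, thr=0.5):
    """Stage 2 (stand-in for the segmentation network): a binary
    guidewire mask computed only inside the detected box."""
    mask = np.zeros_like(image, dtype=bool)
    y0, x0, y1, x1 = box
    crop = image[y0:y1 + 1, x0:x1 + 1]
    mask[y0:y1 + 1, x0:x1 + 1] = crop > thr
    return mask
```

Restricting stage 2 to the detected boxes is what makes the design real-time capable: the (expensive) segmentation runs only on small crops rather than the full X-ray frame.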
arXiv Detail & Related papers (2024-04-12T20:39:19Z)
- AiATrack: Attention in Attention for Transformer Visual Tracking [89.94386868729332]
Transformer trackers have achieved impressive advancements recently, where the attention mechanism plays an important role.
We propose an attention in attention (AiA) module, which enhances appropriate correlations and suppresses erroneous ones by seeking consensus among all correlation vectors.
Our AiA module can be readily applied to both self-attention blocks and cross-attention blocks to facilitate feature aggregation and information propagation for visual tracking.
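The consensus idea behind the AiA module can be loosely sketched: an inner attention over the correlation vectors themselves reweights each query's correlations toward what the other vectors agree on. This is our simplification, not the published module, which uses learned projections and sits inside the tracker's self- and cross-attention blocks.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def aia_refine(corr):
    """'Attention in attention' sketch over a correlation map.

    corr: (num_queries, num_keys) correlation vectors. An inner
    attention among the correlation vectors finds consensus weights,
    so mutually agreeing correlations reinforce each other and
    outlier (erroneous) correlations are damped via a residual add.
    """
    d = corr.shape[1]
    consensus = softmax(corr @ corr.T / np.sqrt(d), axis=-1)
    return corr + consensus @ corr
```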
arXiv Detail & Related papers (2022-07-20T00:44:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences of their use.