WeakSurg: Weakly supervised surgical instrument segmentation using temporal equivariance and semantic continuity
- URL: http://arxiv.org/abs/2403.09551v1
- Date: Thu, 14 Mar 2024 16:39:11 GMT
- Title: WeakSurg: Weakly supervised surgical instrument segmentation using temporal equivariance and semantic continuity
- Authors: Qiyuan Wang, Yanzhe Liu, Shang Zhao, Rong Liu, S. Kevin Zhou,
- Abstract summary: WeakSurg is first instrument-presence-only weakly supervised segmentation architecture to take temporal information into account for surgical scenarios.
Our results show that WeakSurg compares favorably with state-of-the-art methods not only on semantic segmentation metrics but also on instance segmentation metrics.
- Score: 14.448593791011204
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weakly supervised surgical instrument segmentation with only instrument presence labels has been rarely explored in surgical domain. To mitigate the highly under-constrained challenges, we extend a two-stage weakly supervised segmentation paradigm with temporal attributes from two perspectives. From a temporal equivariance perspective, we propose a prototype-based temporal equivariance regulation loss to enhance pixel-wise consistency between adjacent features. From a semantic continuity perspective, we propose a class-aware temporal semantic continuity loss to constrain the semantic consistency between a global view of target frame and local non-discriminative regions of adjacent reference frame. To the best of our knowledge, WeakSurg is the first instrument-presence-only weakly supervised segmentation architecture to take temporal information into account for surgical scenarios. Extensive experiments are validated on Cholec80, an open benchmark for phase and instrument recognition. We annotate instance-wise instrument labels with fixed time-steps which are double checked by a clinician with 3-years experience. Our results show that WeakSurg compares favorably with state-of-the-art methods not only on semantic segmentation metrics but also on instance segmentation metrics.
Related papers
- ReSurgSAM2: Referring Segment Anything in Surgical Video via Credible Long-term Tracking [15.83425997240828]
ReSurgSAM2 is a two-stage referring surgical segmentation framework.<n>It uses cross-modal spatial-temporal Mamba to generate precise detection and segmentation results.<n>It incorporates a diversity-driven memory mechanism that maintains a credible and diverse memory bank, ensuring consistent long-term tracking.
arXiv Detail & Related papers (2025-05-13T13:56:10Z) - Multi-Stage Boundary-Aware Transformer Network for Action Segmentation in Untrimmed Surgical Videos [0.1053373860696675]
We present the Multi-Stage Boundary-Aware Transformer Network (MSBATN) with hierarchical sliding window attention.
Our proposed approach incorporates a novel unified loss function that treats action classification and boundary detection as distinct yet interdependent tasks.
Our boundary voting mechanism accurately identifies start and end points by leveraging contextual information.
arXiv Detail & Related papers (2025-04-26T01:07:56Z) - Temporal Propagation of Asymmetric Feature Pyramid for Surgical Scene Segmentation [7.150163844454341]
Surgical scene segmentation is crucial for robot-assisted laparoscopic surgery understanding.
Current approaches face two challenges: (i) static image limitations and fine-grained structural details.
We present temporal asymmetric feature propagation network, a bidirectional attention architecture enabling cross-frame feature propagation.
Our framework uniquely enables both temporal guidance and contextual reasoning for surgical scene understanding.
arXiv Detail & Related papers (2025-04-18T03:41:23Z) - Less is More? Revisiting the Importance of Frame Rate in Real-Time Zero-Shot Surgical Video Segmentation [1.0536099636804035]
We investigate the impact of frame rate on zero-shot surgical video segmentation, evaluating SAM2's effectiveness across multiple frame sampling rates for cholecystectomy procedures.
Surprisingly, our findings indicate that in conventional evaluation settings, frame rates as low as a single frame per second can outperform 25 FPS, as fewer frames smooth out segmentation inconsistencies.
In a real-time streaming scenario, higher frame rates yield superior temporal coherence and stability, particularly for dynamic objects such as surgical graspers.
arXiv Detail & Related papers (2025-02-28T10:42:09Z) - LACOSTE: Exploiting stereo and temporal contexts for surgical instrument segmentation [14.152207010509763]
We propose a novel LACOSTE model that exploits Location-Agnostic COntexts in Stereo and TEmporal images for improved surgical instrument segmentation.
We extensively validate our approach on three public surgical video datasets.
arXiv Detail & Related papers (2024-09-14T08:17:56Z) - Real-time guidewire tracking and segmentation in intraoperative x-ray [52.51797358201872]
We propose a two-stage deep learning framework for real-time guidewire segmentation and tracking.
In the first stage, a Yolov5 detector is trained, using the original X-ray images as well as synthetic ones, to output the bounding boxes of possible target guidewires.
In the second stage, a novel and efficient network is proposed to segment the guidewire in each detected bounding box.
arXiv Detail & Related papers (2024-04-12T20:39:19Z) - Weakly-Supervised Learning via Multi-Lateral Decoder Branching for Guidewire Segmentation in Robot-Assisted Cardiovascular Catheterization [4.894147633944561]
We propose a weakly-supervised learning method with multi-lateral pseudo labeling for tool segmentation in cardiac angiograms.
We trained the model end-to-end with weakly-annotated data obtained during robotic cardiac catheterization.
Compared to three existing weakly-supervised methods, our approach yielded higher segmentation performance across three different cardiac angiogram data.
arXiv Detail & Related papers (2024-04-11T09:23:44Z) - Visual-Kinematics Graph Learning for Procedure-agnostic Instrument Tip
Segmentation in Robotic Surgeries [29.201385352740555]
We propose a novel visual-kinematics graph learning framework to accurately segment the instrument tip given various surgical procedures.
Specifically, a graph learning framework is proposed to encode relational features of instrument parts from both image and kinematics.
A cross-modal contrastive loss is designed to incorporate robust geometric prior from kinematics to image for tip segmentation.
arXiv Detail & Related papers (2023-09-02T14:52:58Z) - GLSFormer : Gated - Long, Short Sequence Transformer for Step
Recognition in Surgical Videos [57.93194315839009]
We propose a vision transformer-based approach to learn temporal features directly from sequence-level patches.
We extensively evaluate our approach on two cataract surgery video datasets, Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods.
arXiv Detail & Related papers (2023-07-20T17:57:04Z) - LoViT: Long Video Transformer for Surgical Phase Recognition [59.06812739441785]
We present a two-stage method, called Long Video Transformer (LoViT) for fusing short- and long-term temporal information.
Our approach outperforms state-of-the-art methods on the Cholec80 and AutoLaparo datasets consistently.
arXiv Detail & Related papers (2023-05-15T20:06:14Z) - TraSeTR: Track-to-Segment Transformer with Contrastive Query for
Instance-level Instrument Segmentation in Robotic Surgery [60.439434751619736]
We propose TraSeTR, a Track-to-Segment Transformer that exploits tracking cues to assist surgical instrument segmentation.
TraSeTR jointly reasons about the instrument type, location, and identity with instance-level predictions.
The effectiveness of our method is demonstrated with state-of-the-art instrument type segmentation results on three public datasets.
arXiv Detail & Related papers (2022-02-17T05:52:18Z) - Efficient Global-Local Memory for Real-time Instrument Segmentation of
Robotic Surgical Video [53.14186293442669]
We identify two important clues for surgical instrument perception, including local temporal dependency from adjacent frames and global semantic correlation in long-range duration.
We propose a novel dual-memory network (DMNet) to relate both global and local-temporal knowledge.
Our method largely outperforms the state-of-the-art works on segmentation accuracy while maintaining a real-time speed.
arXiv Detail & Related papers (2021-09-28T10:10:14Z) - Learning Motion Flows for Semi-supervised Instrument Segmentation from
Robotic Surgical Video [64.44583693846751]
We study the semi-supervised instrument segmentation from robotic surgical videos with sparse annotations.
By exploiting generated data pairs, our framework can recover and even enhance temporal consistency of training sequences.
Results show that our method outperforms the state-of-the-art semisupervised methods by a large margin.
arXiv Detail & Related papers (2020-07-06T02:39:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.