WeakSurg: Weakly supervised surgical instrument segmentation using temporal equivariance and semantic continuity
- URL: http://arxiv.org/abs/2403.09551v2
- Date: Mon, 30 Sep 2024 07:46:29 GMT
- Title: WeakSurg: Weakly supervised surgical instrument segmentation using temporal equivariance and semantic continuity
- Authors: Qiyuan Wang, Yanzhe Liu, Shang Zhao, Rong Liu, S. Kevin Zhou
- Abstract summary: We propose a weakly supervised surgical instrument segmentation method that uses only instrument presence labels.
We take the inherent temporal attributes of surgical video into account and extend a two-stage weakly supervised segmentation paradigm.
Experiments are conducted on two surgical video datasets: a cholecystectomy surgery benchmark and a real robotic left lateral segment liver surgery dataset.
- Score: 14.448593791011204
- License:
- Abstract: For robotic surgical videos, instrument presence annotations are typically recorded alongside the video streams, which offers the potential to reduce the cost of manual annotation for segmentation. However, weakly supervised surgical instrument segmentation using only instrument presence labels has rarely been explored in the surgical domain because the problem is highly under-constrained. Temporal properties can enhance representation learning by capturing sequential dependencies and patterns over time, even under incomplete supervision. Motivated by this, we take the inherent temporal attributes of surgical video into account and extend a two-stage weakly supervised segmentation paradigm from several perspectives. First, we impose a temporal equivariance constraint to enhance pixel-wise temporal consistency between adjacent features. Second, we constrain class-aware semantic continuity between global and local regions across the temporal dimension. Finally, we generate temporal-enhanced pseudo masks from consecutive frames to suppress irrelevant regions. Extensive experiments are conducted on two surgical video datasets: a cholecystectomy surgery benchmark and a real robotic left lateral segment liver surgery dataset. To evaluate segmentation results, we annotate instance-wise instrument labels at fixed time steps, double-checked by a clinician with three years of experience. Experimental results demonstrate the promising performance of our method, which consistently achieves results comparable to or better than previous state-of-the-art approaches.
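The abstract names three temporal components (a temporal equivariance constraint, class-aware semantic continuity, and temporal-enhanced pseudo masks) but gives no formulas. The PyTorch-style snippet below is a minimal, hedged sketch of how such constraints could look; the function names, the cosine/BCE formulations, and the mask-fusion rule are illustrative assumptions, not the authors' actual losses or implementation.

```python
# Illustrative sketch only: assumed formulations of the three temporal ideas
# described in the abstract, not the paper's implementation.
import torch
import torch.nn.functional as F


def temporal_equivariance_loss(feat_t, feat_t1):
    """Pixel-wise consistency between adjacent-frame features.

    feat_t, feat_t1: (B, C, H, W) backbone features of frames t and t+1.
    Assumes inter-frame motion is small enough that a per-pixel comparison
    is meaningful (no explicit warping shown here).
    """
    cos = F.cosine_similarity(feat_t, feat_t1, dim=1)   # (B, H, W)
    return (1.0 - cos).mean()


def semantic_continuity_loss(logits_global_t, logits_local_t1):
    """Class-aware continuity between global (frame-level) class predictions
    at frame t and local (region-pooled) predictions at frame t+1."""
    p_global = torch.sigmoid(logits_global_t)            # (B, K) presence probs
    p_local = torch.sigmoid(logits_local_t1)             # (B, K)
    return F.binary_cross_entropy(p_local, p_global.detach())


def temporal_enhanced_pseudo_mask(cams, thresh=0.5):
    """Fuse class activation maps from consecutive frames, keeping only
    regions that are consistently activated, to suppress transient noise.

    cams: (T, K, H, W) CAMs for T consecutive frames and K instrument classes.
    """
    fused = cams.mean(dim=0)                              # (K, H, W)
    consistent = (cams > thresh).float().mean(dim=0)      # activation frequency
    fused = fused * consistent                            # down-weight unstable regions
    return (fused > thresh).long()                        # hard pseudo mask per class
```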
Related papers
- LACOSTE: Exploiting stereo and temporal contexts for surgical instrument segmentation [14.152207010509763]
We propose a novel LACOSTE model that exploits Location-Agnostic COntexts in Stereo and TEmporal images for improved surgical instrument segmentation.
We extensively validate our approach on three public surgical video datasets.
arXiv Detail & Related papers (2024-09-14T08:17:56Z) - Real-time guidewire tracking and segmentation in intraoperative x-ray [52.51797358201872]
We propose a two-stage deep learning framework for real-time guidewire segmentation and tracking.
In the first stage, a YOLOv5 detector is trained, using the original X-ray images as well as synthetic ones, to output the bounding boxes of possible target guidewires.
In the second stage, a novel and efficient network is proposed to segment the guidewire in each detected bounding box (a generic sketch of this detect-then-segment pattern appears after this list).
arXiv Detail & Related papers (2024-04-12T20:39:19Z) - Weakly-Supervised Learning via Multi-Lateral Decoder Branching for Guidewire Segmentation in Robot-Assisted Cardiovascular Catheterization [4.894147633944561]
We propose a weakly-supervised learning method with multi-lateral pseudo labeling for tool segmentation in cardiac angiograms.
We trained the model end-to-end with weakly-annotated data obtained during robotic cardiac catheterization.
Compared to three existing weakly-supervised methods, our approach yielded higher segmentation performance across three different cardiac angiogram datasets.
arXiv Detail & Related papers (2024-04-11T09:23:44Z) - Visual-Kinematics Graph Learning for Procedure-agnostic Instrument Tip Segmentation in Robotic Surgeries [29.201385352740555]
We propose a novel visual-kinematics graph learning framework to accurately segment the instrument tip given various surgical procedures.
Specifically, a graph learning framework is proposed to encode relational features of instrument parts from both image and kinematics.
A cross-modal contrastive loss is designed to incorporate robust geometric prior from kinematics to image for tip segmentation.
arXiv Detail & Related papers (2023-09-02T14:52:58Z) - GLSFormer : Gated - Long, Short Sequence Transformer for Step Recognition in Surgical Videos [57.93194315839009]
We propose a vision transformer-based approach to learn temporal features directly from sequence-level patches.
We extensively evaluate our approach on two cataract surgery video datasets, Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods.
arXiv Detail & Related papers (2023-07-20T17:57:04Z) - LoViT: Long Video Transformer for Surgical Phase Recognition [59.06812739441785]
We present a two-stage method, called Long Video Transformer (LoViT), for fusing short- and long-term temporal information.
Our approach outperforms state-of-the-art methods on the Cholec80 and AutoLaparo datasets consistently.
arXiv Detail & Related papers (2023-05-15T20:06:14Z) - TraSeTR: Track-to-Segment Transformer with Contrastive Query for Instance-level Instrument Segmentation in Robotic Surgery [60.439434751619736]
We propose TraSeTR, a Track-to-Segment Transformer that exploits tracking cues to assist surgical instrument segmentation.
TraSeTR jointly reasons about the instrument type, location, and identity with instance-level predictions.
The effectiveness of our method is demonstrated with state-of-the-art instrument type segmentation results on three public datasets.
arXiv Detail & Related papers (2022-02-17T05:52:18Z) - Efficient Global-Local Memory for Real-time Instrument Segmentation of Robotic Surgical Video [53.14186293442669]
We identify two important clues for surgical instrument perception, including local temporal dependency from adjacent frames and global semantic correlation in long-range duration.
We propose a novel dual-memory network (DMNet) to relate both global and local-temporal knowledge.
Our method largely outperforms the state-of-the-art works on segmentation accuracy while maintaining a real-time speed.
arXiv Detail & Related papers (2021-09-28T10:10:14Z) - Learning Motion Flows for Semi-supervised Instrument Segmentation from Robotic Surgical Video [64.44583693846751]
We study the semi-supervised instrument segmentation from robotic surgical videos with sparse annotations.
By exploiting generated data pairs, our framework can recover and even enhance temporal consistency of training sequences.
Results show that our method outperforms state-of-the-art semi-supervised methods by a large margin.
arXiv Detail & Related papers (2020-07-06T02:39:32Z)
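Several of the related papers, most explicitly the real-time guidewire work above, follow a two-stage detect-then-segment pattern: a detector proposes instrument bounding boxes and a lightweight network segments each crop. The sketch below illustrates that generic pattern only; `detector` and `segmenter` are hypothetical placeholder modules with assumed interfaces, not the models used in those papers.

```python
# Generic detect-then-segment sketch. Assumptions: the detector returns integer
# pixel boxes as (x1, y1, x2, y2); both modules are hypothetical placeholders.
import torch
import torch.nn.functional as F


def detect_then_segment(image, detector, segmenter, crop_size=256):
    """image: (1, 3, H, W) tensor; returns a full-resolution binary mask."""
    _, _, H, W = image.shape
    full_mask = torch.zeros(1, 1, H, W)

    boxes = detector(image)                      # list of (x1, y1, x2, y2)
    for x1, y1, x2, y2 in boxes:
        crop = image[:, :, y1:y2, x1:x2]
        crop = F.interpolate(crop, size=(crop_size, crop_size),
                             mode="bilinear", align_corners=False)
        logits = segmenter(crop)                 # (1, 1, crop_size, crop_size)
        mask = torch.sigmoid(logits)
        mask = F.interpolate(mask, size=(y2 - y1, x2 - x1),
                             mode="bilinear", align_corners=False)
        # Paste the crop-level mask back into the full-frame canvas,
        # keeping the maximum where detections overlap.
        full_mask[:, :, y1:y2, x1:x2] = torch.maximum(
            full_mask[:, :, y1:y2, x1:x2], mask)

    return (full_mask > 0.5).long()
```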