Related papers: Phase-Informed Tool Segmentation for Manual Small-Incision Cataract Surgery

Phase-Informed Tool Segmentation for Manual Small-Incision Cataract Surgery

URL: http://arxiv.org/abs/2411.16794v1
Date: Mon, 25 Nov 2024 09:22:42 GMT
Title: Phase-Informed Tool Segmentation for Manual Small-Incision Cataract Surgery
Authors: Bhuvan Sachdeva, Naren Akash, Tajamul Ashraf, Simon Muller, Thomas Schultz, Maximilian W. M. Wintergerst, Niharika Singri Prasad, Kaushik Murali, Mohit Jain,
Abstract summary: Cataract surgery is the most common surgical procedure globally, with a disproportionately higher burden in developing countries. We introduce Cataract-MSICS, the first comprehensive dataset containing 53 surgical videos annotated for 18 surgical phases and 3,527 frames with 13 surgical tools at the pixel level. We present ToolSeg, a novel framework that enhances tool segmentation by introducing a phase-conditional decoder and a simple yet effective semi-supervised setup leveraging pseudo-labels from foundation models.
Score: 5.346116837157231
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Cataract surgery is the most common surgical procedure globally, with a disproportionately higher burden in developing countries. While automated surgical video analysis has been explored in general surgery, its application to ophthalmic procedures remains limited. Existing works primarily focus on Phaco cataract surgery, an expensive technique not accessible in regions where cataract treatment is most needed. In contrast, Manual Small-Incision Cataract Surgery (MSICS) is the preferred low-cost, faster alternative in high-volume settings and for challenging cases. However, no dataset exists for MSICS. To address this gap, we introduce Cataract-MSICS, the first comprehensive dataset containing 53 surgical videos annotated for 18 surgical phases and 3,527 frames with 13 surgical tools at the pixel level. We benchmark this dataset on state-of-the-art models and present ToolSeg, a novel framework that enhances tool segmentation by introducing a phase-conditional decoder and a simple yet effective semi-supervised setup leveraging pseudo-labels from foundation models. Our approach significantly improves segmentation performance, achieving a $23.77\%$ to $38.10\%$ increase in mean Dice scores, with a notable boost for tools that are less prevalent and small. Furthermore, we demonstrate that ToolSeg generalizes to other surgical settings, showcasing its effectiveness on the CaDIS dataset.

Related papers

SurgVLM: A Large Vision-Language Model and Systematic Evaluation Benchmark for Surgical Intelligence [72.10889173696928]
We propose SurgVLM, one of the first large vision-language foundation models for surgical intelligence.<n>We construct a large-scale multimodal surgical database, SurgVLM-DB, spanning more than 16 surgical types and 18 anatomical structures.<n>Building upon this comprehensive dataset, we propose SurgVLM, which is built upon Qwen2.5-VL, and undergoes instruction tuning to 10+ surgical tasks.
arXiv Detail & Related papers (2025-06-03T07:44:41Z)
Surgical Foundation Model Leveraging Compression and Entropy Maximization for Image-Guided Surgical Assistance [50.486523249499115]
Real-time video understanding is critical to guide procedures in minimally invasive surgery (MIS)<n>We propose Compress-to-Explore (C2E), a novel self-supervised framework to learn compact, informative representations from surgical videos.<n>C2E uses entropy-maximizing decoders to compress images while preserving clinically relevant details, improving encoder performance without labeled data.
arXiv Detail & Related papers (2025-05-16T14:02:24Z)
ToolTipNet: A Segmentation-Driven Deep Learning Baseline for Surgical Instrument Tip Detection [15.249490007192964]
In robot-assisted radical prostatectomy, the location of the instrument tip is important to register the ultrasound frame with the laparoscopic camera frame. A long-standing limitation is that the instrument tip position obtained from the da Vinci API is inaccurate and requires hand-eye calibration. We propose a surgical instrument-based tip detection approach that takes the part-level instrument segmentation mask as input.
arXiv Detail & Related papers (2025-04-13T19:27:03Z)
Deep learning approaches to surgical video segmentation and object detection: A Scoping Review [0.0]
We conducted a scoping review of studies on semantic segmentation and object detection of anatomical structures published between 2014 and 2024. The primary objective was to evaluate the state-of-the-art performance of semantic segmentation in surgical videos. The secondary objectives included examining DL models, progress toward clinical applications, and the specific challenges with segmentation of organs/tissues in surgical videos.
arXiv Detail & Related papers (2025-02-23T06:31:23Z)
AMNCutter: Affinity-Attention-Guided Multi-View Normalized Cutter for Unsupervised Surgical Instrument Segmentation [7.594796294925481]
We propose a label-free unsupervised model featuring a novel module named Multi-View Normalized Cutter (m-NCutter) Our model is trained using a graph-cutting loss function that leverages patch affinities for supervision, eliminating the need for pseudo-labels. We conduct comprehensive experiments across multiple SIS datasets to validate our approach's state-of-the-art (SOTA) performance, robustness, and exceptional potential as a pre-trained model.
arXiv Detail & Related papers (2024-11-06T06:33:55Z)
SURGIVID: Annotation-Efficient Surgical Video Object Discovery [42.16556256395392]
We propose an annotation-efficient framework for the semantic segmentation of surgical scenes. We employ image-based self-supervised object discovery to identify the most salient tools and anatomical structures in surgical videos. Our unsupervised setup reinforced with only 36 annotation labels indicates comparable localization performance with fully-supervised segmentation models.
arXiv Detail & Related papers (2024-09-12T07:12:20Z)
CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools [1.7059333957102913]
Existing datasets rely on overly generic tracking formalizations that fail to capture surgical-specific dynamics. We introduce CholecTrack20, a specialized dataset for multi-class, multi-tool tracking in surgical procedures. The dataset comprises 20 full-length surgical videos, annotated at 1 fps, yielding over 35K frames and 65K labeled tool instances.
arXiv Detail & Related papers (2023-12-12T15:18:15Z)
Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase Recognition, and Irregularity Detection [5.47960852753243]
We present the largest cataract surgery video dataset that addresses diverse requisites for constructing computerized surgical workflow analysis. We validate the quality of annotations by benchmarking the performance of several state-of-the-art neural network architectures. The dataset and annotations will be publicly available upon acceptance of the paper.
arXiv Detail & Related papers (2023-12-11T10:53:05Z)
Visual-Kinematics Graph Learning for Procedure-agnostic Instrument Tip Segmentation in Robotic Surgeries [29.201385352740555]
We propose a novel visual-kinematics graph learning framework to accurately segment the instrument tip given various surgical procedures. Specifically, a graph learning framework is proposed to encode relational features of instrument parts from both image and kinematics. A cross-modal contrastive loss is designed to incorporate robust geometric prior from kinematics to image for tip segmentation.
arXiv Detail & Related papers (2023-09-02T14:52:58Z)
SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation [65.52097667738884]
We introduce SurgicalSAM, a novel end-to-end efficient-tuning approach for SAM to integrate surgical-specific information with SAM's pre-trained knowledge for improved generalisation. Specifically, we propose a lightweight prototype-based class prompt encoder for tuning, which directly generates prompt embeddings from class prototypes. In addition, to address the low inter-class variance among surgical instrument categories, we propose contrastive prototype learning.
arXiv Detail & Related papers (2023-08-17T02:51:01Z)
Cross-Dataset Adaptation for Instrument Classification in Cataract Surgery Videos [54.1843419649895]
State-of-the-art models, which perform this task well on a particular dataset, perform poorly when tested on another dataset. We propose a novel end-to-end Unsupervised Domain Adaptation (UDA) method called the Barlow Adaptor. In addition, we introduce a novel loss called the Barlow Feature Alignment Loss (BFAL) which aligns features across different domains.
arXiv Detail & Related papers (2023-07-31T18:14:18Z)
Surgical tool classification and localization: results and methods from the MICCAI 2022 SurgToolLoc challenge [69.91670788430162]
We present the results of the SurgLoc 2022 challenge. The goal was to leverage tool presence data as weak labels for machine learning models trained to detect tools. We conclude by discussing these results in the broader context of machine learning and surgical data science.
arXiv Detail & Related papers (2023-05-11T21:44:39Z)
Dissecting Self-Supervised Learning Methods for Surgical Computer Vision [51.370873913181605]
Self-Supervised Learning (SSL) methods have begun to gain traction in the general computer vision community. The effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and unexplored. We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding, phase recognition and tool presence detection.
arXiv Detail & Related papers (2022-07-01T14:17:11Z)
CholecTriplet2021: A benchmark challenge for surgical action triplet recognition [66.51610049869393]
This paper presents CholecTriplet 2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos. We present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge. A total of 4 baseline methods and 19 new deep learning algorithms are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%.
arXiv Detail & Related papers (2022-04-10T18:51:55Z)
Real-time Instance Segmentation of Surgical Instruments using Attention and Multi-scale Feature Fusion [0.5735035463793008]
Deep learning gives us the opportunity to learn complex environment from large surgery scene environments. In this paper, we use a light-weight single stage segmentation model complemented with a convolutional block attention module. Our approach out-performed top team performances in the ROBUST-MIS challenge with over 44% improvement on both area-based metric MI_DSC and distance-based metric MI_NSD.
arXiv Detail & Related papers (2021-11-09T02:22:01Z)
Robust Medical Instrument Segmentation Challenge 2019 [56.148440125599905]
Intraoperative tracking of laparoscopic instruments is often a prerequisite for computer and robotic-assisted interventions. Our challenge was based on a surgical data set comprising 10,040 annotated images acquired from a total of 30 surgical procedures. The results confirm the initial hypothesis, namely that algorithm performance degrades with an increasing domain gap.
arXiv Detail & Related papers (2020-03-23T14:35:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.