Related papers: Surgeons vs. Computer Vision: A comparative analysis on surgical phase recognition capabilities

URL: http://arxiv.org/abs/2504.18954v1
Date: Sat, 26 Apr 2025 15:37:22 GMT
Title: Surgeons vs. Computer Vision: A comparative analysis on surgical phase recognition capabilities
Authors: Marco Mezzina, Pieter De Backer, Tom Vercauteren, Matthew Blaschko, Alexandre Mottrie, Tinne Tuytelaars
Abstract summary: Automated Surgical Phase Recognition (SPR) uses Artificial Intelligence (AI) to segment the surgical workflow into its key events. Previous research has focused on short and linear surgical procedures and has not explored if temporal context influences experts' ability to better classify surgical phases. This research addresses these gaps, focusing on Robot-Assisted Partial Nephrectomy (RAPN) as a highly non-linear procedure.
Score: 65.66373425605278
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Purpose: Automated Surgical Phase Recognition (SPR) uses Artificial Intelligence (AI) to segment the surgical workflow into its key events, functioning as a building block for efficient video review, surgical education as well as skill assessment. Previous research has focused on short and linear surgical procedures and has not explored if temporal context influences experts' ability to better classify surgical phases. This research addresses these gaps, focusing on Robot-Assisted Partial Nephrectomy (RAPN) as a highly non-linear procedure. Methods: Urologists of varying expertise were grouped and tasked to indicate the surgical phase for RAPN on both single frames and video snippets using a custom-made web platform. Participants reported their confidence levels and the visual landmarks used in their decision-making. AI architectures without and with temporal context as trained and benchmarked on the Cholec80 dataset were subsequently trained on this RAPN dataset. Results: Video snippets and presence of specific visual landmarks improved phase classification accuracy across all groups. Surgeons displayed high confidence in their classifications and outperformed novices, who struggled discriminating phases. The performance of the AI models is comparable to the surgeons in the survey, with improvements when temporal context was incorporated in both cases. Conclusion: SPR is an inherently complex task for expert surgeons and computer vision, where both perform equally well when given the same context. Performance increases when temporal information is provided. Surgical tools and organs form the key landmarks for human interpretation and are expected to shape the future of automated SPR.

Related papers

Surgical Foundation Model Leveraging Compression and Entropy Maximization for Image-Guided Surgical Assistance [50.486523249499115]
Real-time video understanding is critical to guide procedures in minimally invasive surgery (MIS). We propose Compress-to-Explore (C2E), a novel self-supervised framework to learn compact, informative representations from surgical videos. C2E uses entropy-maximizing decoders to compress images while preserving clinically relevant details, improving encoder performance without labeled data.
arXiv Detail & Related papers (2025-05-16T14:02:24Z)
OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining [60.75854609803651]
OphCLIP is a hierarchical retrieval-augmented vision-language pretraining framework for ophthalmic surgical workflow understanding. OphCLIP learns both fine-grained and long-term visual representations by aligning short video clips with detailed narrative descriptions and full videos with structured titles. Our OphCLIP also designs a retrieval-augmented pretraining framework to leverage the underexplored large-scale silent surgical procedure videos.
arXiv Detail & Related papers (2024-11-23T02:53:08Z)
Peri-AIIMS: Perioperative Artificial Intelligence Driven Integrated Modeling of Surgeries using Anesthetic, Physical and Cognitive Statuses for Predicting Hospital Outcomes [12.493423568689801]
Preoperative cognitive status, as measured by the clock drawing test, contributed to predicting length of hospital stay, hospital charges, and average pain experienced during follow-up. Machine learning models were trained to classify postoperative outcomes in hold-out test sets.
arXiv Detail & Related papers (2024-10-29T23:42:51Z)
Automated Surgical Skill Assessment in Endoscopic Pituitary Surgery using Real-time Instrument Tracking on a High-fidelity Bench-top Phantom [9.41936397281689]
Improved surgical skill is generally associated with improved patient outcomes, but assessment is subjective and labour-intensive. A new public dataset is introduced, focusing on simulated surgery, using the nasal phase of endoscopic pituitary surgery as an exemplar. A Multilayer Perceptron achieved 87% accuracy in predicting surgical skill level (novice or expert), with the "ratio of total procedure time to instrument visible time" correlated with higher surgical skill.
arXiv Detail & Related papers (2024-09-25T15:27:44Z)
Phase-Specific Augmented Reality Guidance for Microscopic Cataract Surgery Using Long-Short Spatiotemporal Aggregation Transformer [14.568834378003707]
Phacoemulsification cataract surgery (PCS) is a routine procedure using a surgical microscope. PCS guidance systems extract valuable information from surgical microscopic videos to enhance proficiency. Existing PCS guidance systems suffer from non-phase-specific guidance, leading to redundant visual information. We propose a novel phase-specific augmented reality (AR) guidance system, which offers tailored AR information corresponding to the recognized surgical phase.
arXiv Detail & Related papers (2023-09-11T02:56:56Z)
GLSFormer: Gated - Long, Short Sequence Transformer for Step Recognition in Surgical Videos [57.93194315839009]
We propose a vision transformer-based approach to learn temporal features directly from sequence-level patches. We extensively evaluate our approach on two cataract surgery video datasets, Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods.
arXiv Detail & Related papers (2023-07-20T17:57:04Z)
TUNeS: A Temporal U-Net with Self-Attention for Video-based Surgical Phase Recognition [1.5237530964650965]
We propose a novel approach that uses attention more effectively and does not require hand-crafted constraints. TUNeS is an efficient and simple temporal model that incorporates self-attention at the core of a convolutional U-Net structure. TUNeS achieves state-of-the-art results on the Cholec80 dataset.
arXiv Detail & Related papers (2023-07-19T14:10:55Z)
Dissecting Self-Supervised Learning Methods for Surgical Computer Vision [51.370873913181605]
Self-Supervised Learning (SSL) methods have begun to gain traction in the general computer vision community. The effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and unexplored. We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding, phase recognition and tool presence detection.
arXiv Detail & Related papers (2022-07-01T14:17:11Z)
Quantification of Robotic Surgeries with Vision-Based Deep Learning [45.165919577877695]
We propose a unified deep learning framework, entitled Roboformer, which operates exclusively on videos recorded during surgery. We validated our framework on four video-based datasets of two commonly-encountered types of steps within minimally-invasive robotic surgeries.
arXiv Detail & Related papers (2022-05-06T06:08:35Z)
CholecTriplet2021: A benchmark challenge for surgical action triplet recognition [66.51610049869393]
This paper presents CholecTriplet 2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos. We present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge. A total of 4 baseline methods and 19 new deep learning algorithms are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%.
arXiv Detail & Related papers (2022-04-10T18:51:55Z)
Automatic Gesture Recognition in Robot-assisted Surgery with Reinforcement Learning and Tree Search [63.07088785532908]
We propose a framework based on reinforcement learning and tree search for joint surgical gesture segmentation and classification. Our framework consistently outperforms the existing methods on the suturing task of the JIGSAWS dataset in terms of accuracy, edit score and F1 score.
arXiv Detail & Related papers (2020-02-20T13:12:38Z)

This list is automatically generated from the titles and abstracts of the papers on this site.