Towards Holistic Surgical Scene Understanding
- URL: http://arxiv.org/abs/2212.04582v4
- Date: Fri, 26 Jan 2024 04:54:55 GMT
- Title: Towards Holistic Surgical Scene Understanding
- Authors: Natalia Valderrama, Paola Ruiz Puentes, Isabela Hernández, Nicolás Ayobi, Mathilde Verlyk, Jessica Santander, Juan Caicedo, Nicolás Fernández, Pablo Arbeláez
- Abstract summary: We present a new experimental framework towards holistic surgical scene understanding.
First, we introduce the Phase, Step, Instrument, and Atomic Visual Action recognition (PSI-AVA) dataset.
Second, we present Transformers for Action, Phase, Instrument, and Steps Recognition (TAPIR) as a strong baseline for surgical scene understanding.
- Score: 1.004785607987398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most benchmarks for studying surgical interventions focus on a specific
challenge instead of leveraging the intrinsic complementarity among different
tasks. In this work, we present a new experimental framework towards holistic
surgical scene understanding. First, we introduce the Phase, Step, Instrument,
and Atomic Visual Action recognition (PSI-AVA) Dataset. PSI-AVA includes
annotations for both long-term (Phase and Step recognition) and short-term
reasoning (Instrument detection and novel Atomic Action recognition) in
robot-assisted radical prostatectomy videos. Second, we present Transformers
for Action, Phase, Instrument, and Steps Recognition (TAPIR) as a strong
baseline for surgical scene understanding. TAPIR leverages our dataset's
multi-level annotations, drawing on representations learned for the
instrument detection task to improve its classification capacity. Our
experimental results in both PSI-AVA and other publicly available databases
demonstrate the adequacy of our framework to spur future research on holistic
surgical scene understanding.
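To make the fusion idea in the abstract concrete, here is a minimal sketch of a detection-informed multi-task classifier in the spirit of TAPIR: a global clip representation attends over pooled instrument-region features before shared phase, step, and action heads. All module names, feature dimensions, and class counts are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch: fusing instrument-detection region features with a
# global video feature for multi-task surgical classification (TAPIR-style).
# Dimensions and class counts are placeholders, not PSI-AVA's actual values.
import torch
import torch.nn as nn

class DetectionInformedClassifier(nn.Module):
    def __init__(self, video_dim=768, region_dim=256, hidden_dim=512,
                 num_phases=11, num_steps=20, num_actions=16):
        super().__init__()
        self.video_proj = nn.Linear(video_dim, hidden_dim)
        self.region_proj = nn.Linear(region_dim, hidden_dim)
        # The global clip token attends over per-instrument box features.
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads=8,
                                          batch_first=True)
        # Task-specific heads share the fused representation.
        self.phase_head = nn.Linear(hidden_dim, num_phases)
        self.step_head = nn.Linear(hidden_dim, num_steps)
        self.action_head = nn.Linear(hidden_dim, num_actions)

    def forward(self, clip_feat, region_feats):
        # clip_feat: (B, video_dim) global clip feature from a video backbone.
        # region_feats: (B, R, region_dim) pooled instrument-detection features.
        q = self.video_proj(clip_feat).unsqueeze(1)   # (B, 1, H)
        kv = self.region_proj(region_feats)           # (B, R, H)
        fused, _ = self.attn(q, kv, kv)               # attend to instruments
        fused = (q + fused).squeeze(1)                # residual fusion
        return (self.phase_head(fused), self.step_head(fused),
                self.action_head(fused))

model = DetectionInformedClassifier()
phase, step, action = model(torch.randn(2, 768), torch.randn(2, 5, 256))
```

The residual fusion lets the long-term heads (phase, step) fall back on the global clip feature when instrument evidence is weak, while short-term action recognition benefits directly from the detector's localized representation.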
Related papers
- Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation [51.222684687924215]
Surgical video-language pretraining faces unique challenges due to the knowledge domain gap and the scarcity of multi-modal data.
We propose a hierarchical knowledge augmentation approach and a novel Procedure-Encoded Surgical Knowledge-Augmented Video-Language Pretraining framework to tackle these issues.
arXiv Detail & Related papers (2024-09-30T22:21:05Z)
- SuPRA: Surgical Phase Recognition and Anticipation for Intra-Operative Planning [46.57714869178571]
We propose a dual approach that simultaneously recognises the current surgical phase and predicts upcoming ones.
Our novel method, Surgical Phase Recognition and Anticipation (SuPRA), leverages past and current information for accurate intra-operative phase recognition.
arXiv Detail & Related papers (2024-03-10T12:46:33Z)
- Hypergraph-Transformer (HGT) for Interactive Event Prediction in Laparoscopic and Robotic Surgery [50.3022015601057]
We propose a predictive neural network that is capable of understanding and predicting critical interactive aspects of surgical workflow from intra-abdominal video.
We verify our approach on established surgical datasets and applications, including the detection and prediction of action triplets.
Our results demonstrate the superiority of our approach compared to unstructured alternatives.
arXiv Detail & Related papers (2024-02-03T00:58:05Z)
- Pixel-Wise Recognition for Holistic Surgical Scene Understanding [31.338288460529046]
This paper presents the Holistic and Multi-Granular Surgical Scene Understanding of Prostatectomies (GraSP) dataset.
GraSP is a curated benchmark that models surgical scene understanding as a hierarchy of complementary tasks with varying levels of granularity.
We introduce the Transformers for Actions, Phases, Steps, and Instrument Segmentation (TAPIS) model, a general architecture that combines a global video feature extractor with localized region proposals.
arXiv Detail & Related papers (2024-01-20T09:09:52Z)
- SAR-RARP50: Segmentation of surgical instrumentation and Action Recognition on Robot-Assisted Radical Prostatectomy Challenge [72.97934765570069]
We release the first multimodal, publicly available, in-vivo dataset for surgical action recognition and semantic instrumentation segmentation, containing 50 suturing video segments of Robotic Assisted Radical Prostatectomy (RARP).
The aim of the challenge is to enable researchers to leverage the scale of the provided dataset and develop robust and highly accurate single-task action recognition and tool segmentation approaches in the surgical domain.
A total of 12 teams participated in the challenge, contributing 7 action recognition methods, 9 instrument segmentation techniques, and 4 multitask approaches that integrated both action recognition and instrument segmentation.
arXiv Detail & Related papers (2023-12-31T13:32:18Z)
- ST(OR)2: Spatio-Temporal Object Level Reasoning for Activity Recognition in the Operating Room [6.132617753806978]
We propose a new sample-efficient and object-based approach for surgical activity recognition in the OR.
Our method focuses on the geometric arrangements between clinicians and surgical devices, thus utilizing the significant object interaction dynamics in the OR.
arXiv Detail & Related papers (2023-12-19T15:33:57Z)
- Jumpstarting Surgical Computer Vision [2.7396997668655163]
We employ self-supervised learning to flexibly leverage diverse surgical datasets.
We study phase recognition and critical view of safety in laparoscopic cholecystectomy and laparoscopic hysterectomy.
The composition of pre-training datasets can severely affect the effectiveness of SSL methods for various downstream tasks.
arXiv Detail & Related papers (2023-12-10T18:54:16Z)
- Dissecting Self-Supervised Learning Methods for Surgical Computer Vision [51.370873913181605]
Self-Supervised Learning (SSL) methods have begun to gain traction in the general computer vision community.
The effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and unexplored.
We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding, phase recognition and tool presence detection.
arXiv Detail & Related papers (2022-07-01T14:17:11Z)
- CholecTriplet2021: A benchmark challenge for surgical action triplet recognition [66.51610049869393]
This paper presents CholecTriplet 2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos.
We present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge.
A total of 4 baseline methods and 19 new deep learning algorithms are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%.
arXiv Detail & Related papers (2022-04-10T18:51:55Z)
- TeCNO: Surgical Phase Recognition with Multi-Stage Temporal Convolutional Networks [43.95869213955351]
We propose a Multi-Stage Temporal Convolutional Network (MS-TCN) that performs hierarchical prediction refinement for surgical phase recognition.
Our method is thoroughly evaluated on two datasets of laparoscopic cholecystectomy videos with and without the use of additional surgical tool information.
arXiv Detail & Related papers (2020-03-24T10:12:30Z)
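Since TeCNO's central idea is stage-wise refinement, a minimal MS-TCN-style sketch may help: each stage is a stack of dilated 1-D convolutions over per-frame features, and every later stage re-predicts from the previous stage's class probabilities. Layer counts, channel widths, and the 7-class output are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal multi-stage temporal convolutional network (MS-TCN-style) sketch.
# Hyperparameters are illustrative, not TeCNO's published configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedResidualLayer(nn.Module):
    def __init__(self, channels, dilation):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size=3,
                              padding=dilation, dilation=dilation)
        self.out = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x):
        return x + self.out(F.relu(self.conv(x)))  # length-preserving residual

class Stage(nn.Module):
    def __init__(self, in_dim, channels, num_classes, num_layers=10):
        super().__init__()
        self.inp = nn.Conv1d(in_dim, channels, kernel_size=1)
        self.layers = nn.ModuleList(
            DilatedResidualLayer(channels, 2 ** i) for i in range(num_layers))
        self.cls = nn.Conv1d(channels, num_classes, kernel_size=1)

    def forward(self, x):
        x = self.inp(x)
        for layer in self.layers:  # exponentially growing receptive field
            x = layer(x)
        return self.cls(x)  # per-frame logits, shape (B, num_classes, T)

class MultiStageTCN(nn.Module):
    def __init__(self, feat_dim=2048, channels=64, num_classes=7, stages=4):
        super().__init__()
        self.first = Stage(feat_dim, channels, num_classes)
        self.refiners = nn.ModuleList(
            Stage(num_classes, channels, num_classes)
            for _ in range(stages - 1))

    def forward(self, feats):  # feats: (B, feat_dim, T) per-frame features
        outputs = [self.first(feats)]
        for stage in self.refiners:
            # Refinement stages see only class probabilities, forcing them
            # to smooth and correct the previous stage's prediction.
            outputs.append(stage(F.softmax(outputs[-1], dim=1)))
        return outputs  # supervise every stage's output during training

outs = MultiStageTCN()(torch.randn(1, 2048, 300))
```

Supervising every stage's output, rather than only the final one, is what drives the hierarchical prediction refinement described in the entry above.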