Towards Holistic Surgical Scene Understanding
- URL: http://arxiv.org/abs/2212.04582v4
- Date: Fri, 26 Jan 2024 04:54:55 GMT
- Title: Towards Holistic Surgical Scene Understanding
- Authors: Natalia Valderrama, Paola Ruiz Puentes, Isabela Hernández, Nicolás Ayobi, Mathilde Verlyk, Jessica Santander, Juan Caicedo, Nicolás Fernández, Pablo Arbeláez
- Abstract summary: We present a new experimental framework towards holistic surgical scene understanding.
First, we introduce the Phase, Step, Instrument, and Atomic Visual Action recognition (PSI-AVA) dataset.
Second, we present Transformers for Action, Phase, Instrument, and Steps Recognition (TAPIR) as a strong baseline for surgical scene understanding.
- Score: 1.004785607987398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most benchmarks for studying surgical interventions focus on a specific
challenge instead of leveraging the intrinsic complementarity among different
tasks. In this work, we present a new experimental framework towards holistic
surgical scene understanding. First, we introduce the Phase, Step, Instrument,
and Atomic Visual Action recognition (PSI-AVA) Dataset. PSI-AVA includes
annotations for both long-term (Phase and Step recognition) and short-term
reasoning (Instrument detection and novel Atomic Action recognition) in
robot-assisted radical prostatectomy videos. Second, we present Transformers
for Action, Phase, Instrument, and Steps Recognition (TAPIR) as a strong
baseline for surgical scene understanding. TAPIR leverages our dataset's
multi-level annotations, drawing on representations learned for the
instrument detection task to improve its classification capacity. Our
experimental results in both PSI-AVA and other publicly available databases
demonstrate the adequacy of our framework to spur future research on holistic
surgical scene understanding.
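To make the fusion idea in the abstract concrete, here is a minimal sketch of a detection-informed multi-task classifier in the spirit of TAPIR: a global clip representation attends over pooled instrument-region features before shared phase, step, and action heads. All module names, feature dimensions, and class counts are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch: fusing instrument-detection region features with a
# global video feature for multi-task surgical classification (TAPIR-style).
# Dimensions and class counts are placeholders, not PSI-AVA's actual values.
import torch
import torch.nn as nn

class DetectionInformedClassifier(nn.Module):
    def __init__(self, video_dim=768, region_dim=256, hidden_dim=512,
                 num_phases=11, num_steps=20, num_actions=16):
        super().__init__()
        self.video_proj = nn.Linear(video_dim, hidden_dim)
        self.region_proj = nn.Linear(region_dim, hidden_dim)
        # The global clip token attends over per-instrument box features.
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads=8,
                                          batch_first=True)
        # Task-specific heads share the fused representation.
        self.phase_head = nn.Linear(hidden_dim, num_phases)
        self.step_head = nn.Linear(hidden_dim, num_steps)
        self.action_head = nn.Linear(hidden_dim, num_actions)

    def forward(self, clip_feat, region_feats):
        # clip_feat: (B, video_dim) global clip feature from a video backbone.
        # region_feats: (B, R, region_dim) pooled instrument-detection features.
        q = self.video_proj(clip_feat).unsqueeze(1)   # (B, 1, H)
        kv = self.region_proj(region_feats)           # (B, R, H)
        fused, _ = self.attn(q, kv, kv)               # attend to instruments
        fused = (q + fused).squeeze(1)                # residual fusion
        return (self.phase_head(fused), self.step_head(fused),
                self.action_head(fused))

model = DetectionInformedClassifier()
phase, step, action = model(torch.randn(2, 768), torch.randn(2, 5, 256))
```

The residual fusion lets the long-term heads (phase, step) fall back on the global clip feature when instrument evidence is weak, while short-term action recognition benefits directly from the detector's localized representation.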
Related papers
- Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation [51.222684687924215]
Surgical video-language pretraining faces unique challenges due to the knowledge domain gap and the scarcity of multi-modal data.
We propose a hierarchical knowledge augmentation approach and a novel Procedure-Encoded Surgical Knowledge-Augmented Video-Language Pretraining framework to tackle these issues.
arXiv Detail & Related papers (2024-09-30T22:21:05Z)
- SuPRA: Surgical Phase Recognition and Anticipation for Intra-Operative Planning [46.57714869178571]
We propose a dual approach that simultaneously recognises the current surgical phase and predicts upcoming ones.
Our novel method, Surgical Phase Recognition and Anticipation (SuPRA), leverages past and current information for accurate intra-operative phase recognition.
arXiv Detail & Related papers (2024-03-10T12:46:33Z)
- Hypergraph-Transformer (HGT) for Interactive Event Prediction in Laparoscopic and Robotic Surgery [50.3022015601057]
We propose a predictive neural network that is capable of understanding and predicting critical interactive aspects of surgical workflow from intra-abdominal video.
We verify our approach on established surgical datasets and applications, including the detection and prediction of action triplets.
Our results demonstrate the superiority of our approach compared to unstructured alternatives.
arXiv Detail & Related papers (2024-02-03T00:58:05Z)
- Pixel-Wise Recognition for Holistic Surgical Scene Understanding [31.338288460529046]
This paper presents the Holistic and Multi-Granular Surgical Scene Understanding of Prostatectomies (GraSP) dataset.
GraSP is a curated benchmark that models surgical scene understanding as a hierarchy of complementary tasks with varying levels of granularity.
We introduce the Transformers for Actions, Phases, Steps, and Instrument Segmentation (TAPIS) model, a general architecture that combines a global video feature extractor with localized region proposals.
arXiv Detail & Related papers (2024-01-20T09:09:52Z)
- SAR-RARP50: Segmentation of surgical instrumentation and Action Recognition on Robot-Assisted Radical Prostatectomy Challenge [72.97934765570069]
We release the first multimodal, publicly available, in-vivo dataset for surgical action recognition and semantic instrumentation segmentation, containing 50 suturing video segments of Robotic Assisted Radical Prostatectomy (RARP).
The aim of the challenge is to enable researchers to leverage the scale of the provided dataset and develop robust and highly accurate single-task action recognition and tool segmentation approaches in the surgical domain.
A total of 12 teams participated in the challenge, contributing 7 action recognition methods, 9 instrument segmentation techniques, and 4 multitask approaches that integrated both action recognition and instrument segmentation.
arXiv Detail & Related papers (2023-12-31T13:32:18Z)
- ST(OR)2: Spatio-Temporal Object Level Reasoning for Activity Recognition in the Operating Room [6.132617753806978]
We propose a new sample-efficient and object-based approach for surgical activity recognition in the OR.
Our method focuses on the geometric arrangements between clinicians and surgical devices, thus utilizing the significant object interaction dynamics in the OR.
arXiv Detail & Related papers (2023-12-19T15:33:57Z)
- Jumpstarting Surgical Computer Vision [2.7396997668655163]
We employ self-supervised learning to flexibly leverage diverse surgical datasets.
We study phase recognition and critical view of safety in laparoscopic cholecystectomy and laparoscopic hysterectomy.
The composition of pre-training datasets can severely affect the effectiveness of SSL methods for various downstream tasks.
arXiv Detail & Related papers (2023-12-10T18:54:16Z)
- Dissecting Self-Supervised Learning Methods for Surgical Computer Vision [51.370873913181605]
Self-Supervised Learning (SSL) methods have begun to gain traction in the general computer vision community.
The effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and unexplored.
We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding, phase recognition and tool presence detection.
arXiv Detail & Related papers (2022-07-01T14:17:11Z)
- CholecTriplet2021: A benchmark challenge for surgical action triplet recognition [66.51610049869393]
This paper presents CholecTriplet 2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos.
We present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge.
A total of 4 baseline methods and 19 new deep learning algorithms are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%.
arXiv Detail & Related papers (2022-04-10T18:51:55Z)
- TeCNO: Surgical Phase Recognition with Multi-Stage Temporal Convolutional Networks [43.95869213955351]
We propose a Multi-Stage Temporal Convolutional Network (MS-TCN) that performs hierarchical prediction refinement for surgical phase recognition.
Our method is thoroughly evaluated on two datasets of laparoscopic cholecystectomy videos with and without the use of additional surgical tool information.
arXiv Detail & Related papers (2020-03-24T10:12:30Z)
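Since TeCNO's central idea is stage-wise refinement, a minimal MS-TCN-style sketch may help: each stage is a stack of dilated 1-D convolutions over per-frame features, and every later stage re-predicts from the previous stage's class probabilities. Layer counts, channel widths, and the 7-class output are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal multi-stage temporal convolutional network (MS-TCN-style) sketch.
# Hyperparameters are illustrative, not TeCNO's published configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedResidualLayer(nn.Module):
    def __init__(self, channels, dilation):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size=3,
                              padding=dilation, dilation=dilation)
        self.out = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x):
        return x + self.out(F.relu(self.conv(x)))  # length-preserving residual

class Stage(nn.Module):
    def __init__(self, in_dim, channels, num_classes, num_layers=10):
        super().__init__()
        self.inp = nn.Conv1d(in_dim, channels, kernel_size=1)
        self.layers = nn.ModuleList(
            DilatedResidualLayer(channels, 2 ** i) for i in range(num_layers))
        self.cls = nn.Conv1d(channels, num_classes, kernel_size=1)

    def forward(self, x):
        x = self.inp(x)
        for layer in self.layers:  # exponentially growing receptive field
            x = layer(x)
        return self.cls(x)  # per-frame logits, shape (B, num_classes, T)

class MultiStageTCN(nn.Module):
    def __init__(self, feat_dim=2048, channels=64, num_classes=7, stages=4):
        super().__init__()
        self.first = Stage(feat_dim, channels, num_classes)
        self.refiners = nn.ModuleList(
            Stage(num_classes, channels, num_classes)
            for _ in range(stages - 1))

    def forward(self, feats):  # feats: (B, feat_dim, T) per-frame features
        outputs = [self.first(feats)]
        for stage in self.refiners:
            # Refinement stages see only class probabilities, forcing them
            # to smooth and correct the previous stage's prediction.
            outputs.append(stage(F.softmax(outputs[-1], dim=1)))
        return outputs  # supervise every stage's output during training

outs = MultiStageTCN()(torch.randn(1, 2048, 300))
```

Supervising every stage's output, rather than only the final one, is what drives the hierarchical prediction refinement described in the entry above.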