PitVis-2023 Challenge: Workflow Recognition in videos of Endoscopic Pituitary Surgery
- URL: http://arxiv.org/abs/2409.01184v1
- Date: Mon, 2 Sep 2024 11:38:06 GMT
- Title: PitVis-2023 Challenge: Workflow Recognition in videos of Endoscopic Pituitary Surgery
- Authors: Adrito Das, Danyal Z. Khan, Dimitrios Psychogyios, Yitong Zhang, John G. Hanrahan, Francisco Vasconcelos, You Pang, Zhen Chen, Jinlin Wu, Xiaoyang Zou, Guoyan Zheng, Abdul Qayyum, Moona Mazher, Imran Razzak, Tianbin Li, Jin Ye, Junjun He, Szymon Płotka, Joanna Kaleta, Amine Yamlahi, Antoine Jund, Patrick Godau, Satoshi Kondo, Satoshi Kasai, Kousuke Hirasawa, Dominik Rivoir, Alejandra Pérez, Santiago Rodriguez, Pablo Arbeláez, Danail Stoyanov, Hani J. Marcus, Sophia Bano
- Abstract summary: The Pituitary Vision (PitVis) 2023 Challenge tasks the community with step and instrument recognition in videos of endoscopic pituitary surgery.
This is a unique task when compared to other minimally invasive surgeries due to the smaller working space.
There were 18-submissions from 9-teams across 6-countries, using a variety of deep learning models.
- Score: 46.2901962659261
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The field of computer vision applied to videos of minimally invasive surgery is ever-growing. Workflow recognition pertains to the automated recognition of various aspects of a surgery: including which surgical steps are performed; and which surgical instruments are used. This information can later be used to assist clinicians when learning the surgery; during live surgery; and when writing operation notes. The Pituitary Vision (PitVis) 2023 Challenge tasks the community with step and instrument recognition in videos of endoscopic pituitary surgery. This is a unique task when compared to other minimally invasive surgeries due to the smaller working space, which limits and distorts vision; and higher frequency of instrument and step switching, which requires more precise model predictions. Participants were provided with 25-videos, with results presented at the MICCAI-2023 conference as part of the Endoscopic Vision 2023 Challenge in Vancouver, Canada, on 08-Oct-2023. There were 18-submissions from 9-teams across 6-countries, using a variety of deep learning models. A commonality between the top performing models was incorporating spatio-temporal and multi-task methods, with greater than 50% and 10% macro-F1-score improvement over purely spatial single-task models in step and instrument recognition respectively. The PitVis-2023 Challenge therefore demonstrates state-of-the-art computer vision models in minimally invasive surgery are transferable to a new dataset, with surgery-specific techniques used to enhance performance, progressing the field further. Benchmark results are provided in the paper, and the dataset is publicly available at: https://doi.org/10.5522/04/26531686.
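The abstract reports results as macro-F1-score. As a rough illustration of that metric only, the sketch below computes macro-F1 for single-label, per-frame predictions such as step recognition. The function name `macro_f1` and the toy labels are hypothetical, and the challenge's actual evaluation protocol (for example, any per-video averaging or multi-label handling for instruments) is not described in this listing and is not reproduced here.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels=None):
    """Unweighted mean of per-class F1 scores for single-label predictions.

    Illustrative only: this is the textbook definition of macro-F1, not the
    PitVis-2023 evaluation code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("no labels to score")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    per_class = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # A class with no true or predicted frames scores 0 by convention here.
        per_class.append(2 * tp[label] / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Toy example: six frames, three hypothetical step classes.
frames_true = [0, 0, 1, 1, 2, 2]
frames_pred = [0, 0, 1, 2, 2, 2]
score = macro_f1(frames_true, frames_pred)
```

Because every class contributes equally regardless of how many frames it covers, macro-F1 penalises models that ignore short or rare steps, which is relevant to a procedure with frequent step and instrument switching.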
Related papers
- VISAGE: Video Synthesis using Action Graphs for Surgery [34.21344214645662]
We introduce the novel task of future video generation in laparoscopic surgery.
Our proposed method, VISAGE, leverages the power of action scene graphs to capture the sequential nature of laparoscopic procedures.
Results of our experiments demonstrate high-fidelity video generation for laparoscopy procedures.
arXiv Detail & Related papers (2024-10-23T10:28:17Z)
- Automated Surgical Skill Assessment in Endoscopic Pituitary Surgery using Real-time Instrument Tracking on a High-fidelity Bench-top Phantom [9.41936397281689]
Improved surgical skill is generally associated with improved patient outcomes, but assessment is subjective and labour-intensive.
A new public dataset is introduced, focusing on simulated surgery, using the nasal phase of endoscopic pituitary surgery as an exemplar.
A Multilayer Perceptron achieved 87% accuracy in predicting surgical skill level (novice or expert), with the "ratio of total procedure time to instrument visible time" correlated with higher surgical skill.
arXiv Detail & Related papers (2024-09-25T15:27:44Z)
- SAR-RARP50: Segmentation of surgical instrumentation and Action Recognition on Robot-Assisted Radical Prostatectomy Challenge [72.97934765570069]
We release the first multimodal, publicly available, in-vivo dataset for surgical action recognition and semantic instrumentation segmentation, containing 50 suturing video segments of Robotic Assisted Radical Prostatectomy (RARP).
The aim of the challenge is to enable researchers to leverage the scale of the provided dataset and develop robust and highly accurate single-task action recognition and tool segmentation approaches in the surgical domain.
A total of 12 teams participated in the challenge, contributing 7 action recognition methods, 9 instrument segmentation techniques, and 4 multitask approaches that integrated both action recognition and instrument segmentation.
arXiv Detail & Related papers (2023-12-31T13:32:18Z)
- Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures [51.78027546947034]
Recent advancements in surgical computer vision have been driven by vision-only models, which lack language semantics.
We propose leveraging surgical video lectures from e-learning platforms to provide effective vision and language supervisory signals.
We address surgery-specific linguistic challenges using multiple automatic speech recognition systems for text transcriptions.
arXiv Detail & Related papers (2023-07-27T22:38:12Z)
- GLSFormer: Gated - Long, Short Sequence Transformer for Step Recognition in Surgical Videos [57.93194315839009]
We propose a vision transformer-based approach to learn temporal features directly from sequence-level patches.
We extensively evaluate our approach on two cataract surgery video datasets, Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods.
arXiv Detail & Related papers (2023-07-20T17:57:04Z)
- Surgical tool classification and localization: results and methods from the MICCAI 2022 SurgToolLoc challenge [69.91670788430162]
We present the results of the SurgToolLoc 2022 challenge.
The goal was to leverage tool presence data as weak labels for machine learning models trained to detect tools.
We conclude by discussing these results in the broader context of machine learning and surgical data science.
arXiv Detail & Related papers (2023-05-11T21:44:39Z)
- CholecTriplet2021: A benchmark challenge for surgical action triplet recognition [66.51610049869393]
This paper presents CholecTriplet 2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos.
We present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge.
A total of 4 baseline methods and 19 new deep learning algorithms are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%.
arXiv Detail & Related papers (2022-04-10T18:51:55Z)
- The SARAS Endoscopic Surgeon Action Detection (ESAD) dataset: Challenges and methods [15.833413083110903]
This paper presents ESAD, the first large-scale dataset designed to tackle the problem of surgeon action detection in endoscopic minimally invasive surgery.
The dataset provides bounding box annotation for 21 action classes on real endoscopic video frames captured during prostatectomy, and was used as the basis of a recent MIDL 2020 challenge.
arXiv Detail & Related papers (2021-04-07T15:11:51Z)
- Surgical Visual Domain Adaptation: Results from the MICCAI 2020 SurgVisDom Challenge [9.986124942784969]
This work seeks to explore the potential for visual domain adaptation in surgery to overcome data privacy concerns.
In particular, we propose to use video from virtual reality (VR) simulations of surgical exercises to develop algorithms to recognize tasks in a clinical-like setting.
We present the performance of the different approaches to solve visual domain adaptation developed by challenge participants.
arXiv Detail & Related papers (2021-02-26T18:45:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.