CholecTriplet2021: A benchmark challenge for surgical action triplet
recognition
- URL: http://arxiv.org/abs/2204.04746v1
- Date: Sun, 10 Apr 2022 18:51:55 GMT
- Title: CholecTriplet2021: A benchmark challenge for surgical action triplet
recognition
- Authors: Chinedu Innocent Nwoye, Deepak Alapatt, Tong Yu, Armine Vardazaryan,
Fangfang Xia, Zixuan Zhao, Tong Xia, Fucang Jia, Yuxuan Yang, Hao Wang,
Derong Yu, Guoyan Zheng, Xiaotian Duan, Neil Getty, Ricardo Sanchez-Matilla,
Maria Robu, Li Zhang, Huabin Chen, Jiacheng Wang, Liansheng Wang, Bokai
Zhang, Beerend Gerats, Sista Raviteja, Rachana Sathish, Rong Tao, Satoshi
Kondo, Winnie Pang, Hongliang Ren, Julian Ronald Abbing, Mohammad Hasan
Sarhan, Sebastian Bodenstedt, Nithya Bhasker, Bruno Oliveira, Helena R.
Torres, Li Ling, Finn Gaida, Tobias Czempiel, João L. Vilaça, Pedro
Morais, Jaime Fonseca, Ruby Mae Egging, Inge Nicole Wijma, Chen Qian, Guibin
Bian, Zhen Li, Velmurugan Balasubramanian, Debdoot Sheet, Imanol Luengo,
Yuanbo Zhu, Shuai Ding, Jakob-Anton Aschenbrenner, Nicolas Elini van der Kar,
Mengya Xu, Mobarakol Islam, Lalithkumar Seenivasan, Alexander Jenke, Danail
Stoyanov, Didier Mutter, Pietro Mascagni, Barbara Seeliger, Cristians
Gonzalez, Nicolas Padoy
- Abstract summary: This paper presents CholecTriplet 2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos.
We present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge.
A total of 4 baseline methods and 19 new deep learning algorithms are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%.
- Score: 66.51610049869393
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Context-aware decision support in the operating room can foster surgical
safety and efficiency by leveraging real-time feedback from surgical workflow
analysis. Most existing works recognize surgical activities at a coarse-grained
level, such as phases, steps or events, leaving out fine-grained interaction
details about the surgical activity; yet those are needed for more helpful AI
assistance in the operating room. Recognizing surgical actions as triplets of
<instrument, verb, target> combination delivers comprehensive details about the
activities taking place in surgical videos. This paper presents
CholecTriplet2021: an endoscopic vision challenge organized at MICCAI 2021 for
the recognition of surgical action triplets in laparoscopic videos. The
challenge granted private access to the large-scale CholecT50 dataset, which is
annotated with action triplet information. In this paper, we present the
challenge setup and assessment of the state-of-the-art deep learning methods
proposed by the participants during the challenge. A total of 4 baseline
methods from the challenge organizers and 19 new deep learning algorithms by
competing teams are presented to recognize surgical action triplets directly
from surgical videos, achieving mean average precision (mAP) ranging from 4.2%
to 38.1%. This study also analyzes the significance of the results obtained by
the presented approaches, performs a thorough methodological comparison and an
in-depth analysis of their results, and proposes a novel ensemble method for
enhanced recognition. Our analysis shows that surgical workflow analysis is not
yet solved, and highlights interesting directions for future research on
fine-grained surgical activity recognition, which is of utmost importance for
the development of AI in surgery.
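To make the mAP figures above concrete, the short Python sketch below scores frame-level triplet recognition the way such multi-label benchmarks are commonly evaluated: average precision is computed per <instrument, verb, target> class and then averaged over classes. This is an illustrative example built on scikit-learn, not the challenge's official evaluation code; the class count of 100 (the valid triplet combinations in CholecT50) and the skipping of classes with no positive frames are assumptions made for the sketch.

# Minimal sketch of frame-level triplet recognition scoring (illustrative only,
# not the official CholecTriplet2021 evaluation protocol).
import numpy as np
from sklearn.metrics import average_precision_score

N_TRIPLET_CLASSES = 100  # assumed: valid <instrument, verb, target> combinations in CholecT50

def triplet_map(y_true: np.ndarray, y_score: np.ndarray) -> float:
    """Mean average precision over triplet classes.

    y_true  : (num_frames, N_TRIPLET_CLASSES) binary ground-truth matrix.
    y_score : (num_frames, N_TRIPLET_CLASSES) predicted confidences.
    Classes that never occur in the ground truth are skipped, since average
    precision is undefined without positive samples.
    """
    aps = []
    for c in range(y_true.shape[1]):
        if y_true[:, c].any():
            aps.append(average_precision_score(y_true[:, c], y_score[:, c]))
    return float(np.mean(aps)) if aps else 0.0

# Toy usage with random data, for illustration only.
rng = np.random.default_rng(0)
y_true = (rng.random((500, N_TRIPLET_CLASSES)) > 0.97).astype(int)
y_score = rng.random((500, N_TRIPLET_CLASSES))
print(f"triplet mAP: {triplet_map(y_true, y_score):.3f}")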
Related papers
- OSSAR: Towards Open-Set Surgical Activity Recognition in Robot-assisted
Surgery [13.843251369739908]
We introduce an innovative Open-Set Surgical Activity Recognition (OSSAR) framework.
Our solution leverages the hyperspherical reciprocal point strategy to enhance the distinction between known and unknown classes in the feature space.
To support our assertions, we establish an open-set surgical activity benchmark utilizing the public JIGSAWS dataset.
arXiv Detail & Related papers (2024-02-10T16:23:12Z)
- Hypergraph-Transformer (HGT) for Interactive Event Prediction in
Laparoscopic and Robotic Surgery [50.3022015601057]
We propose a predictive neural network that is capable of understanding and predicting critical interactive aspects of surgical workflow from intra-abdominal video.
We verify our approach on established surgical datasets and applications, including the detection and prediction of action triplets.
Our results demonstrate the superiority of our approach compared to unstructured alternatives.
arXiv Detail & Related papers (2024-02-03T00:58:05Z)
- SAR-RARP50: Segmentation of surgical instrumentation and Action
Recognition on Robot-Assisted Radical Prostatectomy Challenge [72.97934765570069]
We release the first multimodal, publicly available, in-vivo dataset for surgical action recognition and semantic instrumentation segmentation, containing 50 suturing video segments of Robotic Assisted Radical Prostatectomy (RARP).
The aim of the challenge is to enable researchers to leverage the scale of the provided dataset and develop robust and highly accurate single-task action recognition and tool segmentation approaches in the surgical domain.
A total of 12 teams participated in the challenge, contributing 7 action recognition methods, 9 instrument segmentation techniques, and 4 multitask approaches that integrated both action recognition and instrument segmentation.
arXiv Detail & Related papers (2023-12-31T13:32:18Z)
- CholecTriplet2022: Show me a tool and tell me the triplet -- an
endoscopic vision challenge for surgical action triplet detection [41.66666272822756]
This paper presents the CholecTriplet2022 challenge, which extends surgical action triplet modeling from recognition to detection.
It includes weakly-supervised bounding box localization of every visible surgical instrument (or tool) as the key actors, and the modeling of each tool-activity in the form of an <instrument, verb, target> triplet.
arXiv Detail & Related papers (2023-02-13T11:53:14Z)
- Quantification of Robotic Surgeries with Vision-Based Deep Learning [45.165919577877695]
We propose a unified deep learning framework, entitled Roboformer, which operates exclusively on videos recorded during surgery.
We validated our framework on four video-based datasets of two commonly-encountered types of steps within minimally-invasive robotic surgeries.
arXiv Detail & Related papers (2022-05-06T06:08:35Z)
- Comparative Validation of Machine Learning Algorithms for Surgical
Workflow and Skill Analysis with the HeiChole Benchmark [36.37186411201134]
Surgical workflow and skill analysis are key technologies for the next generation of cognitive surgical assistance systems.
We investigated the generalizability of phase recognition algorithms in a multi-center setting.
arXiv Detail & Related papers (2021-09-30T09:34:13Z)
- The SARAS Endoscopic Surgeon Action Detection (ESAD) dataset: Challenges
and methods [15.833413083110903]
This paper presents ESAD, the first large-scale dataset designed to tackle the problem of surgeon action detection in endoscopic minimally invasive surgery.
The dataset provides bounding box annotation for 21 action classes on real endoscopic video frames captured during prostatectomy, and was used as the basis of a recent MIDL 2020 challenge.
arXiv Detail & Related papers (2021-04-07T15:11:51Z)
- Robust Medical Instrument Segmentation Challenge 2019 [56.148440125599905]
Intraoperative tracking of laparoscopic instruments is often a prerequisite for computer and robotic-assisted interventions.
Our challenge was based on a surgical data set comprising 10,040 annotated images acquired from a total of 30 surgical procedures.
The results confirm the initial hypothesis, namely that algorithm performance degrades with an increasing domain gap.
arXiv Detail & Related papers (2020-03-23T14:35:08Z)