OSSAR: Towards Open-Set Surgical Activity Recognition in Robot-assisted
Surgery
- URL: http://arxiv.org/abs/2402.06985v1
- Date: Sat, 10 Feb 2024 16:23:12 GMT
- Title: OSSAR: Towards Open-Set Surgical Activity Recognition in Robot-assisted
Surgery
- Authors: Long Bai, Guankun Wang, Jie Wang, Xiaoxiao Yang, Huxin Gao, Xin Liang,
An Wang, Mobarakol Islam, Hongliang Ren
- Abstract summary: We introduce an innovative Open-Set Surgical Activity Recognition (OSSAR) framework.
Our solution leverages the hyperspherical reciprocal point strategy to enhance the distinction between known and unknown classes in the feature space.
To support our assertions, we establish an open-set surgical activity benchmark utilizing the public JIGSAWS dataset.
- Score: 13.843251369739908
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the realm of automated robotic surgery and computer-assisted
interventions, understanding robotic surgical activities is paramount.
Existing algorithms for surgical activity recognition predominantly cater to
pre-defined closed-set paradigms, ignoring the challenges of real-world
open-set scenarios, and often falter on test samples from classes unseen
during training. To tackle this problem, we introduce an innovative Open-Set
Surgical Activity Recognition (OSSAR) framework. Our solution leverages a
hyperspherical reciprocal point strategy to enhance the distinction between
known and unknown classes in the feature space. Additionally, we address
over-confidence on the known (closed-set) classes by refining model
calibration, preventing unknown classes from being misclassified as known
ones. To support our assertions, we establish an open-set surgical activity
benchmark utilizing the public JIGSAWS dataset. In addition, we collect a
novel dataset on endoscopic submucosal dissection for surgical activity
tasks. Extensive comparisons and ablation experiments on these datasets
demonstrate that our method significantly outperforms existing
state-of-the-art approaches and effectively addresses the challenges of
real-world surgical scenarios. Our code is publicly accessible at
https://github.com/longbai1006/OSSAR.
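A minimal code sketch may help make the two mechanisms named in the abstract
concrete: hyperspherical reciprocal points and calibrated rejection of unknown
classes. The snippet below is an illustrative sketch only; every identifier
(HypersphericalReciprocalHead, tau, reject_threshold) and all default values
are assumptions of this summary, not the authors' implementation, for which
the repository linked above is authoritative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HypersphericalReciprocalHead(nn.Module):
    """Sketch: one learnable reciprocal point per known class; a sample is
    assigned to the class whose reciprocal point it lies farthest from."""

    def __init__(self, feat_dim: int, num_known: int, tau: float = 0.1):
        super().__init__()
        self.reciprocal_points = nn.Parameter(torch.randn(num_known, feat_dim))
        self.tau = tau  # temperature: one simple knob for reducing over-confidence

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # Constrain features and reciprocal points to the unit hypersphere.
        z = F.normalize(feats, dim=-1)
        r = F.normalize(self.reciprocal_points, dim=-1)
        dist = 1.0 - z @ r.t()   # cosine distance, shape (batch, num_known)
        return dist / self.tau   # distance-as-logit: far from r_k => class k

@torch.no_grad()
def predict_open_set(head: HypersphericalReciprocalHead,
                     feats: torch.Tensor,
                     reject_threshold: float = 0.5):
    # Unknown activities tend to lie close to ALL reciprocal points, so their
    # maximum softmax confidence stays low and they can be rejected.
    probs = head(feats).softmax(dim=-1)
    conf, pred = probs.max(dim=-1)
    pred = torch.where(conf < reject_threshold,
                       torch.full_like(pred, -1),  # -1 marks "unknown"
                       pred)
    return pred, conf
```

Under these assumptions, training would apply standard cross-entropy to the
distance-based logits, pushing each known class's features away from its own
reciprocal point; the temperature tau stands in loosely for the calibration
refinement the abstract describes, and reject_threshold would be tuned on
held-out known classes.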
Related papers
- ZEAL: Surgical Skill Assessment with Zero-shot Tool Inference Using Unified Foundation Model [0.07143413923310668]
This study introduces ZEAL (surgical skill assessment with Zero-shot surgical tool segmentation using a unifiEd foundAtion modeL).
ZEAL predicts segmentation masks, capturing essential features of both instruments and surroundings.
It produces a surgical skill score, offering an objective measure of proficiency.
arXiv Detail & Related papers (2024-07-03T01:20:56Z) - SAR-RARP50: Segmentation of surgical instrumentation and Action
Recognition on Robot-Assisted Radical Prostatectomy Challenge [72.97934765570069]
We release the first multimodal, publicly available, in-vivo dataset for surgical action recognition and semantic instrumentation segmentation, containing 50 suturing video segments of Robot-Assisted Radical Prostatectomy (RARP).
The aim of the challenge is to enable researchers to leverage the scale of the provided dataset and develop robust and highly accurate single-task action recognition and tool segmentation approaches in the surgical domain.
A total of 12 teams participated in the challenge, contributing 7 action recognition methods, 9 instrument segmentation techniques, and 4 multitask approaches that integrated both action recognition and instrument segmentation.
arXiv Detail & Related papers (2023-12-31T13:32:18Z) - ST(OR)2: Spatio-Temporal Object Level Reasoning for Activity Recognition
in the Operating Room [6.132617753806978]
We propose a new sample-efficient and object-based approach for surgical activity recognition in the OR.
Our method focuses on the geometric arrangements between clinicians and surgical devices, thus utilizing the significant object interaction dynamics in the OR.
arXiv Detail & Related papers (2023-12-19T15:33:57Z) - GLSFormer : Gated - Long, Short Sequence Transformer for Step
Recognition in Surgical Videos [57.93194315839009]
We propose a vision transformer-based approach to learn temporal features directly from sequence-level patches.
We extensively evaluate our approach on two cataract surgery video datasets, Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods.
arXiv Detail & Related papers (2023-07-20T17:57:04Z) - Demonstration-Guided Reinforcement Learning with Efficient Exploration
for Task Automation of Surgical Robot [54.80144694888735]
We introduce Demonstration-guided EXploration (DEX), an efficient reinforcement learning algorithm.
Our method assigns higher value estimates to expert-like behaviors, steering exploration toward productive interactions.
Experiments on 10 surgical manipulation tasks from SurRoL, a comprehensive surgical simulation platform, demonstrate significant improvements.
arXiv Detail & Related papers (2023-02-20T05:38:54Z) - Dissecting Self-Supervised Learning Methods for Surgical Computer Vision [51.370873913181605]
Self-Supervised Learning (SSL) methods have begun to gain traction in the general computer vision community.
The effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and largely unexplored.
We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding, phase recognition and tool presence detection.
arXiv Detail & Related papers (2022-07-01T14:17:11Z) - CholecTriplet2021: A benchmark challenge for surgical action triplet
recognition [66.51610049869393]
This paper presents CholecTriplet2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos.
We present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge.
A total of 4 baseline methods and 19 new deep learning algorithms are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%.
arXiv Detail & Related papers (2022-04-10T18:51:55Z) - The SARAS Endoscopic Surgeon Action Detection (ESAD) dataset: Challenges
and methods [15.833413083110903]
This paper presents ESAD, the first large-scale dataset designed to tackle the problem of surgeon action detection in endoscopic minimally invasive surgery.
The dataset provides bounding box annotation for 21 action classes on real endoscopic video frames captured during prostatectomy, and was used as the basis of a recent MIDL 2020 challenge.
arXiv Detail & Related papers (2021-04-07T15:11:51Z) - Learning Invariant Representation of Tasks for Robust Surgical State
Estimation [39.515036686428836]
We propose StiseNet, a Surgical Task Invariance State Estimation Network.
StiseNet minimizes the effects of variations in surgical technique and operating environments inherent to RAS datasets.
It is shown to outperform state-of-the-art state estimation methods on three datasets.
arXiv Detail & Related papers (2021-02-18T02:32:50Z) - Robust Medical Instrument Segmentation Challenge 2019 [56.148440125599905]
Intraoperative tracking of laparoscopic instruments is often a prerequisite for computer- and robot-assisted interventions.
Our challenge was based on a surgical data set comprising 10,040 annotated images acquired from a total of 30 surgical procedures.
The results confirm the initial hypothesis, namely that algorithm performance degrades with an increasing domain gap.
arXiv Detail & Related papers (2020-03-23T14:35:08Z)