Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase
Recognition, and Irregularity Detection
- URL: http://arxiv.org/abs/2312.06295v1
- Date: Mon, 11 Dec 2023 10:53:05 GMT
- Title: Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase
Recognition, and Irregularity Detection
- Authors: Negin Ghamsarian, Yosuf El-Shabrawi, Sahar Nasirihaghighi, Doris
Putzgruber-Adamitsch, Martin Zinkernagel, Sebastian Wolf, Klaus Schoeffmann,
Raphael Sznitman
- Abstract summary: We present the largest cataract surgery video dataset that addresses diverse requisites for constructing computerized surgical workflow analysis.
We validate the quality of annotations by benchmarking the performance of several state-of-the-art neural network architectures.
The dataset and annotations will be publicly available upon acceptance of the paper.
- Score: 5.47960852753243
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In recent years, the landscape of computer-assisted interventions and
post-operative surgical video analysis has been dramatically reshaped by
deep-learning techniques, resulting in significant advancements in surgeons'
skills, operation room management, and overall surgical outcomes. However, the
progression of deep-learning-powered surgical technologies is profoundly
reliant on large-scale datasets and annotations. Particularly, surgical scene
understanding and phase recognition stand as pivotal pillars within the realm
of computer-assisted surgery and post-operative assessment of cataract surgery
videos. In this context, we present the largest cataract surgery video dataset
that addresses diverse requisites for constructing computerized surgical
workflow analysis and detecting post-operative irregularities in cataract
surgery. We validate the quality of annotations by benchmarking the performance
of several state-of-the-art neural network architectures for phase recognition
and surgical scene segmentation. Besides, we initiate the research on domain
adaptation for instrument segmentation in cataract surgery by evaluating
cross-domain instrument segmentation performance in cataract surgery videos.
The dataset and annotations will be publicly available upon acceptance of the
paper.
Related papers
- OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining [55.15365161143354]
OphCLIP is a hierarchical retrieval-augmented vision-language pretraining framework for ophthalmic surgical workflow understanding.
OphCLIP learns both fine-grained and long-term visual representations by aligning short video clips with detailed narrative descriptions and full videos with structured titles.
Our OphCLIP also designs a retrieval-augmented pretraining framework to leverage the underexplored large-scale silent surgical procedure videos.
arXiv Detail & Related papers (2024-11-23T02:53:08Z) - SURGIVID: Annotation-Efficient Surgical Video Object Discovery [42.16556256395392]
We propose an annotation-efficient framework for the semantic segmentation of surgical scenes.
We employ image-based self-supervised object discovery to identify the most salient tools and anatomical structures in surgical videos.
Our unsupervised setup reinforced with only 36 annotation labels indicates comparable localization performance with fully-supervised segmentation models.
arXiv Detail & Related papers (2024-09-12T07:12:20Z) - Thoracic Surgery Video Analysis for Surgical Phase Recognition [0.08706730566331035]
We analyse and evaluate both frame-based and video clipping-based phase recognition on thoracic surgery dataset consisting of 11 classes of phases.
We show that Masked Video Distillation(MVD) exhibits superior performance, achieving a top-1 accuracy of 72.9%, compared to 52.31% achieved by ImageNet ViT.
arXiv Detail & Related papers (2024-06-13T14:47:57Z) - Hypergraph-Transformer (HGT) for Interactive Event Prediction in
Laparoscopic and Robotic Surgery [50.3022015601057]
We propose a predictive neural network that is capable of understanding and predicting critical interactive aspects of surgical workflow from intra-abdominal video.
We verify our approach on established surgical datasets and applications, including the detection and prediction of action triplets.
Our results demonstrate the superiority of our approach compared to unstructured alternatives.
arXiv Detail & Related papers (2024-02-03T00:58:05Z) - Surgical Temporal Action-aware Network with Sequence Regularization for
Phase Recognition [28.52533700429284]
We propose a Surgical Temporal Action-aware Network with sequence Regularization, named STAR-Net, to recognize surgical phases more accurately from input videos.
MS-STA module integrates visual features with spatial and temporal knowledge of surgical actions at the cost of 2D networks.
Our STAR-Net with MS-STA and DSR can exploit visual features of surgical actions with effective regularization, thereby leading to the superior performance of surgical phase recognition.
arXiv Detail & Related papers (2023-11-21T13:43:16Z) - Surgical Phase Recognition in Laparoscopic Cholecystectomy [57.929132269036245]
We propose a Transformer-based method that utilizes calibrated confidence scores for a 2-stage inference pipeline.
Our method outperforms the baseline model on the Cholec80 dataset, and can be applied to a variety of action segmentation methods.
arXiv Detail & Related papers (2022-06-14T22:55:31Z) - Quantification of Robotic Surgeries with Vision-Based Deep Learning [45.165919577877695]
We propose a unified deep learning framework, entitled Roboformer, which operates exclusively on videos recorded during surgery.
We validated our framework on four video-based datasets of two commonly-encountered types of steps within minimally-invasive robotic surgeries.
arXiv Detail & Related papers (2022-05-06T06:08:35Z) - CholecTriplet2021: A benchmark challenge for surgical action triplet
recognition [66.51610049869393]
This paper presents CholecTriplet 2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos.
We present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge.
A total of 4 baseline methods and 19 new deep learning algorithms are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%.
arXiv Detail & Related papers (2022-04-10T18:51:55Z) - Relevance Detection in Cataract Surgery Videos by Spatio-Temporal Action
Localization [7.235239641693831]
In cataract surgery, the operation is performed with the help of a microscope. Since the microscope enables watching real-time surgery by up to two people only, a major part of surgical training is conducted using the recorded videos.
To optimize the training procedure with the video content, the surgeons require an automatic relevance detection approach.
In this paper, a three- module framework is proposed to detect and classify the relevant phase segments in cataract videos.
arXiv Detail & Related papers (2021-04-29T12:01:08Z) - OperA: Attention-Regularized Transformers for Surgical Phase Recognition [46.72897518687539]
We introduce OperA, a transformer-based model that accurately predicts surgical phases from long video sequences.
OperA is thoroughly evaluated on two datasets of laparoscopic cholecystectomy videos, outperforming various state-of-the-art temporal refinement approaches.
arXiv Detail & Related papers (2021-03-05T18:59:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.