Relevance Detection in Cataract Surgery Videos by Spatio-Temporal Action
Localization
- URL: http://arxiv.org/abs/2104.14280v1
- Date: Thu, 29 Apr 2021 12:01:08 GMT
- Title: Relevance Detection in Cataract Surgery Videos by Spatio-Temporal Action
Localization
- Authors: Negin Ghamsarian, Mario Taschwer, Doris Putzgruber-Adamitsch,
Stephanie Sarny, Klaus Schoeffmann
- Abstract summary: In cataract surgery, the operation is performed with the help of a microscope. Since the microscope allows at most two people to watch the surgery in real time, a major part of surgical training is conducted using recorded videos.
To optimize training with this video content, surgeons require an automatic relevance detection approach.
In this paper, a three-module framework is proposed to detect and classify the relevant phase segments in cataract videos.
- Score: 7.235239641693831
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In cataract surgery, the operation is performed with the help of a
microscope. Since the microscope allows at most two people to watch the
surgery in real time, a major part of surgical training is conducted using
recorded videos. To optimize training with this video content,
surgeons require an automatic relevance detection approach. In addition to
relevance-based retrieval, these results can be further used for skill
assessment and irregularity detection in cataract surgery videos. In this
paper, a three-module framework is proposed to detect and classify the relevant
phase segments in cataract videos. Taking advantage of an idle frame
recognition network, the video is divided into idle and action segments. To
boost relevance-detection performance, the cornea, where the relevant surgical
actions take place, is detected in every frame using Mask R-CNN. The
spatiotemporally localized segments, which contain higher-resolution information
about pupil texture and actions together with complementary temporal information
from the same phase, are fed into the relevance detection module. This module
consists of four parallel recurrent CNNs, each responsible for detecting one of
four relevant phases defined together with medical experts. The results are then
integrated to classify each action segment as irrelevant or as one of the four
relevant phases. Experimental results reveal that the proposed approach
outperforms static CNNs and different configurations of feature-based and
end-to-end recurrent networks.
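To make the pipeline concrete, here is a minimal PyTorch sketch of how the three modules could be wired together. All module internals, the 0.5 threshold, and the placeholder phase names are illustrative assumptions rather than the authors' implementation; the Mask R-CNN cornea cropping of module 2 is assumed to happen upstream.
```python
# Minimal sketch of the described three-module pipeline. Module internals,
# thresholds, and phase names are illustrative assumptions, not the paper's
# exact architectures.
import torch
import torch.nn as nn

PHASES = ["phase_1", "phase_2", "phase_3", "phase_4"]  # placeholders for the four expert-defined phases

class IdleFrameNet(nn.Module):
    """Module 1: per-frame idle vs. action classification."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 2),
        )
    def forward(self, frames):            # (T, 3, H, W)
        return self.backbone(frames)      # (T, 2) idle/action logits

class RecurrentPhaseNet(nn.Module):
    """One of four parallel recurrent CNNs (module 3): binary relevance for a single phase."""
    def __init__(self, feat_dim=32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.rnn = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.head = nn.Linear(feat_dim, 1)
    def forward(self, clip):                       # (T, 3, H, W)
        feats = self.cnn(clip).unsqueeze(0)        # (1, T, feat_dim)
        _, h = self.rnn(feats)
        return self.head(h[-1]).squeeze()          # scalar relevance logit

def classify_action_segment(clip, phase_nets, threshold=0.5):
    """Module 3 integration: run the four phase detectors on a cornea-cropped
    action segment and pick the winning phase (or 'irrelevant')."""
    scores = torch.sigmoid(torch.stack([net(clip) for net in phase_nets]))
    best = int(scores.argmax())
    return PHASES[best] if scores[best] > threshold else "irrelevant"

if __name__ == "__main__":
    video = torch.randn(8, 3, 64, 64)                        # toy clip of 8 frames
    is_action = IdleFrameNet()(video).argmax(dim=1).bool()   # module 1: idle/action split
    clip = video[is_action] if is_action.any() else video    # keep the action frames
    # Module 2 (Mask R-CNN cornea detection and cropping) is assumed upstream.
    nets = [RecurrentPhaseNet() for _ in PHASES]
    print(classify_action_segment(clip, nets))
```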
Related papers
- Thoracic Surgery Video Analysis for Surgical Phase Recognition [0.08706730566331035]
We analyse and evaluate both frame-based and video clipping-based phase recognition on a thoracic surgery dataset consisting of 11 phase classes.
We show that Masked Video Distillation (MVD) exhibits superior performance, achieving a top-1 accuracy of 72.9%, compared to 52.31% achieved by ImageNet ViT.
arXiv Detail & Related papers (2024-06-13T14:47:57Z)
- Real-time guidewire tracking and segmentation in intraoperative x-ray [52.51797358201872]
We propose a two-stage deep learning framework for real-time guidewire segmentation and tracking.
In the first stage, a Yolov5 detector is trained, using the original X-ray images as well as synthetic ones, to output the bounding boxes of possible target guidewires.
In the second stage, a novel and efficient network is proposed to segment the guidewire in each detected bounding box.
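The detect-then-segment pattern described here can be sketched generically; `detector` and `segmenter` below are stand-in callables, not the paper's YOLOv5 model or its custom segmentation network.
```python
# Generic two-stage detect-then-segment sketch. The detector and segmenter
# are stand-ins, not the paper's actual networks.
import torch

def detect_then_segment(image, detector, segmenter, score_thresh=0.5):
    """Stage 1: predict candidate boxes; stage 2: segment inside each box."""
    boxes, scores = detector(image)              # (N, 4) xyxy boxes, (N,) confidences
    masks = []
    for box, score in zip(boxes, scores):
        if score < score_thresh:
            continue
        x1, y1, x2, y2 = [int(v) for v in box]
        crop = image[:, y1:y2, x1:x2]            # (C, h, w) region of interest
        local_mask = segmenter(crop)             # (h, w) in-crop probability map
        full = torch.zeros(image.shape[1:], dtype=torch.bool)
        full[y1:y2, x1:x2] = local_mask > 0.5    # paste mask back at full resolution
        masks.append(full)
    return masks

if __name__ == "__main__":
    img = torch.rand(3, 128, 128)
    toy_detector = lambda im: (torch.tensor([[10., 10., 60., 60.]]), torch.tensor([0.9]))
    toy_segmenter = lambda crop: torch.rand(crop.shape[1], crop.shape[2])
    print(len(detect_then_segment(img, toy_detector, toy_segmenter)))  # 1
```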
arXiv Detail & Related papers (2024-04-12T20:39:19Z)
- Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase Recognition, and Irregularity Detection [5.47960852753243]
We present the largest cataract surgery video dataset, addressing diverse requirements of computerized surgical workflow analysis.
We validate the quality of annotations by benchmarking the performance of several state-of-the-art neural network architectures.
The dataset and annotations will be publicly available upon acceptance of the paper.
arXiv Detail & Related papers (2023-12-11T10:53:05Z)
- Event Recognition in Laparoscopic Gynecology Videos with Hybrid Transformers [4.371909393924804]
We introduce a dataset tailored for relevant event recognition in laparoscopic videos.
Our dataset includes annotations for critical events associated with major intra-operative challenges and post-operative complications.
We evaluate a hybrid transformer architecture coupled with a customized training-inference framework to recognize four specific events in laparoscopic surgery videos.
arXiv Detail & Related papers (2023-12-01T13:57:29Z)
- Action Recognition in Video Recordings from Gynecologic Laparoscopy [4.002010889177872]
Action recognition is a prerequisite for many applications in laparoscopic video analysis.
In this study, we design and evaluate a CNN-RNN architecture as well as a customized training-inference framework.
arXiv Detail & Related papers (2023-11-30T16:15:46Z)
- Phase-Specific Augmented Reality Guidance for Microscopic Cataract Surgery Using Long-Short Spatiotemporal Aggregation Transformer [14.568834378003707]
Phacoemulsification cataract surgery (PCS) is a routine procedure performed using a surgical microscope.
PCS guidance systems extract valuable information from surgical microscopic videos to enhance proficiency.
Existing PCS guidance systems suffer from non-phase-specific guidance, leading to redundant visual information.
We propose a novel phase-specific augmented reality (AR) guidance system, which offers tailored AR information corresponding to the recognized surgical phase.
arXiv Detail & Related papers (2023-09-11T02:56:56Z)
- GLSFormer: Gated-Long, Short Sequence Transformer for Step Recognition in Surgical Videos [57.93194315839009]
We propose a vision transformer-based approach to learn temporal features directly from sequence-level patches.
We extensively evaluate our approach on two cataract surgery video datasets, Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods.
arXiv Detail & Related papers (2023-07-20T17:57:04Z)
- LoViT: Long Video Transformer for Surgical Phase Recognition [59.06812739441785]
We present a two-stage method, called Long Video Transformer (LoViT) for fusing short- and long-term temporal information.
Our approach consistently outperforms state-of-the-art methods on the Cholec80 and AutoLaparo datasets.
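As a rough illustration of fusing short- and long-term temporal information, here is a two-branch sketch; the layer sizes, mean-pooling fusion, and seven-class head (Cholec80 defines seven phases) are assumptions, not LoViT's actual architecture.
```python
# Illustrative two-branch short-/long-term fusion. Layer sizes and the
# fusion rule are assumptions, not LoViT's actual design.
import torch
import torch.nn as nn

class ShortLongFusion(nn.Module):
    def __init__(self, feat_dim=64, n_heads=4, n_phases=7):
        super().__init__()
        # Short branch: attends within a local window of recent frame features.
        self.short = nn.TransformerEncoderLayer(feat_dim, n_heads, batch_first=True)
        # Long branch: attends over the full (downsampled) video history.
        self.long = nn.TransformerEncoderLayer(feat_dim, n_heads, batch_first=True)
        self.classifier = nn.Linear(2 * feat_dim, n_phases)
    def forward(self, short_feats, long_feats):    # (B, Ts, D), (B, Tl, D)
        s = self.short(short_feats).mean(dim=1)    # pool the local window
        l = self.long(long_feats).mean(dim=1)      # pool the global history
        return self.classifier(torch.cat([s, l], dim=-1))

if __name__ == "__main__":
    model = ShortLongFusion()
    logits = model(torch.randn(2, 16, 64), torch.randn(2, 256, 64))
    print(logits.shape)  # torch.Size([2, 7])
```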
arXiv Detail & Related papers (2023-05-15T20:06:14Z)
- CholecTriplet2021: A benchmark challenge for surgical action triplet recognition [66.51610049869393]
This paper presents CholecTriplet2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos.
We present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge.
A total of 4 baseline methods and 19 new deep learning algorithms are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%.
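For reference, a mAP figure like these is conventionally computed as the mean of per-class average precision; the sketch below uses scikit-learn with made-up toy labels, and the challenge's exact evaluation protocol may differ.
```python
# Conventional mAP computation: per-class average precision, then the mean.
# The labels and scores below are toy values for illustration only.
import numpy as np
from sklearn.metrics import average_precision_score

y_true = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])   # (samples, classes) multi-label ground truth
y_score = np.array([[0.8, 0.2], [0.3, 0.9], [0.6, 0.7], [0.1, 0.4]])
ap_per_class = [average_precision_score(y_true[:, c], y_score[:, c])
                for c in range(y_true.shape[1])]
print("mAP = %.3f" % np.mean(ap_per_class))
```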
arXiv Detail & Related papers (2022-04-10T18:51:55Z)
- LRTD: Long-Range Temporal Dependency based Active Learning for Surgical Workflow Recognition [67.86810761677403]
We propose a novel active learning method for cost-effective surgical video analysis.
Specifically, we propose a non-local recurrent convolutional network (NL-RCNet), which introduces a non-local block to capture long-range temporal dependencies.
We validate our approach on a large surgical video dataset (Cholec80) by performing surgical workflow recognition task.
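The non-local block referenced here follows Wang et al.'s self-attention formulation; below is a minimal 1-D (temporal) version, with its exact placement inside NL-RCNet left as an assumption.
```python
# Minimal 1-D (temporal) non-local block in the style of Wang et al.;
# its exact placement inside NL-RCNet is not reproduced here.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalBlock1D(nn.Module):
    def __init__(self, channels, inner=None):
        super().__init__()
        inner = inner or channels // 2
        self.theta = nn.Conv1d(channels, inner, 1)   # query embedding
        self.phi = nn.Conv1d(channels, inner, 1)     # key embedding
        self.g = nn.Conv1d(channels, inner, 1)       # value embedding
        self.out = nn.Conv1d(inner, channels, 1)     # project back to input width
    def forward(self, x):                            # (B, C, T)
        q = self.theta(x).transpose(1, 2)            # (B, T, inner)
        k = self.phi(x)                              # (B, inner, T)
        v = self.g(x).transpose(1, 2)                # (B, T, inner)
        attn = F.softmax(q @ k, dim=-1)              # (B, T, T): every frame attends to every frame
        y = (attn @ v).transpose(1, 2)               # (B, inner, T)
        return x + self.out(y)                       # residual connection

if __name__ == "__main__":
    block = NonLocalBlock1D(32)
    print(block(torch.randn(2, 32, 100)).shape)      # torch.Size([2, 32, 100])
```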
arXiv Detail & Related papers (2020-04-21T09:21:22Z)
- Robust Medical Instrument Segmentation Challenge 2019 [56.148440125599905]
Intraoperative tracking of laparoscopic instruments is often a prerequisite for computer and robotic-assisted interventions.
Our challenge was based on a surgical data set comprising 10,040 annotated images acquired from a total of 30 surgical procedures.
The results confirm the initial hypothesis, namely that algorithm performance degrades with an increasing domain gap.
arXiv Detail & Related papers (2020-03-23T14:35:08Z)