Event Recognition in Laparoscopic Gynecology Videos with Hybrid Transformers
- URL: http://arxiv.org/abs/2312.00593v1
- Date: Fri, 1 Dec 2023 13:57:29 GMT
- Title: Event Recognition in Laparoscopic Gynecology Videos with Hybrid Transformers
- Authors: Sahar Nasirihaghighi, Negin Ghamsarian, Heinrich Husslein, Klaus Schoeffmann
- Abstract summary: We introduce a dataset tailored for relevant event recognition in laparoscopic videos.
Our dataset includes annotations for critical events associated with major intra-operative challenges and post-operative complications.
We evaluate a hybrid transformer architecture coupled with a customized training-inference framework to recognize four specific events in laparoscopic surgery videos.
- Score: 4.371909393924804
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Analyzing laparoscopic surgery videos presents a complex and multifaceted
challenge, with applications including surgical training, intra-operative
surgical complication prediction, and post-operative surgical assessment.
Identifying crucial events within these videos is a significant prerequisite in
a majority of these applications. In this paper, we introduce a comprehensive
dataset tailored for relevant event recognition in laparoscopic gynecology
videos. Our dataset includes annotations for critical events associated with
major intra-operative challenges and post-operative complications. To validate
the precision of our annotations, we assess event recognition performance using
several CNN-RNN architectures. Furthermore, we introduce and evaluate a hybrid
transformer architecture coupled with a customized training-inference framework
to recognize four specific events in laparoscopic surgery videos. Leveraging
the Transformer networks, our proposed architecture harnesses inter-frame
dependencies to counteract the adverse effects of relevant content occlusion,
motion blur, and surgical scene variation, thus significantly enhancing event
recognition accuracy. Moreover, we present a frame sampling strategy designed
to manage variations in surgical scenes and the surgeons' skill level,
resulting in event recognition with high temporal resolution. We empirically
demonstrate the superiority of our proposed methodology in event recognition
compared to conventional CNN-RNN architectures through a series of extensive
experiments.
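The abstract describes inference over overlapping frame windows so that clip-level predictions can be aggregated into frame-level event labels with high temporal resolution. A minimal sketch of that idea is below; the clip length, stride, and averaging rule are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of clip-based sliding-window event recognition.
# clip_len, stride, and score averaging are assumptions for illustration;
# the paper's actual sampling strategy and network are not reproduced here.

def sliding_clips(num_frames, clip_len=8, stride=2):
    """Yield (start, end) frame-index pairs of overlapping clips."""
    for start in range(0, num_frames - clip_len + 1, stride):
        yield start, start + clip_len

def framewise_scores(num_frames, clip_scores, clip_len=8, stride=2):
    """Average overlapping clip-level event scores into per-frame scores,
    recovering event predictions at frame-level temporal resolution."""
    totals = [0.0] * num_frames
    counts = [0] * num_frames
    clips = sliding_clips(num_frames, clip_len, stride)
    for (start, end), score in zip(clips, clip_scores):
        for f in range(start, end):
            totals[f] += score
            counts[f] += 1
    # Frames never covered by a clip keep a score of 0.0.
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]

# Example: a 12-frame video with clips of 8 frames and stride 2
# produces clips starting at frames 0, 2, and 4.
clips = list(sliding_clips(12))
scores = framewise_scores(12, [0.1, 0.9, 0.8])
```

In a real pipeline, each clip score would come from a clip-level classifier (e.g. a CNN backbone feeding a temporal model); overlapping windows smooth out transient occlusion or motion blur in individual frames.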
Related papers
- OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining [55.15365161143354]
OphCLIP is a hierarchical retrieval-augmented vision-language pretraining framework for ophthalmic surgical workflow understanding.
OphCLIP learns both fine-grained and long-term visual representations by aligning short video clips with detailed narrative descriptions and full videos with structured titles.
Our OphCLIP also designs a retrieval-augmented pretraining framework to leverage the underexplored large-scale silent surgical procedure videos.
arXiv Detail & Related papers (2024-11-23T02:53:08Z)
- Procedure-Aware Surgical Video-Language Pretraining with Hierarchical Knowledge Augmentation [51.222684687924215]
Surgical video-language pretraining faces unique challenges due to the knowledge domain gap and the scarcity of multi-modal data.
We propose a hierarchical knowledge augmentation approach and a novel Procedure-Encoded Surgical Knowledge-Augmented Video-Language Pretraining framework to tackle these issues.
arXiv Detail & Related papers (2024-09-30T22:21:05Z)
- Hypergraph-Transformer (HGT) for Interactive Event Prediction in Laparoscopic and Robotic Surgery [50.3022015601057]
We propose a predictive neural network that is capable of understanding and predicting critical interactive aspects of surgical workflow from intra-abdominal video.
We verify our approach on established surgical datasets and applications, including the detection and prediction of action triplets.
Our results demonstrate the superiority of our approach compared to unstructured alternatives.
arXiv Detail & Related papers (2024-02-03T00:58:05Z)
- Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase Recognition, and Irregularity Detection [5.47960852753243]
We present the largest cataract surgery video dataset that addresses diverse requisites for constructing computerized surgical workflow analysis.
We validate the quality of annotations by benchmarking the performance of several state-of-the-art neural network architectures.
The dataset and annotations will be publicly available upon acceptance of the paper.
arXiv Detail & Related papers (2023-12-11T10:53:05Z)
- Action Recognition in Video Recordings from Gynecologic Laparoscopy [4.002010889177872]
Action recognition is a prerequisite for many applications in laparoscopic video analysis.
In this study, we design and evaluate a CNN-RNN architecture as well as a customized training-inference framework.
arXiv Detail & Related papers (2023-11-30T16:15:46Z)
- Phase-Specific Augmented Reality Guidance for Microscopic Cataract Surgery Using Long-Short Spatiotemporal Aggregation Transformer [14.568834378003707]
Phacoemulsification cataract surgery (PCS) is a routine procedure performed using a surgical microscope.
PCS guidance systems extract valuable information from surgical microscopic videos to enhance proficiency.
Existing PCS guidance systems suffer from non-phase-specific guidance, leading to redundant visual information.
We propose a novel phase-specific augmented reality (AR) guidance system, which offers tailored AR information corresponding to the recognized surgical phase.
arXiv Detail & Related papers (2023-09-11T02:56:56Z)
- K-Space-Aware Cross-Modality Score for Synthesized Neuroimage Quality Assessment [71.27193056354741]
The problem of how to assess cross-modality medical image synthesis has been largely unexplored.
We propose a new metric K-CROSS to spur progress on this challenging problem.
K-CROSS uses a pre-trained multi-modality segmentation network to predict the lesion location.
arXiv Detail & Related papers (2023-07-10T01:26:48Z)
- CholecTriplet2021: A benchmark challenge for surgical action triplet recognition [66.51610049869393]
This paper presents CholecTriplet 2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos.
We present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge.
A total of 4 baseline methods and 19 new deep learning algorithms are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%.
arXiv Detail & Related papers (2022-04-10T18:51:55Z)
- Relevance Detection in Cataract Surgery Videos by Spatio-Temporal Action Localization [7.235239641693831]
In cataract surgery, the operation is performed with the help of a microscope. Since the microscope allows only up to two people to watch the surgery in real time, a major part of surgical training is conducted using recorded videos.
To optimize the training procedure with the video content, the surgeons require an automatic relevance detection approach.
In this paper, a three-module framework is proposed to detect and classify the relevant phase segments in cataract videos.
arXiv Detail & Related papers (2021-04-29T12:01:08Z)
- Robust Medical Instrument Segmentation Challenge 2019 [56.148440125599905]
Intraoperative tracking of laparoscopic instruments is often a prerequisite for computer and robotic-assisted interventions.
Our challenge was based on a surgical data set comprising 10,040 annotated images acquired from a total of 30 surgical procedures.
The results confirm the initial hypothesis, namely that algorithm performance degrades with an increasing domain gap.
arXiv Detail & Related papers (2020-03-23T14:35:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.