Action Recognition in Video Recordings from Gynecologic Laparoscopy
- URL: http://arxiv.org/abs/2311.18666v1
- Date: Thu, 30 Nov 2023 16:15:46 GMT
- Title: Action Recognition in Video Recordings from Gynecologic Laparoscopy
- Authors: Sahar Nasirihaghighi, Negin Ghamsarian, Daniela Stefanics, Klaus
Schoeffmann, Heinrich Husslein
- Abstract summary: Action recognition is a prerequisite for many applications in laparoscopic video analysis.
In this study, we design and evaluate a CNN-RNN architecture as well as a customized training-inference framework.
- Score: 4.002010889177872
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Action recognition is a prerequisite for many applications in laparoscopic
video analysis, including but not limited to surgical training, operating room
planning, follow-up surgery preparation, post-operative surgical assessment,
and surgical outcome estimation. However, automatic action recognition in
laparoscopic surgeries involves numerous challenges, such as (I) cross-action
and intra-action duration variation, (II) relevant content distortion due to
smoke, blood accumulation, fast camera motions, organ movements, and object
occlusion, and (III) surgical scene variations due to different illuminations
and viewpoints. Moreover, action annotations in laparoscopic surgeries are
limited and expensive because they require expert knowledge. In this study, we
design and evaluate a CNN-RNN architecture as well as a customized
training-inference framework to address these challenges in laparoscopic
surgery action recognition. Using stacked recurrent layers, our proposed
network exploits inter-frame dependencies to mitigate the negative effect of
content distortion and variation on action recognition.
Furthermore, our proposed frame sampling strategy effectively manages the
duration variations in surgical actions to enable action recognition with high
temporal resolution. Our extensive experiments confirm the superiority of our
proposed method in action recognition compared to static CNNs.
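As a rough illustration of the two ideas in the abstract, the sketch below pairs a per-frame CNN backbone with stacked recurrent layers and adds a uniform frame sampler that normalizes clip length. This is a minimal PyTorch sketch under assumed choices (ResNet-18 backbone, two-layer GRU, 16-frame uniform sampling); the paper's actual backbone, sampling strategy, and training-inference framework are not specified here and may differ.

```python
# Hedged sketch of a CNN-RNN action-recognition network in PyTorch.
# Backbone, hidden size, layer count, and the uniform sampler are
# illustrative assumptions, not the authors' exact configuration.
import torch
import torch.nn as nn
from torchvision import models


def sample_clip(num_frames: int, clip_len: int = 16) -> torch.Tensor:
    """Uniformly sample `clip_len` frame indices from a video of
    `num_frames` frames, so short and long actions yield inputs of the
    same temporal length (an assumed stand-in for the paper's sampler)."""
    return torch.linspace(0, num_frames - 1, clip_len).long()


class CNNRNN(nn.Module):
    def __init__(self, num_classes: int, hidden: int = 256, rnn_layers: int = 2):
        super().__init__()
        backbone = models.resnet18(weights=None)  # pretrained weights optional
        feat_dim = backbone.fc.in_features        # 512 for ResNet-18
        backbone.fc = nn.Identity()               # keep per-frame features
        self.cnn = backbone
        # Stacked recurrent layers aggregate inter-frame dependencies, which
        # helps average out per-frame distortions (smoke, blur, occlusion).
        self.rnn = nn.GRU(feat_dim, hidden, num_layers=rnn_layers,
                          batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, channels, height, width)
        b, t, c, h, w = clips.shape
        feats = self.cnn(clips.reshape(b * t, c, h, w)).reshape(b, t, -1)
        out, _ = self.rnn(feats)       # (b, t, hidden)
        return self.head(out[:, -1])   # classify from the last time step


if __name__ == "__main__":
    model = CNNRNN(num_classes=4)
    idx = sample_clip(num_frames=300)              # indices into a long clip
    dummy = torch.randn(2, len(idx), 3, 224, 224)  # stand-in for sampled frames
    print(model(dummy).shape)                      # torch.Size([2, 4])
```

The uniform sampler is the simplest possible stand-in: it maps actions of any duration onto a fixed-length input, so the recurrent head always sees a consistent temporal resolution regardless of how long the action lasts.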
Related papers
- Hypergraph-Transformer (HGT) for Interactive Event Prediction in
Laparoscopic and Robotic Surgery [50.3022015601057]
We propose a predictive neural network that is capable of understanding and predicting critical interactive aspects of surgical workflow from intra-abdominal video.
We verify our approach on established surgical datasets and applications, including the detection and prediction of action triplets.
Our results demonstrate the superiority of our approach compared to unstructured alternatives.
arXiv Detail & Related papers (2024-02-03T00:58:05Z) - Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase
Recognition, and Irregularity Detection [5.47960852753243]
We present the largest cataract surgery video dataset, addressing diverse requirements for computerized surgical workflow analysis.
We validate the quality of annotations by benchmarking the performance of several state-of-the-art neural network architectures.
The dataset and annotations will be publicly available upon acceptance of the paper.
arXiv Detail & Related papers (2023-12-11T10:53:05Z) - Event Recognition in Laparoscopic Gynecology Videos with Hybrid
Transformers [4.371909393924804]
We introduce a dataset tailored for relevant event recognition in laparoscopic videos.
Our dataset includes annotations for critical events associated with major intra-operative challenges and post-operative complications.
We evaluate a hybrid transformer architecture coupled with a customized training-inference framework to recognize four specific events in laparoscopic surgery videos.
arXiv Detail & Related papers (2023-12-01T13:57:29Z) - GLSFormer : Gated - Long, Short Sequence Transformer for Step
Recognition in Surgical Videos [57.93194315839009]
We propose a vision transformer-based approach to learn temporal features directly from sequence-level patches.
We extensively evaluate our approach on two cataract surgery video datasets, Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods.
arXiv Detail & Related papers (2023-07-20T17:57:04Z) - Learning How To Robustly Estimate Camera Pose in Endoscopic Videos [5.073761189475753]
We propose a solution for stereo endoscopes that estimates depth and optical flow to minimize two geometric losses for camera pose estimation.
Most importantly, we introduce two learned adaptive per-pixel weight mappings that balance contributions according to the input image content.
We validate our approach on the publicly available SCARED dataset and introduce a new in-vivo dataset, StereoMIS.
arXiv Detail & Related papers (2023-04-17T07:05:01Z) - Live image-based neurosurgical guidance and roadmap generation using
unsupervised embedding [53.992124594124896]
We present a method for live image-only guidance leveraging a large data set of annotated neurosurgical videos.
A generated roadmap encodes the common anatomical paths taken in surgeries in the training set.
We trained and evaluated the proposed method with a data set of 166 transsphenoidal adenomectomy procedures.
arXiv Detail & Related papers (2023-03-31T12:52:24Z) - Quantification of Robotic Surgeries with Vision-Based Deep Learning [45.165919577877695]
We propose a unified deep learning framework, entitled Roboformer, which operates exclusively on videos recorded during surgery.
We validated our framework on four video-based datasets of two commonly-encountered types of steps within minimally-invasive robotic surgeries.
arXiv Detail & Related papers (2022-05-06T06:08:35Z) - CholecTriplet2021: A benchmark challenge for surgical action triplet
recognition [66.51610049869393]
This paper presents CholecTriplet2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos.
We present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge.
A total of 4 baseline methods and 19 new deep learning algorithms are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%.
arXiv Detail & Related papers (2022-04-10T18:51:55Z) - LensID: A CNN-RNN-Based Framework Towards Lens Irregularity Detection in
Cataract Surgery Videos [6.743968799949719]
A critical complication after cataract surgery is the dislocation of the lens implant leading to vision deterioration and eye trauma.
We propose an end-to-end recurrent neural network to recognize the lens-implantation phase and a novel semantic segmentation network to segment the lens and pupil after the implantation phase.
arXiv Detail & Related papers (2021-07-02T07:27:29Z) - Relevance Detection in Cataract Surgery Videos by Spatio-Temporal Action
Localization [7.235239641693831]
In cataract surgery, the operation is performed with the help of a microscope. Since the microscope allows only up to two people to watch the surgery in real time, a major part of surgical training is conducted using the recorded videos.
To optimize the training procedure with the video content, the surgeons require an automatic relevance detection approach.
In this paper, a three-module framework is proposed to detect and classify the relevant phase segments in cataract videos.
arXiv Detail & Related papers (2021-04-29T12:01:08Z) - One-shot action recognition towards novel assistive therapies [63.23654147345168]
This work is motivated by the automated analysis of medical therapies that involve action imitation games.
The presented approach incorporates a pre-processing step that standardizes heterogeneous motion data conditions.
We evaluate the approach on a real use-case of automated video analysis for therapy support with autistic people.
arXiv Detail & Related papers (2021-02-17T19:41:37Z)