Action Recognition in Video Recordings from Gynecologic Laparoscopy
- URL: http://arxiv.org/abs/2311.18666v1
- Date: Thu, 30 Nov 2023 16:15:46 GMT
- Title: Action Recognition in Video Recordings from Gynecologic Laparoscopy
- Authors: Sahar Nasirihaghighi, Negin Ghamsarian, Daniela Stefanics, Klaus
Schoeffmann, Heinrich Husslein
- Abstract summary: Action recognition is a prerequisite for many applications in laparoscopic video analysis.
In this study, we design and evaluate a CNN-RNN architecture as well as a customized training-inference framework.
- Score: 4.002010889177872
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Action recognition is a prerequisite for many applications in laparoscopic
video analysis, including but not limited to surgical training, operating room
planning, follow-up surgery preparation, post-operative surgical assessment,
and surgical outcome estimation. However, automatic action recognition in
laparoscopic surgeries involves numerous challenges, such as (I) cross-action
and intra-action duration variation, (II) relevant content distortion due to
smoke, blood accumulation, fast camera motions, organ movements, and object
occlusion, and (III) surgical scene variations due to different illuminations
and viewpoints. Moreover, action annotations in laparoscopic surgeries are
limited and expensive because they require expert knowledge. In this study, we
design and evaluate a CNN-RNN architecture as well as a customized
training-inference framework to address these challenges in laparoscopic
surgery action recognition. Using stacked recurrent layers, our proposed
network exploits inter-frame dependencies to mitigate the negative effect of
content distortion and variation on action recognition.
Furthermore, our proposed frame sampling strategy effectively manages the
duration variations in surgical actions to enable action recognition with high
temporal resolution. Our extensive experiments confirm the superiority of our
proposed method in action recognition compared to static CNNs.
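As a rough illustration of the two ideas in the abstract, the sketch below pairs a per-frame CNN backbone with stacked recurrent layers and adds a uniform frame sampler that normalizes clip length. This is a minimal PyTorch sketch under assumed choices (ResNet-18 backbone, two-layer GRU, 16-frame uniform sampling); the paper's actual backbone, sampling strategy, and training-inference framework are not specified here and may differ.

```python
# Hedged sketch of a CNN-RNN action-recognition network in PyTorch.
# Backbone, hidden size, layer count, and the uniform sampler are
# illustrative assumptions, not the authors' exact configuration.
import torch
import torch.nn as nn
from torchvision import models


def sample_clip(num_frames: int, clip_len: int = 16) -> torch.Tensor:
    """Uniformly sample `clip_len` frame indices from a video of
    `num_frames` frames, so short and long actions yield inputs of the
    same temporal length (an assumed stand-in for the paper's sampler)."""
    return torch.linspace(0, num_frames - 1, clip_len).long()


class CNNRNN(nn.Module):
    def __init__(self, num_classes: int, hidden: int = 256, rnn_layers: int = 2):
        super().__init__()
        backbone = models.resnet18(weights=None)  # pretrained weights optional
        feat_dim = backbone.fc.in_features        # 512 for ResNet-18
        backbone.fc = nn.Identity()               # keep per-frame features
        self.cnn = backbone
        # Stacked recurrent layers aggregate inter-frame dependencies, which
        # helps average out per-frame distortions (smoke, blur, occlusion).
        self.rnn = nn.GRU(feat_dim, hidden, num_layers=rnn_layers,
                          batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, channels, height, width)
        b, t, c, h, w = clips.shape
        feats = self.cnn(clips.reshape(b * t, c, h, w)).reshape(b, t, -1)
        out, _ = self.rnn(feats)       # (b, t, hidden)
        return self.head(out[:, -1])   # classify from the last time step


if __name__ == "__main__":
    model = CNNRNN(num_classes=4)
    idx = sample_clip(num_frames=300)              # indices into a long clip
    dummy = torch.randn(2, len(idx), 3, 224, 224)  # stand-in for sampled frames
    print(model(dummy).shape)                      # torch.Size([2, 4])
```

The uniform sampler is the simplest possible stand-in: it maps actions of any duration onto a fixed-length input, so the recurrent head always sees a consistent temporal resolution regardless of how long the action lasts.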
Related papers
- Hypergraph-Transformer (HGT) for Interactive Event Prediction in
Laparoscopic and Robotic Surgery [50.3022015601057]
We propose a predictive neural network that is capable of understanding and predicting critical interactive aspects of surgical workflow from intra-abdominal video.
We verify our approach on established surgical datasets and applications, including the detection and prediction of action triplets.
Our results demonstrate the superiority of our approach compared to unstructured alternatives.
arXiv Detail & Related papers (2024-02-03T00:58:05Z) - Cataract-1K: Cataract Surgery Dataset for Scene Segmentation, Phase
Recognition, and Irregularity Detection [5.47960852753243]
We present the largest cataract surgery video dataset, addressing diverse requirements for computerized surgical workflow analysis.
We validate the quality of annotations by benchmarking the performance of several state-of-the-art neural network architectures.
The dataset and annotations will be publicly available upon acceptance of the paper.
arXiv Detail & Related papers (2023-12-11T10:53:05Z) - Event Recognition in Laparoscopic Gynecology Videos with Hybrid
Transformers [4.371909393924804]
We introduce a dataset tailored for relevant event recognition in laparoscopic videos.
Our dataset includes annotations for critical events associated with major intra-operative challenges and post-operative complications.
We evaluate a hybrid transformer architecture coupled with a customized training-inference framework to recognize four specific events in laparoscopic surgery videos.
arXiv Detail & Related papers (2023-12-01T13:57:29Z) - GLSFormer : Gated - Long, Short Sequence Transformer for Step
Recognition in Surgical Videos [57.93194315839009]
We propose a vision transformer-based approach to learn temporal features directly from sequence-level patches.
We extensively evaluate our approach on two cataract surgery video datasets, Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods.
arXiv Detail & Related papers (2023-07-20T17:57:04Z) - Learning How To Robustly Estimate Camera Pose in Endoscopic Videos [5.073761189475753]
We propose a solution for stereo endoscopes that estimates depth and optical flow to minimize two geometric losses for camera pose estimation.
Most importantly, we introduce two learned adaptive per-pixel weight mappings that balance contributions according to the input image content.
We validate our approach on the publicly available SCARED dataset and introduce a new in-vivo dataset, StereoMIS.
arXiv Detail & Related papers (2023-04-17T07:05:01Z) - Live image-based neurosurgical guidance and roadmap generation using
unsupervised embedding [53.992124594124896]
We present a method for live image-only guidance leveraging a large data set of annotated neurosurgical videos.
A generated roadmap encodes the common anatomical paths taken in surgeries in the training set.
We trained and evaluated the proposed method with a data set of 166 transsphenoidal adenomectomy procedures.
arXiv Detail & Related papers (2023-03-31T12:52:24Z) - Quantification of Robotic Surgeries with Vision-Based Deep Learning [45.165919577877695]
We propose a unified deep learning framework, entitled Roboformer, which operates exclusively on videos recorded during surgery.
We validated our framework on four video-based datasets of two commonly-encountered types of steps within minimally-invasive robotic surgeries.
arXiv Detail & Related papers (2022-05-06T06:08:35Z) - CholecTriplet2021: A benchmark challenge for surgical action triplet
recognition [66.51610049869393]
This paper presents CholecTriplet2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos.
We present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge.
A total of 4 baseline methods and 19 new deep learning algorithms are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%.
arXiv Detail & Related papers (2022-04-10T18:51:55Z) - LensID: A CNN-RNN-Based Framework Towards Lens Irregularity Detection in
Cataract Surgery Videos [6.743968799949719]
A critical complication after cataract surgery is the dislocation of the lens implant leading to vision deterioration and eye trauma.
We propose an end-to-end recurrent neural network to recognize the lens-implantation phase and a novel semantic segmentation network to segment the lens and pupil after the implantation phase.
arXiv Detail & Related papers (2021-07-02T07:27:29Z) - Relevance Detection in Cataract Surgery Videos by Spatio-Temporal Action
Localization [7.235239641693831]
In cataract surgery, the operation is performed with the help of a microscope. Since the microscope allows only up to two people to watch the surgery in real time, a major part of surgical training is conducted using the recorded videos.
To optimize the training procedure with the video content, the surgeons require an automatic relevance detection approach.
In this paper, a three-module framework is proposed to detect and classify the relevant phase segments in cataract videos.
arXiv Detail & Related papers (2021-04-29T12:01:08Z) - One-shot action recognition towards novel assistive therapies [63.23654147345168]
This work is motivated by the automated analysis of medical therapies that involve action imitation games.
The presented approach incorporates a pre-processing step that standardizes heterogeneous motion data conditions.
We evaluate the approach on a real use-case of automated video analysis for therapy support with autistic people.
arXiv Detail & Related papers (2021-02-17T19:41:37Z)