Retrieval of surgical phase transitions using reinforcement learning
- URL: http://arxiv.org/abs/2208.00902v1
- Date: Mon, 1 Aug 2022 14:43:15 GMT
- Title: Retrieval of surgical phase transitions using reinforcement learning
- Authors: Yitong Zhang, Sophia Bano, Ann-Sophie Page, Jan Deprest, Danail
Stoyanov, Francisco Vasconcelos
- Abstract summary: We introduce a novel reinforcement learning formulation for offline phase transition retrieval.
By construction, our model produces contiguous phase blocks rather than spurious, noisy phase transitions.
We compare our method against the recent top-performing frame-based approaches TeCNO and Trans-SVNet.
- Score: 11.130363429095048
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In minimally invasive surgery, surgical workflow segmentation from video
analysis is a well-studied topic. The conventional approach defines it as a
multi-class classification problem, where each video frame is assigned a
surgical phase label.
We introduce a novel reinforcement learning formulation for offline phase
transition retrieval. Instead of attempting to classify every video frame, we
identify the timestamp of each phase transition. By construction, our model
produces contiguous phase blocks rather than spurious, noisy phase
transitions. We investigate two different configurations of this model. The
first does not require processing all frames in a video (only <60% and <20% of
frames in two different applications), while producing results slightly under
state-of-the-art accuracy. The second configuration processes all video frames,
and outperforms the state-of-the-art at a comparable computational cost.
We compare our method against the recent top-performing frame-based
approaches TeCNO and Trans-SVNet on the public dataset Cholec80 and also on an
in-house dataset of laparoscopic sacrocolpopexy. We perform both a frame-based
(accuracy, precision, recall and F1-score) and an event-based (event ratio)
evaluation of our algorithms.
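To make the decoding idea concrete, here is a minimal Python sketch (not the authors' implementation) of how a list of predicted transition timestamps induces contiguous per-frame phase blocks, together with the frame-based accuracy measure. The event-ratio formula shown (predicted transition count divided by ground-truth transition count) is an assumption for illustration; the paper defines its event-based metric precisely. All function and phase names are hypothetical.

```python
# Minimal sketch, assuming transitions are given as (start_frame, phase) pairs.
# Not the authors' code; the event_ratio definition below is an assumption.

def transitions_to_frame_labels(transitions, n_frames):
    """Expand [(start_frame, phase), ...] into a per-frame label list.

    Each phase is defined only by its start timestamp, so the output
    consists of contiguous phase blocks by construction -- isolated,
    noisy single-frame predictions cannot occur.
    """
    transitions = sorted(transitions)
    labels = []
    for i, (start, phase) in enumerate(transitions):
        end = transitions[i + 1][0] if i + 1 < len(transitions) else n_frames
        labels.extend([phase] * (end - start))
    return labels

def frame_accuracy(pred_labels, gt_labels):
    """Frame-based metric: fraction of frames with the correct phase."""
    correct = sum(p == g for p, g in zip(pred_labels, gt_labels))
    return correct / len(gt_labels)

def event_ratio(pred_transitions, gt_transitions):
    """Event-based metric (assumed form): predicted vs. ground-truth
    transition counts. A ratio near 1 means no spurious extra events."""
    return len(pred_transitions) / len(gt_transitions)

# Toy example: a 10-frame video with three (hypothetical) phases.
gt = [(0, "prep"), (4, "dissection"), (8, "closure")]
pred = [(0, "prep"), (5, "dissection"), (8, "closure")]

pred_labels = transitions_to_frame_labels(pred, 10)
gt_labels = transitions_to_frame_labels(gt, 10)
print(frame_accuracy(pred_labels, gt_labels))  # 0.9 (one frame off)
print(event_ratio(pred, gt))                   # 1.0 (no spurious events)
```

The sketch illustrates why the formulation cannot emit noisy label flicker: a per-frame classifier can assign any label to any frame, whereas decoding from transition timestamps only ever yields contiguous blocks.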
Related papers
- DACAT: Dual-stream Adaptive Clip-aware Time Modeling for Robust Online Surgical Phase Recognition [9.560659134295866]
Surgical phase recognition is a crucial requirement in laparoscopic surgery, enabling various clinical applications like surgical risk forecasting.
We propose DACAT, a novel dual-stream model that adaptively learns clip-aware context information to enhance the temporal relationship.
arXiv Detail & Related papers (2024-09-10T04:58:48Z)
- FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline [4.295130967329365]
This paper presents a new two-stage latent diffusion text-to-video generation architecture based on the text-to-image diffusion model.
The design of our model significantly reduces computational costs compared to other masked frame approaches.
We evaluate different configurations of the MoVQ-based video decoding scheme to improve consistency and achieve better PSNR, SSIM, MSE, and LPIPS scores.
arXiv Detail & Related papers (2023-11-22T00:26:15Z)
- Weakly-Supervised Surgical Phase Recognition [19.27227976291303]
In this work, we join concepts of graph segmentation with self-supervised learning to derive a random-walk solution for per-frame phase prediction.
We validate our method by running experiments with the public Cholec80 dataset of laparoscopic cholecystectomy videos.
arXiv Detail & Related papers (2023-10-26T07:54:47Z)
- Video-SwinUNet: Spatio-temporal Deep Learning Framework for VFSS Instance Segmentation [10.789826145990016]
This paper presents a deep learning framework for medical video segmentation.
Our framework explicitly extracts features from neighbouring frames across the temporal dimension.
It incorporates them with a temporal feature blender, which then tokenises the high-level temporal feature to form a strong global feature encoded via a Swin Transformer.
arXiv Detail & Related papers (2023-02-22T12:09:39Z)
- Inductive and Transductive Few-Shot Video Classification via Appearance and Temporal Alignments [17.673345523918947]
We present a novel method for few-shot video classification, which performs appearance and temporal alignments.
Our approach achieves similar or better results than previous methods on both datasets.
arXiv Detail & Related papers (2022-07-21T23:28:52Z)
- Pseudo-label Guided Cross-video Pixel Contrast for Robotic Surgical Scene Segmentation with Limited Annotations [72.15956198507281]
We propose PGV-CL, a novel pseudo-label guided cross-video contrastive learning method to boost scene segmentation.
We extensively evaluate our method on the public robotic surgery dataset EndoVis18 and the public cataract dataset CaDIS.
arXiv Detail & Related papers (2022-07-20T05:42:19Z)
- TTVFI: Learning Trajectory-Aware Transformer for Video Frame Interpolation [50.49396123016185]
Video frame interpolation (VFI) aims to synthesize an intermediate frame between two consecutive frames.
We propose a novel Trajectory-aware Transformer for Video Frame Interpolation (TTVFI).
Our method outperforms other state-of-the-art methods on four widely used VFI benchmarks.
arXiv Detail & Related papers (2022-07-19T03:37:49Z)
- Exploring Intra- and Inter-Video Relation for Surgical Semantic Scene Segmentation [58.74791043631219]
We propose a novel framework, STswinCL, that explores complementary intra- and inter-video relations to boost segmentation performance.
We extensively validate our approach on two public surgical video benchmarks, the EndoVis18 Challenge and the CaDIS dataset.
Experimental results demonstrate the promising performance of our method, which consistently exceeds previous state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-29T05:52:23Z)
- Efficient Video Segmentation Models with Per-frame Inference [117.97423110566963]
We focus on improving temporal consistency without introducing overhead at inference time.
We propose several techniques to learn from the video sequence, including a temporal consistency loss and online/offline knowledge distillation methods.
arXiv Detail & Related papers (2022-02-24T23:51:36Z)
- OCSampler: Compressing Videos to One Clip with Single-step Sampling [82.0417131211353]
We propose a framework named OCSampler to explore a compact yet effective video representation with one short clip.
Our basic motivation is that efficient video recognition lies in processing the whole sequence at once rather than picking up frames sequentially.
arXiv Detail & Related papers (2022-01-12T09:50:38Z)
- Efficient Semantic Video Segmentation with Per-frame Inference [117.97423110566963]
In this work, we perform efficient semantic video segmentation in a per-frame fashion during inference.
We employ compact models for real-time execution. To narrow the performance gap between compact and large models, new knowledge distillation methods are designed.
arXiv Detail & Related papers (2020-02-26T12:24:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.