TSP-OCS: A Time-Series Prediction for Optimal Camera Selection in Multi-Viewpoint Surgical Video Analysis
- URL: http://arxiv.org/abs/2504.06527v1
- Date: Wed, 09 Apr 2025 02:07:49 GMT
- Title: TSP-OCS: A Time-Series Prediction for Optimal Camera Selection in Multi-Viewpoint Surgical Video Analysis
- Authors: Xinyu Liu, Xiaoguang Lin, Xiang Liu, Yong Yang, Hongqian Wang, Qilong Sun,
- Abstract summary: We propose a fully supervised learning-based time series prediction method to choose the best shot sequences from multiple simultaneously recorded video streams.<n>Our method achieves competitive accuracy compared to traditional supervised methods, even when predicting over longer time horizons.
- Score: 19.40791972868592
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recording the open surgery process is essential for educational and medical evaluation purposes; however, traditional single-camera methods often face challenges such as occlusions caused by the surgeon's head and body, as well as limitations due to fixed camera angles, which reduce comprehensibility of the video content. This study addresses these limitations by employing a multi-viewpoint camera recording system, capturing the surgical procedure from six different angles to mitigate occlusions. We propose a fully supervised learning-based time series prediction method to choose the best shot sequences from multiple simultaneously recorded video streams, ensuring optimal viewpoints at each moment. Our time series prediction model forecasts future camera selections by extracting and fusing visual and semantic features from surgical videos using pre-trained models. These features are processed by a temporal prediction network with TimeBlocks to capture sequential dependencies. A linear embedding layer reduces dimensionality, and a Softmax classifier selects the optimal camera view based on the highest probability. In our experiments, we created five groups of open thyroidectomy videos, each with simultaneous recordings from six different angles. The results demonstrate that our method achieves competitive accuracy compared to traditional supervised methods, even when predicting over longer time horizons. Furthermore, our approach outperforms state-of-the-art time series prediction techniques on our dataset. This manuscript makes a unique contribution by presenting an innovative framework that advances surgical video analysis techniques, with significant implications for improving surgical education and patient safety.
Related papers
- One Patient's Annotation is Another One's Initialization: Towards Zero-Shot Surgical Video Segmentation with Cross-Patient Initialization [1.0536099636804035]
Video object segmentation is an emerging technology that is well-suited for real-time surgical video segmentation.<n>However, its adoption is limited by the need for manual intervention to select the tracked object.<n>In this work, we tackle this challenge with an innovative solution: using previously annotated frames from other patients as the tracking frames.<n>We find that this unconventional approach can match or even surpass the performance of using patients' own tracking frames.
arXiv Detail & Related papers (2025-03-04T03:11:03Z) - VISAGE: Video Synthesis using Action Graphs for Surgery [34.21344214645662]
We introduce the novel task of future video generation in laparoscopic surgery.
Our proposed method, VISAGE, leverages the power of action scene graphs to capture the sequential nature of laparoscopic procedures.
Results of our experiments demonstrate high-fidelity video generation for laparoscopy procedures.
arXiv Detail & Related papers (2024-10-23T10:28:17Z) - Predicting Long-horizon Futures by Conditioning on Geometry and Time [49.86180975196375]
We explore the task of generating future sensor observations conditioned on the past.
We leverage the large-scale pretraining of image diffusion models which can handle multi-modality.
We create a benchmark for video prediction on a diverse set of videos spanning indoor and outdoor scenes.
arXiv Detail & Related papers (2024-04-17T16:56:31Z) - Weakly-Supervised Surgical Phase Recognition [19.27227976291303]
In this work we join concepts of graph segmentation with self-supervised learning to derive a random-walk solution for per-frame phase prediction.
We validate our method by running experiments with the public Cholec80 dataset of laparoscopic cholecystectomy videos.
arXiv Detail & Related papers (2023-10-26T07:54:47Z) - GLSFormer : Gated - Long, Short Sequence Transformer for Step
Recognition in Surgical Videos [57.93194315839009]
We propose a vision transformer-based approach to learn temporal features directly from sequence-level patches.
We extensively evaluate our approach on two cataract surgery video datasets, Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods.
arXiv Detail & Related papers (2023-07-20T17:57:04Z) - Next-generation Surgical Navigation: Marker-less Multi-view 6DoF Pose
Estimation of Surgical Instruments [66.74633676595889]
We present a multi-camera capture setup consisting of static and head-mounted cameras.
Second, we publish a multi-view RGB-D video dataset of ex-vivo spine surgeries, captured in a surgical wet lab and a real operating theatre.
Third, we evaluate three state-of-the-art single-view and multi-view methods for the task of 6DoF pose estimation of surgical instruments.
arXiv Detail & Related papers (2023-05-05T13:42:19Z) - Deep Selection: A Fully Supervised Camera Selection Network for Surgery
Recordings [9.242157746114113]
We use a recording system in which multiple cameras are embedded in the surgical lamp.
As the embedded cameras obtain multiple video sequences, we address the task of selecting the camera with the best view of the surgery.
Unlike the conventional method, which selects the camera based on the area size of the surgery field, we propose a deep neural network that predicts the camera selection probability from multiple video sequences.
arXiv Detail & Related papers (2023-03-28T13:00:08Z) - Deep Video Prior for Video Consistency and Propagation [58.250209011891904]
We present a novel and general approach for blind video temporal consistency.
Our method is only trained on a pair of original and processed videos directly instead of a large dataset.
We show that temporal consistency can be achieved by training a convolutional neural network on a video with Deep Video Prior.
arXiv Detail & Related papers (2022-01-27T16:38:52Z) - Unsupervised Shot Boundary Detection for Temporal Segmentation of Long
Capsule Endoscopy Videos [0.0]
Physicians use Capsule Endoscopy (CE) as a non-invasive and non-surgical procedure to examine the entire gastrointestinal (GI) tract.
A single CE examination could last between 8 to 11 hours generating up to 80,000 frames which is compiled as a video.
arXiv Detail & Related papers (2021-10-18T07:22:46Z) - Robust Unsupervised Video Anomaly Detection by Multi-Path Frame
Prediction [61.17654438176999]
We propose a novel and robust unsupervised video anomaly detection method by frame prediction with proper design.
Our proposed method obtains the frame-level AUROC score of 88.3% on the CUHK Avenue dataset.
arXiv Detail & Related papers (2020-11-05T11:34:12Z) - Blind Video Temporal Consistency via Deep Video Prior [61.062900556483164]
We present a novel and general approach for blind video temporal consistency.
Our method is only trained on a pair of original and processed videos directly.
We show that temporal consistency can be achieved by training a convolutional network on a video with the Deep Video Prior.
arXiv Detail & Related papers (2020-10-22T16:19:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.