Deep Selection: A Fully Supervised Camera Selection Network for Surgery Recordings
- URL: http://arxiv.org/abs/2303.15947v1
- Date: Tue, 28 Mar 2023 13:00:08 GMT
- Title: Deep Selection: A Fully Supervised Camera Selection Network for Surgery Recordings
- Authors: Ryo Hachiuma, Tomohiro Shimizu, Hideo Saito, Hiroki Kajita, Yoshifumi Takatsume
- Abstract summary: We use a recording system in which multiple cameras are embedded in the surgical lamp.
As the embedded cameras obtain multiple video sequences, we address the task of selecting the camera with the best view of the surgery.
Unlike the conventional method, which selects the camera based on the visible area of the surgical field, we propose a deep neural network that predicts the camera selection probability from the multiple video sequences.
- Score: 9.242157746114113
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recording surgery in operating rooms is an essential task for
education and for the evaluation of medical treatment. However, recording the
desired targets, such as the surgical field, surgical tools, or the doctor's
hands, is difficult because the targets are heavily occluded during surgery.
We use a recording system in which multiple cameras are embedded in the
surgical lamp, and we assume that at least one camera is recording the target
without occlusion at any given time. As the embedded cameras obtain multiple
video sequences, we address the task of selecting the camera with the best
view of the surgery. Unlike the conventional method, which selects the camera
based on the visible area of the surgical field, we propose a deep neural
network that predicts the camera selection probability from the multiple
video sequences, trained with supervision from expert annotations. We created
a dataset in which six different types of plastic surgery are recorded, and
we provide annotations of the camera switching. Our experiments show that our
approach successfully switched between cameras and outperformed three
baseline methods.
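
No code accompanies this listing, so the following is a minimal PyTorch sketch of the stated idea: encode every camera's clip with a shared backbone, score each stream, and read the camera selection probability off a softmax trained with cross-entropy against the expert's annotated choice. The backbone, feature sizes, and module names are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CameraSelectionNet(nn.Module):
    """Sketch of a fully supervised camera selector.

    Scores each camera's clip with a shared encoder; a softmax over the
    per-camera scores gives the selection probability. (Architecture
    details are assumptions, not the paper's exact model.)
    """

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        # Shared spatio-temporal encoder applied to every camera stream.
        self.encoder = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )
        # Shared scoring head: one scalar score per camera.
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, num_cameras, 3, T, H, W)
        b, n = clips.shape[:2]
        feats = self.encoder(clips.flatten(0, 1))   # (b*n, feat_dim)
        return self.score(feats).view(b, n)         # logits over cameras

model = CameraSelectionNet()
clips = torch.randn(2, 5, 3, 8, 64, 64)    # 2 samples, 5 lamp cameras
logits = model(clips)
probs = logits.softmax(dim=1)               # camera selection probability
expert_choice = torch.tensor([1, 3])        # annotated best camera per sample
loss = nn.functional.cross_entropy(logits, expert_choice)
```

At inference, the camera with the highest predicted probability would be selected at each time step, which is what the switching experiments evaluate.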
Related papers
- Redundancy-Aware Camera Selection for Indoor Scene Neural Rendering [54.468355408388675]
We build a similarity matrix that incorporates both the spatial diversity of the cameras and the semantic variation of the images.
We apply a diversity-based sampling algorithm to optimize the camera selection.
We also develop a new dataset, IndoorTraj, which includes long and complex camera movements captured by humans in virtual indoor environments.
arXiv Detail & Related papers (2024-09-11T08:36:49Z)
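
The redundancy-aware entry above only fixes the ingredients: a camera similarity matrix and a diversity-based sampling algorithm. Below is a small NumPy sketch of one standard instantiation, greedy max-min sampling; the seed choice and how the similarity matrix is built are assumptions, not the paper's method.

```python
import numpy as np

def select_cameras(similarity: np.ndarray, k: int) -> list[int]:
    """Greedy diversity-based sampling over a camera similarity matrix.

    similarity[i, j] combines the spatial closeness of cameras i and j
    with the semantic similarity of their images (how it is built is up
    to the caller). Starting from a seed, repeatedly pick the camera
    least similar to everything selected so far (max-min greedy).
    """
    selected = [0]                        # arbitrary seed camera
    while len(selected) < k:
        # Worst-case redundancy of each camera: its maximum similarity
        # to any already-selected camera.
        redundancy = similarity[:, selected].max(axis=1)
        redundancy[selected] = np.inf     # never re-pick a camera
        selected.append(int(redundancy.argmin()))
    return selected
```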
- Creating a Digital Twin of Spinal Surgery: A Proof of Concept [68.37190859183663]
Surgery digitalization is the process of creating a virtual replica of real-world surgery.
We present a proof of concept (PoC) for surgery digitalization that is applied to an ex-vivo spinal surgery.
We employ five RGB-D cameras for dynamic 3D reconstruction of the surgeon, a high-end camera for 3D reconstruction of the anatomy, an infrared stereo camera for surgical instrument tracking, and a laser scanner for 3D reconstruction of the operating room and data fusion.
arXiv Detail & Related papers (2024-03-25T13:09:40Z)
- Depth Over RGB: Automatic Evaluation of Open Surgery Skills Using Depth Camera [0.8246494848934447]
This work is intended to show that depth cameras achieve similar results to RGB cameras.
Depth cameras offer advantages such as robustness to lighting variations and camera positioning, simplified data compression, and enhanced privacy.
arXiv Detail & Related papers (2024-01-18T15:00:28Z)
- WS-SfMLearner: Self-supervised Monocular Depth and Ego-motion Estimation on Surgical Videos with Unknown Camera Parameters [0.0]
Building an accurate and robust self-supervised depth and camera ego-motion estimation system is gaining more attention from the computer vision community.
In this work, we aimed to build a self-supervised depth and ego-motion estimation system which can predict not only accurate depth maps and camera pose, but also camera intrinsic parameters.
arXiv Detail & Related papers (2023-08-22T20:35:24Z)
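
The WS-SfMLearner entry hinges on predicting camera intrinsics alongside depth and pose. As a hedged sketch of how predicted intrinsics can enter the usual self-supervised reprojection pipeline, the snippet below assembles K from four normalized network outputs and uses it to lift, transform, and reproject pixels; the parameterization and function names are assumptions, not the paper's formulation.

```python
import torch

def make_intrinsics(fx_fy_cx_cy: torch.Tensor, h: int, w: int) -> torch.Tensor:
    """Assemble a camera matrix K from 4 network outputs in [0, 1].

    The network predicts normalized focal lengths and principal point;
    scaling by the image size keeps K in pixel units. (This
    parameterization is an assumption.)
    """
    fx, fy, cx, cy = fx_fy_cx_cy.unbind(-1)
    K = torch.zeros(*fx.shape, 3, 3)
    K[..., 0, 0] = fx * w
    K[..., 1, 1] = fy * h
    K[..., 0, 2] = cx * w
    K[..., 1, 2] = cy * h
    K[..., 2, 2] = 1.0
    return K

def reproject(depth: torch.Tensor, K: torch.Tensor, T: torch.Tensor):
    """Lift pixels with depth and K, move by pose T, project back with K.

    The photometric loss compares the source image sampled at these
    coordinates against the target image, which is what jointly
    supervises depth, pose, and (here) the intrinsics.
    """
    b, _, h, w = depth.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).float()  # (3, h, w)
    rays = torch.linalg.inv(K) @ pix.reshape(3, -1)              # (b, 3, h*w)
    pts = rays * depth.reshape(b, 1, -1)                         # 3D points
    pts = T[:, :3, :3] @ pts + T[:, :3, 3:]                      # rigid move
    proj = K @ pts
    return proj[:, :2] / proj[:, 2:].clamp(min=1e-6)             # pixel coords
```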
- Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures [51.78027546947034]
Recent advancements in surgical computer vision have been driven by vision-only models, which lack language semantics.
We propose leveraging surgical video lectures from e-learning platforms to provide effective vision and language supervisory signals.
We address surgery-specific linguistic challenges using multiple automatic speech recognition systems for text transcriptions.
arXiv Detail & Related papers (2023-07-27T22:38:12Z)
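
The video-lecture entry pairs vision with language supervision from transcribed narration. A common way to realize such supervision is a symmetric contrastive (InfoNCE) objective; the sketch below shows that generic form, assuming batch-aligned clip/sentence pairs, and is not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def clip_style_loss(video_emb: torch.Tensor, text_emb: torch.Tensor,
                    temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE between video clips and transcribed narration.

    Matched (clip, sentence) pairs are positives; all other pairings in
    the batch serve as negatives.
    """
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / temperature       # (batch, batch) similarity
    targets = torch.arange(len(v))       # i-th clip matches i-th sentence
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2
```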
- Next-generation Surgical Navigation: Marker-less Multi-view 6DoF Pose Estimation of Surgical Instruments [66.74633676595889]
We present a multi-camera capture setup consisting of static and head-mounted cameras.
Second, we publish a multi-view RGB-D video dataset of ex-vivo spine surgeries, captured in a surgical wet lab and a real operating theatre.
Third, we evaluate three state-of-the-art single-view and multi-view methods for the task of 6DoF pose estimation of surgical instruments.
arXiv Detail & Related papers (2023-05-05T13:42:19Z)
- Live image-based neurosurgical guidance and roadmap generation using unsupervised embedding [53.992124594124896]
We present a method for live image-only guidance leveraging a large data set of annotated neurosurgical videos.
A generated roadmap encodes the common anatomical paths taken in surgeries in the training set.
We trained and evaluated the proposed method with a data set of 166 transsphenoidal adenomectomy procedures.
arXiv Detail & Related papers (2023-03-31T12:52:24Z)
- Temporal and Contextual Transformer for Multi-Camera Editing of TV Shows [83.54243912535667]
We first collect a novel benchmark on this setting with four diverse scenarios including concerts, sports games, gala shows, and contests.
It contains 88 hours of raw video that contribute to 14 hours of edited video.
We propose a new approach, a temporal and contextual transformer, that utilizes cues from historical shots and other views to make shot-transition decisions.
arXiv Detail & Related papers (2022-10-17T04:11:23Z)
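
The entry above describes mixing historical shots with the current candidate views to decide shot transitions. The following PyTorch sketch shows one plausible token layout for that idea; the layer sizes, token arrangement, and scoring head are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

class ShotTransitionModel(nn.Module):
    """Sketch of a temporal-and-contextual selector for multi-camera editing.

    Historical shot features supply temporal context; features of the
    candidate camera views for the current moment supply spatial context.
    A transformer encoder mixes both, and the candidate tokens are scored
    to decide which view the next shot should cut to.
    """

    def __init__(self, dim: int = 128):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.score = nn.Linear(dim, 1)

    def forward(self, history: torch.Tensor, views: torch.Tensor):
        # history: (batch, num_past_shots, dim); views: (batch, num_views, dim)
        tokens = torch.cat([history, views], dim=1)
        encoded = self.encoder(tokens)
        view_tokens = encoded[:, history.shape[1]:]   # candidate views only
        return self.score(view_tokens).squeeze(-1)    # (batch, num_views)

model = ShotTransitionModel()
logits = model(torch.randn(2, 6, 128), torch.randn(2, 4, 128))
next_view = logits.argmax(dim=1)   # camera to cut to for the next shot
```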
- Deep Homography Estimation in Dynamic Surgical Scenes for Laparoscopic Camera Motion Extraction [6.56651216023737]
We introduce a method for extracting a laparoscope holder's actions from videos of laparoscopic interventions.
We synthetically add camera motion to a newly acquired dataset of camera-motion-free da Vinci surgery image sequences.
We find that our method transfers from our camera-motion-free da Vinci surgery dataset to videos of laparoscopic interventions, outperforming classical homography estimation approaches in both precision (by 41%) and runtime on a CPU (by 43%); a typical classical baseline is sketched below.
arXiv Detail & Related papers (2021-09-30T13:05:37Z)
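
For reference, the classical homography pipeline such papers benchmark against typically looks like the OpenCV sketch below: ORB keypoints matched across frames, then a RANSAC-fitted homography. This is the generic baseline, not code from the paper.

```python
import cv2
import numpy as np

def classical_homography(prev_frame: np.ndarray, next_frame: np.ndarray):
    """Classical baseline: ORB keypoints + RANSAC homography.

    Estimates the inter-frame homography that learned approaches are
    compared against. Returns a 3x3 matrix, or None when there are too
    few matches (e.g., textureless or smoke-filled views).
    """
    orb = cv2.ORB_create(nfeatures=1000)
    k1, d1 = orb.detectAndCompute(prev_frame, None)
    k2, d2 = orb.detectAndCompute(next_frame, None)
    if d1 is None or d2 is None:
        return None
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(d1, d2)
    if len(matches) < 4:
        return None
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H
```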
- Predicting the Timing of Camera Movements From the Kinematics of Instruments in Robotic-Assisted Surgery Using Artificial Neural Networks [1.0965065178451106]
We propose a predictive approach for anticipating when camera movements will occur using artificial neural networks.
We used the kinematic data of the surgical instruments, which were recorded during robotic-assisted surgical training on porcine models.
We found that the instruments' kinematic data can be used to predict when camera movements will occur, and evaluated the performance on different segment durations and ensemble sizes.
arXiv Detail & Related papers (2021-09-23T07:57:27Z)
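
The kinematics entry reduces to a windowed sequence-classification problem: given a short segment of instrument kinematics, predict whether a camera movement is imminent. The sketch below shows a minimal network of that shape; the window length, feature count, and layer sizes are assumptions (the paper additionally varies segment durations and ensemble sizes).

```python
import torch
import torch.nn as nn

class CameraMoveTimingNet(nn.Module):
    """Sketch of a kinematics-to-camera-movement classifier.

    A fixed-length window of instrument kinematics (e.g., tool-tip
    positions and velocities) is flattened and classified as "a camera
    movement occurs next" vs "no movement".
    """

    def __init__(self, window: int = 30, n_features: int = 12):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(window * n_features, 64), nn.ReLU(),
            nn.Linear(64, 2),              # move-soon vs no-move logits
        )

    def forward(self, kinematics: torch.Tensor) -> torch.Tensor:
        # kinematics: (batch, window, n_features)
        return self.net(kinematics)

model = CameraMoveTimingNet()
segment = torch.randn(8, 30, 12)      # 8 kinematic windows
labels = torch.randint(0, 2, (8,))    # annotated movement onsets
loss = nn.functional.cross_entropy(model(segment), labels)
```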