Related papers: Vision-Based Neurosurgical Guidance: Unsupervised Localization and Camera-Pose Prediction

Vision-Based Neurosurgical Guidance: Unsupervised Localization and Camera-Pose Prediction

URL: http://arxiv.org/abs/2405.09355v1
Date: Wed, 15 May 2024 14:09:11 GMT
Title: Vision-Based Neurosurgical Guidance: Unsupervised Localization and Camera-Pose Prediction
Authors: Gary Sarwin, Alessandro Carretta, Victor Staartjes, Matteo Zoli, Diego Mazzatenta, Luca Regli, Carlo Serra, Ender Konukoglu,
Abstract summary: Localizing oneself during endoscopic procedures can be problematic due to the lack of distinguishable textures and landmarks. We present a deep learning method based on anatomy recognition, that constructs a surgical path in an unsupervised manner from surgical videos.
Score: 41.91807060434709
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Localizing oneself during endoscopic procedures can be problematic due to the lack of distinguishable textures and landmarks, as well as difficulties due to the endoscopic device such as a limited field of view and challenging lighting conditions. Expert knowledge shaped by years of experience is required for localization within the human body during endoscopic procedures. In this work, we present a deep learning method based on anatomy recognition, that constructs a surgical path in an unsupervised manner from surgical videos, modelling relative location and variations due to different viewing angles. At inference time, the model can map an unseen video's frames on the path and estimate the viewing angle, aiming to provide guidance, for instance, to reach a particular destination. We test the method on a dataset consisting of surgical videos of transsphenoidal adenomectomies, as well as on a synthetic dataset. An online tool that lets researchers upload their surgical videos to obtain anatomy detections and the weights of the trained YOLOv7 model are available at: https://surgicalvision.bmic.ethz.ch.

Related papers

How Far Are Surgeons from Surgical World Models? A Pilot Study on Zero-shot Surgical Video Generation with Expert Assessment [69.13598421861654]
We present SurgVeo, the first expert-curated benchmark for video generation model evaluation in surgery.<n>We task the advanced Veo-3 model with a zero-shot prediction task on surgical clips from laparoscopic and neurosurgical procedures.<n>Our results reveal a distinct "plausibility gap": while Veo-3 achieves exceptional Visual Perceptual Plausibility, it fails critically at higher levels of the Surgical Plausibility Pyramid.
arXiv Detail & Related papers (2025-11-03T17:28:54Z)
Anatomy Might Be All You Need: Forecasting What to Do During Surgery [41.91807060434709]
There has been growing interest in providing live guidance by analyzing video feeds from tools such as endoscopes. This work aims to provide guidance on a finer scale, aiming to provide guidance by forecasting the trajectory of the surgical instrument.
arXiv Detail & Related papers (2025-01-29T21:54:31Z)
Intraoperative Registration by Cross-Modal Inverse Neural Rendering [61.687068931599846]
We present a novel approach for 3D/2D intraoperative registration during neurosurgery via cross-modal inverse neural rendering. Our approach separates implicit neural representation into two components, handling anatomical structure preoperatively and appearance intraoperatively. We tested our method on retrospective patients' data from clinical cases, showing that our method outperforms state-of-the-art while meeting current clinical standards for registration.
arXiv Detail & Related papers (2024-09-18T13:40:59Z)
SURGIVID: Annotation-Efficient Surgical Video Object Discovery [42.16556256395392]
We propose an annotation-efficient framework for the semantic segmentation of surgical scenes. We employ image-based self-supervised object discovery to identify the most salient tools and anatomical structures in surgical videos. Our unsupervised setup reinforced with only 36 annotation labels indicates comparable localization performance with fully-supervised segmentation models.
arXiv Detail & Related papers (2024-09-12T07:12:20Z)
FLex: Joint Pose and Dynamic Radiance Fields Optimization for Stereo Endoscopic Videos [79.50191812646125]
Reconstruction of endoscopic scenes is an important asset for various medical applications, from post-surgery analysis to educational training. We adress the challenging setup of a moving endoscope within a highly dynamic environment of deforming tissue. We propose an implicit scene separation into multiple overlapping 4D neural radiance fields (NeRFs) and a progressive optimization scheme jointly optimizing for reconstruction and camera poses from scratch. This improves the ease-of-use and allows to scale reconstruction capabilities in time to process surgical videos of 5,000 frames and more; an improvement of more than ten times compared to the state of the art while being agnostic to external tracking information
arXiv Detail & Related papers (2024-03-18T19:13:02Z)
BASED: Bundle-Adjusting Surgical Endoscopic Dynamic Video Reconstruction using Neural Radiance Fields [5.773068487121897]
Reconstruction of deformable scenes from endoscopic videos is important for many applications. Our work adopts the Neural Radiance Fields (NeRF) approach to learning 3D implicit representations of scenes. We demonstrate this approach on endoscopic surgical scenes from robotic surgery.
arXiv Detail & Related papers (2023-09-27T00:20:36Z)
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures [50.09187683845788]
Recent advancements in surgical computer vision applications have been driven by vision-only models. These methods rely on manually annotated surgical videos to predict a fixed set of object categories. In this work, we put forward the idea that the surgical video lectures available through open surgical e-learning platforms can provide effective vision and language supervisory signals.
arXiv Detail & Related papers (2023-07-27T22:38:12Z)
Next-generation Surgical Navigation: Marker-less Multi-view 6DoF Pose Estimation of Surgical Instruments [66.74633676595889]
We present a multi-camera capture setup consisting of static and head-mounted cameras. Second, we publish a multi-view RGB-D video dataset of ex-vivo spine surgeries, captured in a surgical wet lab and a real operating theatre. Third, we evaluate three state-of-the-art single-view and multi-view methods for the task of 6DoF pose estimation of surgical instruments.
arXiv Detail & Related papers (2023-05-05T13:42:19Z)
Learning How To Robustly Estimate Camera Pose in Endoscopic Videos [5.073761189475753]
We propose a solution for stereo endoscopes that estimates depth and optical flow to minimize two geometric losses for camera pose estimation. Most importantly, we introduce two learned adaptive per-pixel weight mappings that balance contributions according to the input image content. We validate our approach on the publicly available SCARED dataset and introduce a new in-vivo dataset, StereoMIS.
arXiv Detail & Related papers (2023-04-17T07:05:01Z)
Live image-based neurosurgical guidance and roadmap generation using unsupervised embedding [53.992124594124896]
We present a method for live image-only guidance leveraging a large data set of annotated neurosurgical videos. A generated roadmap encodes the common anatomical paths taken in surgeries in the training set. We trained and evaluated the proposed method with a data set of 166 transsphenoidal adenomectomy procedures.
arXiv Detail & Related papers (2023-03-31T12:52:24Z)
A Temporal Learning Approach to Inpainting Endoscopic Specularities and Its effect on Image Correspondence [13.25903945009516]
We propose using a temporal generative adversarial network (GAN) to inpaint the hidden anatomy under specularities. This is achieved using in-vivo data of gastric endoscopy (Hyper-Kvasir) in a fully unsupervised manner. We also assess the effect of our method in computer vision tasks that underpin 3D reconstruction and camera motion estimation.
arXiv Detail & Related papers (2022-03-31T13:14:00Z)
Towards Unsupervised Learning for Instrument Segmentation in Robotic Surgery with Cycle-Consistent Adversarial Networks [54.00217496410142]
We propose an unpaired image-to-image translation where the goal is to learn the mapping between an input endoscopic image and a corresponding annotation. Our approach allows to train image segmentation models without the need to acquire expensive annotations. We test our proposed method on Endovis 2017 challenge dataset and show that it is competitive with supervised segmentation methods.
arXiv Detail & Related papers (2020-07-09T01:39:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.