Related papers: Neural Fields for 3D Tracking of Anatomy and Surgical Instruments in Monocular Laparoscopic Video Clips

Neural Fields for 3D Tracking of Anatomy and Surgical Instruments in Monocular Laparoscopic Video Clips

URL: http://arxiv.org/abs/2403.19265v1
Date: Thu, 28 Mar 2024 09:44:20 GMT
Title: Neural Fields for 3D Tracking of Anatomy and Surgical Instruments in Monocular Laparoscopic Video Clips
Authors: Beerend G. A. Gerats, Jelmer M. Wolterink, Seb P. Mol, Ivo A. M. J. Broeders,
Abstract summary: We propose a method for joint tracking of all structures simultaneously on a single 2D monocular video clip. Due to the small size of instruments, they generally cover a small part of the image only, resulting in decreased tracking accuracy. We evaluate tracking on video clips laparoscopic cholecystectomies, where we find mean tracking accuracies of 92.4% for anatomical structures and 87.4% for instruments.
Score: 1.339950379203994
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Laparoscopic video tracking primarily focuses on two target types: surgical instruments and anatomy. The former could be used for skill assessment, while the latter is necessary for the projection of virtual overlays. Where instrument and anatomy tracking have often been considered two separate problems, in this paper, we propose a method for joint tracking of all structures simultaneously. Based on a single 2D monocular video clip, we train a neural field to represent a continuous spatiotemporal scene, used to create 3D tracks of all surfaces visible in at least one frame. Due to the small size of instruments, they generally cover a small part of the image only, resulting in decreased tracking accuracy. Therefore, we propose enhanced class weighting to improve the instrument tracks. We evaluate tracking on video clips from laparoscopic cholecystectomies, where we find mean tracking accuracies of 92.4% for anatomical structures and 87.4% for instruments. Additionally, we assess the quality of depth maps obtained from the method's scene reconstructions. We show that these pseudo-depths have comparable quality to a state-of-the-art pre-trained depth estimator. On laparoscopic videos in the SCARED dataset, the method predicts depth with an MAE of 2.9 mm and a relative error of 9.2%. These results show the feasibility of using neural fields for monocular 3D reconstruction of laparoscopic scenes.

Related papers

Tracing 3D Anatomy in 2D Strokes: A Multi-Stage Projection Driven Approach to Cervical Spine Fracture Identification [6.060672815761177]
This study explores the viability of 2D projection-based vertebra segmentation for vertebra-level fracture detection in 3D CT volumes.<n>Regions of interest are identified using the YOLOv8 model from all views and combined to approximate the 3D cervical spine area.<n>A DenseNet121-Unet-based multi-label segmentation leveraging variance- and energy-based projections achieves a Dice score of 87.86 percent.
arXiv Detail & Related papers (2026-01-21T18:15:47Z)
EasyVis2: A Real Time Multi-view 3D Visualization System for Laparoscopic Surgery Training Enhanced by a Deep Neural Network YOLOv8-Pose [3.8000041849498127]
EasyVis2 is a system designed to provide hands-free, real-time 3D visualization for laparoscopic surgery. It incorporates a surgical trocar equipped with an array of micro-cameras, which can be inserted into the body cavity to offer a 3D perspective. A specialized deep neural network algorithm, YOLOv8-Pose, is utilized to estimate the position and orientation of surgical instruments in each individual camera view.
arXiv Detail & Related papers (2024-12-21T19:26:19Z)
Seamless Augmented Reality Integration in Arthroscopy: A Pipeline for Articular Reconstruction and Guidance [4.8046407905667206]
arthroscopy is a minimally invasive surgical procedure used to diagnose and treat joint problems. The arthroscope's restricted field of view and lack of depth perception pose challenges in navigating complex articular structures. We present a robust pipeline that incorporates simultaneous localization and mapping, depth estimation, and 3D Gaussian splatting. Our solution offers AR assistance for articular notch measurement and annotation anchoring in a human-in-the-loop manner.
arXiv Detail & Related papers (2024-10-01T04:15:49Z)
SALVE: A 3D Reconstruction Benchmark of Wounds from Consumer-grade Videos [20.69257610322339]
This paper presents a study on 3D wound reconstruction from consumer-grade videos. We introduce the SALVE dataset, comprising video recordings of realistic wound phantoms captured with different cameras. We assess the accuracy and precision of state-of-the-art methods for 3D reconstruction, ranging from traditional photogrammetry pipelines to advanced neural rendering approaches.
arXiv Detail & Related papers (2024-07-29T02:34:51Z)
High-fidelity Endoscopic Image Synthesis by Utilizing Depth-guided Neural Surfaces [18.948630080040576]
We introduce a novel method for colon section reconstruction by leveraging NeuS applied to endoscopic images, supplemented by a single frame of depth map. Our approach demonstrates exceptional accuracy in completely rendering colon sections, even capturing unseen portions of the surface. This breakthrough opens avenues for achieving stable and consistently scaled reconstructions, promising enhanced quality in cancer screening procedures and treatment interventions.
arXiv Detail & Related papers (2024-04-20T18:06:26Z)
FLex: Joint Pose and Dynamic Radiance Fields Optimization for Stereo Endoscopic Videos [79.50191812646125]
Reconstruction of endoscopic scenes is an important asset for various medical applications, from post-surgery analysis to educational training. We adress the challenging setup of a moving endoscope within a highly dynamic environment of deforming tissue. We propose an implicit scene separation into multiple overlapping 4D neural radiance fields (NeRFs) and a progressive optimization scheme jointly optimizing for reconstruction and camera poses from scratch. This improves the ease-of-use and allows to scale reconstruction capabilities in time to process surgical videos of 5,000 frames and more; an improvement of more than ten times compared to the state of the art while being agnostic to external tracking information
arXiv Detail & Related papers (2024-03-18T19:13:02Z)
Multi-task learning with cross-task consistency for improved depth estimation in colonoscopy [0.2995885872626565]
We develop a novel multi-task learning (MTL) approach with a shared encoder and two decoders, namely a surface normal decoder and a depth estimator. We demonstrate an improvement of 14.17% on relative error and 10.4% on $delta_1$ accuracy over the most accurate baseline state-of-the-art BTS approach.
arXiv Detail & Related papers (2023-11-30T16:13:17Z)
LightNeuS: Neural Surface Reconstruction in Endoscopy using Illumination Decline [45.1300649322217]
We propose a new approach to 3D reconstruction from sequences of images acquired by monocular endoscopes.<n>It is based on two key insights. First, endoluminal cavities are watertight, a property naturally enforced by modeling them in terms of a signed distance function.<n>Second, the scene illumination is variable. It comes from the endoscope's light sources and decays with the inverse of the squared distance to the surface.
arXiv Detail & Related papers (2023-09-06T06:41:40Z)
Deep learning network to correct axial and coronal eye motion in 3D OCT retinal imaging [65.47834983591957]
We propose deep learning based neural networks to correct axial and coronal motion artifacts in OCT based on a single scan. The experimental result shows that the proposed method can effectively correct motion artifacts and achieve smaller error than other methods.
arXiv Detail & Related papers (2023-05-27T03:55:19Z)
Next-generation Surgical Navigation: Marker-less Multi-view 6DoF Pose Estimation of Surgical Instruments [66.74633676595889]
We present a multi-camera capture setup consisting of static and head-mounted cameras. Second, we publish a multi-view RGB-D video dataset of ex-vivo spine surgeries, captured in a surgical wet lab and a real operating theatre. Third, we evaluate three state-of-the-art single-view and multi-view methods for the task of 6DoF pose estimation of surgical instruments.
arXiv Detail & Related papers (2023-05-05T13:42:19Z)
Geometry-Aware Attenuation Learning for Sparse-View CBCT Reconstruction [53.93674177236367]
Cone Beam Computed Tomography (CBCT) plays a vital role in clinical imaging. Traditional methods typically require hundreds of 2D X-ray projections to reconstruct a high-quality 3D CBCT image. This has led to a growing interest in sparse-view CBCT reconstruction to reduce radiation doses. We introduce a novel geometry-aware encoder-decoder framework to solve this problem.
arXiv Detail & Related papers (2023-03-26T14:38:42Z)
A Temporal Learning Approach to Inpainting Endoscopic Specularities and Its effect on Image Correspondence [13.25903945009516]
We propose using a temporal generative adversarial network (GAN) to inpaint the hidden anatomy under specularities. This is achieved using in-vivo data of gastric endoscopy (Hyper-Kvasir) in a fully unsupervised manner. We also assess the effect of our method in computer vision tasks that underpin 3D reconstruction and camera motion estimation.
arXiv Detail & Related papers (2022-03-31T13:14:00Z)
Revisiting 3D Context Modeling with Supervised Pre-training for Universal Lesion Detection in CT Slices [48.85784310158493]
We propose a Modified Pseudo-3D Feature Pyramid Network (MP3D FPN) to efficiently extract 3D context enhanced 2D features for universal lesion detection in CT slices. With the novel pre-training method, the proposed MP3D FPN achieves state-of-the-art detection performance on the DeepLesion dataset. The proposed 3D pre-trained weights can potentially be used to boost the performance of other 3D medical image analysis tasks.
arXiv Detail & Related papers (2020-12-16T07:11:16Z)
4D Spatio-Temporal Convolutional Networks for Object Position Estimation in OCT Volumes [69.62333053044712]
3D convolutional neural networks (CNNs) have shown promising performance for pose estimation of a marker object using single OCT images. We extend 3D CNNs to 4D-temporal CNNs to evaluate the impact of additional temporal information for marker object tracking.
arXiv Detail & Related papers (2020-07-02T12:02:20Z)
Appearance Learning for Image-based Motion Estimation in Tomography [60.980769164955454]
In tomographic imaging, anatomical structures are reconstructed by applying a pseudo-inverse forward model to acquired signals. Patient motion corrupts the geometry alignment in the reconstruction process resulting in motion artifacts. We propose an appearance learning approach recognizing the structures of rigid motion independently from the scanned object.
arXiv Detail & Related papers (2020-06-18T09:49:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.