EasyVis2: A Real Time Multi-view 3D Visualization System for Laparoscopic Surgery Training Enhanced by a Deep Neural Network YOLOv8-Pose
- URL: http://arxiv.org/abs/2412.16742v2
- Date: Tue, 08 Apr 2025 21:14:22 GMT
- Title: EasyVis2: A Real Time Multi-view 3D Visualization System for Laparoscopic Surgery Training Enhanced by a Deep Neural Network YOLOv8-Pose
- Authors: Yung-Hong Sun, Gefei Shen, Jiangang Chen, Jayer Fernandes, Amber L. Shada, Charles P. Heise, Hongrui Jiang, Yu Hen Hu
- Abstract summary: EasyVis2 is a system designed to provide hands-free, real-time 3D visualization for laparoscopic surgery. It incorporates a surgical trocar equipped with an array of micro-cameras, which can be inserted into the body cavity to offer a 3D perspective. A specialized deep neural network algorithm, YOLOv8-Pose, is utilized to estimate the position and orientation of surgical instruments in each individual camera view.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: EasyVis2 is a system designed to provide hands-free, real-time 3D visualization for laparoscopic surgery. It incorporates a surgical trocar equipped with an array of micro-cameras, which can be inserted into the body cavity to offer an enhanced field of view and a 3D perspective of the surgical procedure. A specialized deep neural network algorithm, YOLOv8-Pose, is utilized to estimate the position and orientation of surgical instruments in each individual camera view. These multi-view estimates enable the calculation of 3D poses of surgical tools, facilitating the rendering of a 3D surface model of the instruments, overlaid on the background scene, for real-time visualization. This study presents methods for adapting YOLOv8-Pose to the EasyVis2 system, including the development of a tailored training dataset. Experimental results demonstrate that, with an identical number of cameras, the new system improves 3D reconstruction accuracy and reduces computation time. Additionally, the adapted YOLOv8-Pose system shows high accuracy in 2D pose estimation.
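As a concrete illustration of the pipeline the abstract describes (per-view 2D pose estimation with YOLOv8-Pose followed by multi-view triangulation into 3D), here is a minimal Python sketch built on the public ultralytics and OpenCV APIs. The checkpoint name, image files, and projection matrices are placeholders: the paper's fine-tuned weights and camera calibration are not included in this listing.

```python
import cv2
import numpy as np
from ultralytics import YOLO  # pip install ultralytics

# Placeholder checkpoint: the paper fine-tunes YOLOv8-Pose on a custom
# surgical-tool dataset, but the trained weights are not part of this listing.
model = YOLO("yolov8n-pose.pt")

def detect_keypoints(frame):
    """Run 2D pose estimation on one camera view; returns (K, 2) pixel
    coordinates for the first detected instrument, or None if none found."""
    result = model(frame, verbose=False)[0]
    if result.keypoints is None or result.keypoints.xy.shape[0] == 0:
        return None
    return result.keypoints.xy[0].cpu().numpy()

def triangulate(P1, P2, pts1, pts2):
    """Linear triangulation of matched 2D keypoints from two calibrated views.
    P1, P2 are 3x4 projection matrices K[R|t]; pts1, pts2 are (K, 2) arrays."""
    pts4d = cv2.triangulatePoints(P1, P2,
                                  np.asarray(pts1, np.float64).T,
                                  np.asarray(pts2, np.float64).T)
    return (pts4d[:3] / pts4d[3]).T  # dehomogenize to (K, 3) world points

# Placeholder projection matrices; in practice these come from offline
# calibration of the trocar's micro-camera array.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-0.05], [0.0], [0.0]])])

frames = [cv2.imread(f"view_{i}.png") for i in range(2)]  # hypothetical files
kps = [detect_keypoints(f) for f in frames]
if all(k is not None for k in kps):
    tool_pts3d = triangulate(P1, P2, kps[0], kps[1])
    print(tool_pts3d)  # 3D keypoints used to pose the rendered tool model
```

With more than two cameras, the same linear system is stacked over all views and solved in a least-squares sense, which is where multi-view redundancy improves reconstruction accuracy.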
Related papers
- Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness [73.72335146374543]
We introduce reconstructive visual instruction tuning with 3D-awareness (Ross3D), which integrates 3D-aware visual supervision into the training procedure.
Ross3D achieves state-of-the-art performance across various 3D scene understanding benchmarks.
arXiv Detail & Related papers (2025-04-02T16:59:55Z)
- VGGT: Visual Geometry Grounded Transformer [61.37669770946458]
VGGT is a feed-forward neural network that directly infers all key 3D attributes of a scene.
Network achieves state-of-the-art results in multiple 3D tasks.
arXiv Detail & Related papers (2025-03-14T17:59:47Z)
- MT3DNet: Multi-Task learning Network for 3D Surgical Scene Reconstruction [0.0]
In image-assisted minimally invasive surgeries (MIS), understanding surgical scenes is vital for real-time feedback to surgeons. The challenge lies in accurately detecting, segmenting, and estimating the depth of surgical scenes depicted in high-resolution images. A novel Multi-Task Learning (MTL) network is proposed for performing these tasks concurrently.
arXiv Detail & Related papers (2024-12-05T07:07:35Z)
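The MT3DNet entry above describes a network that detects, segments, and estimates depth concurrently. Its exact architecture is not given in this listing, so the following is a generic shared-backbone sketch of the multi-task idea in PyTorch; the layer sizes, heads, and loss weights are illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskNet(nn.Module):
    """Illustrative shared-encoder multi-task network (not MT3DNet itself):
    one backbone feeds separate heads for segmentation, depth, and detection."""

    def __init__(self, num_classes=8):
        super().__init__()
        self.encoder = nn.Sequential(  # toy backbone; real models use ResNet/ViT
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Conv2d(64, num_classes, 1)  # per-pixel class logits
        self.depth_head = nn.Conv2d(64, 1, 1)          # per-pixel depth
        self.det_head = nn.Linear(64, num_classes)     # global instrument label

    def forward(self, x):
        f = self.encoder(x)                            # (B, 64, H/4, W/4)
        seg = self.seg_head(f)
        depth = self.depth_head(f)
        det = self.det_head(f.mean(dim=(2, 3)))        # global average pool
        return seg, depth, det

def mtl_loss(seg, depth, det, seg_gt, depth_gt, det_gt, w=(1.0, 1.0, 0.5)):
    """Joint objective: a weighted sum of per-task losses (weights are a
    common heuristic; seg_gt/det_gt are integer labels, depth_gt is (B,1,H,W))."""
    return (w[0] * F.cross_entropy(seg, seg_gt)
            + w[1] * F.l1_loss(depth, depth_gt)
            + w[2] * F.cross_entropy(det, det_gt))
```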
- MedTet: An Online Motion Model for 4D Heart Reconstruction [59.74234226055964]
We present a novel approach to the reconstruction of 3D cardiac motion from sparse intraoperative data. Existing methods can accurately reconstruct 3D organ geometries from full 3D volumetric imaging, whereas we propose a versatile framework for reconstructing 3D motion from such partial data.
arXiv Detail & Related papers (2024-12-03T17:18:33Z)
- Advanced XR-Based 6-DOF Catheter Tracking System for Immersive Cardiac Intervention Training [37.69303106863453]
This paper presents a novel system for real-time 3D tracking and visualization of intracardiac echocardiography (ICE) catheters.
A custom 3D-printed setup captures biplane video of the catheter, while a specialized computer vision algorithm reconstructs its 3D trajectory.
The system's data is integrated into an interactive Unity-based environment, rendered through the Meta Quest 3 XR headset.
arXiv Detail & Related papers (2024-11-04T21:05:40Z)
- SLAM assisted 3D tracking system for laparoscopic surgery [22.36252790404779]
This work proposes a real-time monocular 3D tracking algorithm for post-registration tasks.
In-vivo and ex-vivo experiments demonstrate that the proposed system provides robust 3D tracking.
arXiv Detail & Related papers (2024-09-18T04:00:54Z)
- A Review of 3D Reconstruction Techniques for Deformable Tissues in Robotic Surgery [8.909938295090827]
NeRF-based techniques have recently garnered attention for their ability to reconstruct scenes implicitly.
On the other hand, 3D-GS represents scenes explicitly using 3D Gaussians and projects them onto a 2D plane as a replacement for the complex volume rendering in NeRF.
This work explores and reviews state-of-the-art (SOTA) approaches, discussing their innovations and implementation principles.
arXiv Detail & Related papers (2024-08-08T12:51:23Z)
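For reference, the core 3D-GS operation the review above contrasts with NeRF's volume rendering is the projection of each anisotropic 3D Gaussian onto the image plane. Below is a minimal NumPy sketch of that projection using the EWA splatting approximation cov2d = J R cov R^T J^T; the camera parameters in the example call are assumptions.

```python
import numpy as np

def project_gaussian(mean_w, cov_w, R, t, fx, fy):
    """Project a 3D Gaussian (world frame) to a 2D image-plane Gaussian via
    the EWA splatting approximation used by 3D-GS."""
    m = R @ mean_w + t                        # mean in camera coordinates
    x, y, z = m
    # Jacobian of the perspective map (u, v) = (fx*x/z, fy*y/z) at m.
    J = np.array([[fx / z, 0.0, -fx * x / z**2],
                  [0.0, fy / z, -fy * y / z**2]])
    cov2d = J @ R @ cov_w @ R.T @ J.T         # 2x2 image-plane covariance
    mean2d = np.array([fx * x / z, fy * y / z])
    return mean2d, cov2d

# Illustrative call: identity camera pose, isotropic 1 cm Gaussian at 0.5 m.
mu, cov = project_gaussian(np.array([0.0, 0.0, 0.5]), np.eye(3) * 1e-4,
                           np.eye(3), np.zeros(3), fx=600.0, fy=600.0)
```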
- MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements [59.70107451308687]
We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM.
Our method, MM3DGS, addresses the limitations of prior rendering-based approaches by enabling faster scale awareness and improved trajectory tracking.
We also release a multi-modal dataset, UT-MM, collected from a mobile robot equipped with a camera and an inertial measurement unit.
arXiv Detail & Related papers (2024-04-01T04:57:41Z)
- Neural Fields for 3D Tracking of Anatomy and Surgical Instruments in Monocular Laparoscopic Video Clips [1.339950379203994]
We propose a method for jointly tracking all structures on a single 2D monocular video clip.
Because instruments are small, they generally cover only a small part of the image, resulting in decreased tracking accuracy.
We evaluate tracking on video clips of laparoscopic cholecystectomies, where we find mean tracking accuracies of 92.4% for anatomical structures and 87.4% for instruments.
arXiv Detail & Related papers (2024-03-28T09:44:20Z)
- Creating a Digital Twin of Spinal Surgery: A Proof of Concept [68.37190859183663]
Surgery digitalization is the process of creating a virtual replica of real-world surgery.
We present a proof of concept (PoC) for surgery digitalization that is applied to an ex-vivo spinal surgery.
We employ five RGB-D cameras for dynamic 3D reconstruction of the surgeon, a high-end camera for 3D reconstruction of the anatomy, an infrared stereo camera for surgical instrument tracking, and a laser scanner for 3D reconstruction of the operating room and data fusion.
arXiv Detail & Related papers (2024-03-25T13:09:40Z)
- Next-generation Surgical Navigation: Marker-less Multi-view 6DoF Pose Estimation of Surgical Instruments [66.74633676595889]
First, we present a multi-camera capture setup consisting of static and head-mounted cameras.
Second, we publish a multi-view RGB-D video dataset of ex-vivo spine surgeries, captured in a surgical wet lab and a real operating theatre.
Third, we evaluate three state-of-the-art single-view and multi-view methods for the task of 6DoF pose estimation of surgical instruments.
arXiv Detail & Related papers (2023-05-05T13:42:19Z)
- Neural Rendering for Stereo 3D Reconstruction of Deformable Tissues in Robotic Surgery [18.150476919815382]
Reconstruction of the soft tissues in robotic surgery from endoscopic stereo videos is important for many applications.
Previous works on this task mainly rely on SLAM-based approaches, which struggle to handle complex surgical scenes.
Inspired by recent progress in neural rendering, we present a novel framework for deformable tissue reconstruction.
arXiv Detail & Related papers (2022-06-30T13:06:27Z)
- Stereo Dense Scene Reconstruction and Accurate Laparoscope Localization for Learning-Based Navigation in Robot-Assisted Surgery [37.14020061063255]
The computation of anatomical information and laparoscope position is a fundamental building block of robot-assisted surgical navigation in Minimally Invasive Surgery (MIS).
We propose a learning-driven framework that achieves image-guided laparoscope localization together with 3D reconstruction of complex anatomical structures.
arXiv Detail & Related papers (2021-10-08T06:12:18Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
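The camera-disentanglement idea in this last entry can be sketched compactly: per-view features are fused into a single view-invariant latent, and a decoder conditioned on each camera's projection operator re-entangles it into per-view 2D detections. The PyTorch sketch below is an illustrative reduction of that idea, not the paper's architecture; the layer dimensions and the mean-fusion rule are assumptions.

```python
import torch
import torch.nn as nn

class DisentangledPoseNet(nn.Module):
    """Illustrative sketch of camera-disentangled multi-view pose estimation
    (not the paper's exact architecture)."""

    def __init__(self, num_joints=17, feat_dim=256):
        super().__init__()
        self.num_joints = num_joints
        self.encoder = nn.Sequential(             # toy per-view image encoder
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Decoder is conditioned on the flattened 3x4 projection matrix.
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim + 12, 256), nn.ReLU(),
            nn.Linear(256, num_joints * 2),
        )

    def forward(self, views, projections):
        """views: (V, B, 3, H, W) images; projections: (V, B, 3, 4) matrices."""
        latents = torch.stack([self.encoder(v) for v in views])  # (V, B, D)
        fused = latents.mean(dim=0)            # view-invariant latent (B, D)
        preds = []
        for P in projections:                  # re-entangle for each camera
            cond = torch.cat([fused, P.flatten(1)], dim=1)
            preds.append(self.decoder(cond).view(-1, self.num_joints, 2))
        return torch.stack(preds)              # (V, B, J, 2) 2D keypoints

net = DisentangledPoseNet()
views = torch.randn(4, 2, 3, 128, 128)         # 4 cameras, batch of 2
projs = torch.randn(4, 2, 3, 4)
print(net(views, projs).shape)                 # torch.Size([4, 2, 17, 2])
```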
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.