SurgPose: Generalisable Surgical Instrument Pose Estimation using Zero-Shot Learning and Stereo Vision
- URL: http://arxiv.org/abs/2505.11439v1
- Date: Fri, 16 May 2025 16:58:03 GMT
- Title: SurgPose: Generalisable Surgical Instrument Pose Estimation using Zero-Shot Learning and Stereo Vision
- Authors: Utsav Rai, Haozheng Xu, Stamatia Giannarou
- Abstract summary: This paper presents a novel 6 Degrees of Freedom pose estimation pipeline for surgical instruments. We leverage state-of-the-art zero-shot RGB-D models such as FoundationPose and SAM-6D. We set a new benchmark for zero-shot RGB-D pose estimation in Robot-assisted Minimally Invasive Surgery.
- Score: 4.749155557874306
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Accurate pose estimation of surgical tools in Robot-assisted Minimally Invasive Surgery (RMIS) is essential for surgical navigation and robot control. While traditional marker-based methods offer accuracy, they face challenges with occlusions, reflections, and tool-specific designs. Similarly, supervised learning methods require extensive training on annotated datasets, limiting their adaptability to new tools. Despite their success in other domains, zero-shot pose estimation models remain unexplored in RMIS for pose estimation of surgical instruments, creating a gap in generalising to unseen surgical tools. This paper presents a novel 6 Degrees of Freedom (DoF) pose estimation pipeline for surgical instruments, leveraging state-of-the-art zero-shot RGB-D models such as FoundationPose and SAM-6D. We advanced these models by incorporating vision-based depth estimation using the RAFT-Stereo method for robust depth estimation in reflective and textureless environments. Additionally, we enhanced SAM-6D by replacing its instance segmentation module, the Segment Anything Model (SAM), with a fine-tuned Mask R-CNN, significantly boosting segmentation accuracy in occluded and complex conditions. Extensive validation reveals that our enhanced SAM-6D surpasses FoundationPose in zero-shot pose estimation of unseen surgical instruments, setting a new benchmark for zero-shot RGB-D pose estimation in RMIS. This work enhances the generalisability of pose estimation for unseen objects and pioneers the application of RGB-D zero-shot methods in RMIS.
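The pipeline's stereo-depth stage converts the disparity map produced by a stereo matcher such as RAFT-Stereo into metric depth before passing it to the RGB-D pose model. A minimal sketch of that conversion, using the standard pinhole relation Z = f·B/d with hypothetical calibration values (the paper does not publish its camera parameters):

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Convert a stereo disparity map (pixels) to metric depth via Z = f * B / d.

    disparity : ndarray of disparities in pixels (e.g. RAFT-Stereo output)
    focal_px  : focal length of the rectified cameras, in pixels
    baseline_m: stereo baseline, in metres
    """
    # Clamp tiny/zero disparities to avoid division by zero at infinite depth.
    return (focal_px * baseline_m) / np.maximum(disparity, eps)

# Illustrative values only: a 4 mm baseline and 1000 px focal length.
disp = np.array([[20.0, 40.0],
                 [80.0, 10.0]])          # pixels
depth = disparity_to_depth(disp, focal_px=1000.0, baseline_m=0.004)  # metres
```

Larger disparities map to nearer surfaces; the resulting depth map takes the place of a structured-light or ToF depth channel in the RGB-D input.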
Related papers
- SurgRIPE challenge: Benchmark of Surgical Robot Instrument Pose Estimation [32.9422323323913]
Vision-based methods for surgical instrument pose estimation provide a practical approach to tool tracking, but they often require markers to be attached to the instruments. Recently, more research has focused on the development of marker-less methods based on deep learning. We introduce the Surgical Robot Instrument Pose Estimation (SurgRIPE) challenge, hosted at the 26th International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) in 2023. The SurgRIPE challenge has successfully established a new benchmark for the field, encouraging further research and development in surgical robot instrument pose estimation.
arXiv Detail & Related papers (2025-01-06T13:02:44Z) - Robotic Arm Platform for Multi-View Image Acquisition and 3D Reconstruction in Minimally Invasive Surgery [40.55055153469741]
This work introduces a robotic arm platform for efficient multi-view image acquisition and precise 3D reconstruction in Minimally Invasive Surgery settings.
We adapted a laparoscope to a robotic arm and captured ex-vivo images of several ovine organs across varying lighting conditions.
We employed recently released learning-based feature matchers combined with COLMAP to produce our reconstructions.
arXiv Detail & Related papers (2024-10-15T15:42:30Z) - Realistic Data Generation for 6D Pose Estimation of Surgical Instruments [4.226502078427161]
6D pose estimation of surgical instruments is critical to enable the automatic execution of surgical maneuvers.
In household and industrial settings, synthetic data, generated with 3D computer graphics software, has been shown as an alternative to minimize annotation costs.
We propose an improved simulation environment for surgical robotics that enables the automatic generation of large and diverse datasets.
arXiv Detail & Related papers (2024-06-11T14:59:29Z) - Advancing 6-DoF Instrument Pose Estimation in Variable X-Ray Imaging Geometries [7.630289691590948]
We propose a general-purpose approach of data acquisition for 6-DoF pose estimation tasks in X-ray systems.
The proposed YOLOv5-6D pose model achieves competitive results on public benchmarks whilst being considerably faster at 42 FPS on GPU.
The model achieves 92.41% on the 0.1 ADD-S metric, demonstrating a promising approach for enhancing surgical precision and patient outcomes.
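The ADD-S metric referenced above scores a predicted pose by averaging, over the object's model points, the distance from each ground-truth-transformed point to its closest predicted-transformed point (the symmetry-aware variant of ADD); a pose counts as correct if this average falls below a fraction (here 0.1) of the object diameter. A minimal numpy sketch, not the paper's implementation:

```python
import numpy as np

def add_s(model_pts, R_gt, t_gt, R_pred, t_pred):
    """ADD-S: mean closest-point distance between the two transformed point sets."""
    gt = model_pts @ R_gt.T + t_gt        # (N, 3) points under ground-truth pose
    pred = model_pts @ R_pred.T + t_pred  # (N, 3) points under predicted pose
    # Full pairwise distance matrix; fine for small models, use a KD-tree otherwise.
    d = np.linalg.norm(gt[:, None, :] - pred[None, :, :], axis=-1)
    return d.min(axis=1).mean()

def passes_add_s(model_pts, R_gt, t_gt, R_pred, t_pred, diameter, frac=0.1):
    """True if the pose error is below frac * object diameter (0.1 ADD-S criterion)."""
    return add_s(model_pts, R_gt, t_gt, R_pred, t_pred) < frac * diameter
```

The `min` over predicted points is what makes the metric tolerant to rotational symmetries, which matter for near-cylindrical surgical tools.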
arXiv Detail & Related papers (2024-05-19T21:35:12Z) - Zero123-6D: Zero-shot Novel View Synthesis for RGB Category-level 6D Pose Estimation [66.3814684757376]
This work presents Zero123-6D, the first work to demonstrate the utility of Diffusion Model-based novel-view-synthesizers in enhancing RGB 6D pose estimation at category-level.
The method reduces data requirements, removes the need for depth information in the zero-shot category-level 6D pose estimation task, and improves performance, as demonstrated quantitatively through experiments on the CO3D dataset.
arXiv Detail & Related papers (2024-03-21T10:38:18Z) - Redefining the Laparoscopic Spatial Sense: AI-based Intra- and Postoperative Measurement from Stereoimages [3.2039076408339353]
We develop a novel human-AI-based method for laparoscopic measurements utilizing stereo vision.
Based on a holistic qualitative requirements analysis, this work proposes a comprehensive measurement method.
Our results outline the potential of our method achieving high accuracies in distance measurements with errors below 1 mm.
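A stereo distance measurement of this kind reduces to triangulating the two endpoints from their pixel coordinates and disparities, then taking the Euclidean distance between the back-projected 3D points. A minimal sketch with hypothetical calibration values (the paper's actual pipeline and parameters are not reproduced here):

```python
import numpy as np

def triangulate(u, v, disparity, fx, fy, cx, cy, baseline_m):
    """Back-project a pixel (u, v) with known disparity into the left camera frame."""
    Z = fx * baseline_m / disparity   # depth from disparity
    X = (u - cx) * Z / fx             # pinhole back-projection
    Y = (v - cy) * Z / fy
    return np.array([X, Y, Z])

def stereo_distance_mm(p1, p2, **calib):
    """Euclidean distance in millimetres between two stereo-measured points,
    each given as (u, v, disparity)."""
    a = triangulate(*p1, **calib)
    b = triangulate(*p2, **calib)
    return 1000.0 * np.linalg.norm(a - b)

# Illustrative calibration: 1000 px focal length, 4 mm baseline.
calib = dict(fx=1000.0, fy=1000.0, cx=320.0, cy=320.0, baseline_m=0.004)
dist_mm = stereo_distance_mm((320.0, 320.0, 40.0), (420.0, 320.0, 40.0), **calib)
```

Sub-millimetre accuracy in such measurements hinges on sub-pixel disparity estimation and precise rectification, which is why the calibration step dominates the error budget.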
arXiv Detail & Related papers (2023-11-16T10:19:04Z) - Safe Deep RL for Intraoperative Planning of Pedicle Screw Placement [61.28459114068828]
We propose an intraoperative planning approach for robotic spine surgery that leverages real-time observation for drill path planning based on Safe Deep Reinforcement Learning (DRL).
Our approach was capable of achieving 90% bone penetration with respect to the gold standard (GS) drill planning.
arXiv Detail & Related papers (2023-05-09T11:42:53Z) - Next-generation Surgical Navigation: Marker-less Multi-view 6DoF Pose Estimation of Surgical Instruments [64.59698930334012]
First, we present a multi-camera capture setup consisting of static and head-mounted cameras. Second, we publish a multi-view RGB-D video dataset of ex-vivo spine surgeries, captured in a surgical wet lab and a real operating theatre. Third, we evaluate three state-of-the-art single-view and multi-view methods for the task of 6DoF pose estimation of surgical instruments.
arXiv Detail & Related papers (2023-05-05T13:42:19Z) - Occlusion-Aware Self-Supervised Monocular 6D Object Pose Estimation [88.8963330073454]
We propose a novel monocular 6D pose estimation approach by means of self-supervised learning.
We leverage current trends in noisy student training and differentiable rendering to further self-supervise the model.
Our proposed self-supervision outperforms all other methods relying on synthetic data.
arXiv Detail & Related papers (2022-03-19T15:12:06Z) - Progressive Self-Guided Loss for Salient Object Detection [102.35488902433896]
We present a progressive self-guided loss function to facilitate deep learning-based salient object detection in images.
Our framework takes advantage of adaptively aggregated multi-scale features to locate and detect salient objects effectively.
arXiv Detail & Related papers (2021-01-07T07:33:38Z) - Self6D: Self-Supervised Monocular 6D Object Pose Estimation [114.18496727590481]
We propose the idea of monocular 6D pose estimation by means of self-supervised learning.
We leverage recent advances in neural rendering to further self-supervise the model on unannotated real RGB-D data.
arXiv Detail & Related papers (2020-04-14T13:16:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.