Efficient Surgical Robotic Instrument Pose Reconstruction in Real World Conditions Using Unified Feature Detection
- URL: http://arxiv.org/abs/2510.03532v1
- Date: Fri, 03 Oct 2025 22:03:28 GMT
- Title: Efficient Surgical Robotic Instrument Pose Reconstruction in Real World Conditions Using Unified Feature Detection
- Authors: Zekai Liang, Kazuya Miyata, Xiao Liang, Florian Richter, Michael C. Yip,
- Abstract summary: MIS robots have long kinematic chains and partial visibility of their degrees of freedom in the camera.<n>We propose a novel framework that unifies the detection of geometric primitives through a shared encoding.<n>This architecture detects both keypoints and edges in a single inference and is trained on large-scale synthetic data with projective labeling.
- Score: 21.460727996614704
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate camera-to-robot calibration is essential for any vision-based robotic control system and especially critical in minimally invasive surgical robots, where instruments conduct precise micro-manipulations. However, MIS robots have long kinematic chains and partial visibility of their degrees of freedom in the camera, which introduces challenges for conventional camera-to-robot calibration methods that assume stiff robots with good visibility. Previous works have investigated both keypoint-based and rendering-based approaches to address this challenge in real-world conditions; however, they often struggle with consistent feature detection or have long inference times, neither of which are ideal for online robot control. In this work, we propose a novel framework that unifies the detection of geometric primitives (keypoints and shaft edges) through a shared encoding, enabling efficient pose estimation via projection geometry. This architecture detects both keypoints and edges in a single inference and is trained on large-scale synthetic data with projective labeling. This method is evaluated across both feature detection and pose estimation, with qualitative and quantitative results demonstrating fast performance and state-of-the-art accuracy in challenging surgical environments.
Related papers
- Sight Over Site: Perception-Aware Reinforcement Learning for Efficient Robotic Inspection [57.37596278863949]
In this work, we revisit inspection from a perception-aware perspective.<n>We propose an end-to-end reinforcement learning framework that explicitly incorporates target visibility as the primary objective.<n>We show that our method outperforms existing classical and learning-based navigation approaches.
arXiv Detail & Related papers (2025-09-22T15:14:02Z) - Is Single-View Mesh Reconstruction Ready for Robotics? [78.14584238127338]
We evaluate single-view mesh reconstruction models for their potential in enabling instant digital twin creation for real-time planning and dynamics prediction using physics simulators for robotic manipulation.<n>Our findings highlight critical gaps between computer vision advances and robotics needs, guiding future research at this intersection.
arXiv Detail & Related papers (2025-05-23T14:35:56Z) - Taccel: Scaling Up Vision-based Tactile Robotics via High-performance GPU Simulation [34.47272224723296]
We present Taccel, a high-performance simulation platform that integrates IPC and ABD to model robots, tactile sensors, and objects with both accuracy and unprecedented speed.<n>Unlike previous simulators that operate at sub-real-time speeds with limited parallelization, Taccel provides precise physics simulation and realistic tactile signals.<n>These capabilities position Taccel as a powerful tool for scaling up tactile robotics research and development, potentially transforming how robots interact with and understand their physical environment.
arXiv Detail & Related papers (2025-04-17T12:57:11Z) - ARC-Calib: Autonomous Markerless Camera-to-Robot Calibration via Exploratory Robot Motions [15.004750210002152]
ARC-Calib is a model-based markerless camera-to-robot calibration framework.<n>It is fully autonomous and generalizable across diverse robots.
arXiv Detail & Related papers (2025-03-18T20:03:32Z) - RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training [27.63332596592781]
Vision-based pose estimation of articulated robots with unknown joint angles has applications in collaborative robotics and human-robot interaction tasks.<n>Current frameworks use neural network encoders to extract image features and downstream layers to predict joint angles and robot pose.<n>We introduce RoboPEPP, a method that fuses information about the robot's physical model into the encoder using a masking-based self-supervised embedding-predictive architecture.
arXiv Detail & Related papers (2024-11-26T18:26:17Z) - CtRNet-X: Camera-to-Robot Pose Estimation in Real-world Conditions Using a Single Camera [18.971816395021488]
Markerless pose estimation methods have eliminated the need for time-consuming physical setups for camera-to-robot calibration.
We propose a novel framework capable of estimating the robot pose with partially visible robot manipulators.
arXiv Detail & Related papers (2024-09-16T16:22:43Z) - Kalib: Easy Hand-Eye Calibration with Reference Point Tracking [52.4190876409222]
Kalib is an automatic hand-eye calibration method that leverages the generalizability of visual foundation models to overcome challenges.<n>During calibration, a kinematic reference point is tracked in the camera coordinate 3D coordinates in the space behind the robot.<n>Kalib's user-friendly design and minimal setup requirements make it a possible solution for continuous operation in unstructured environments.
arXiv Detail & Related papers (2024-08-20T06:03:40Z) - GISR: Geometric Initialization and Silhouette-based Refinement for Single-View Robot Pose and Configuration Estimation [0.0]
GISR is a robot-to-camera pose estimation method that prioritizes execution in real-time.
We evaluate GISR on publicly available data and show that it outperforms existing methods of the same class in terms of both speed and accuracy.
arXiv Detail & Related papers (2024-05-08T08:39:25Z) - Robust Surgical Tool Tracking with Pixel-based Probabilities for
Projected Geometric Primitives [28.857732667640068]
Controlling robotic manipulators via visual feedback requires a known coordinate frame transformation between the robot and the camera.
Uncertainties in mechanical systems as well as camera calibration create errors in this coordinate frame transformation.
We estimate the camera-to-base transform and joint angle measurement errors for surgical robotic tools using an image based insertion-shaft detection algorithm and probabilistic models.
arXiv Detail & Related papers (2024-03-08T00:57:03Z) - EasyHeC: Accurate and Automatic Hand-eye Calibration via Differentiable
Rendering and Space Exploration [49.90228618894857]
We introduce a new approach to hand-eye calibration called EasyHeC, which is markerless, white-box, and delivers superior accuracy and robustness.
We propose to use two key technologies: differentiable rendering-based camera pose optimization and consistency-based joint space exploration.
Our evaluation demonstrates superior performance in synthetic and real-world datasets.
arXiv Detail & Related papers (2023-05-02T03:49:54Z) - Online Body Schema Adaptation through Cost-Sensitive Active Learning [63.84207660737483]
The work was implemented in a simulation environment, using the 7DoF arm of the iCub robot simulator.
A cost-sensitive active learning approach is used to select optimal joint configurations.
The results show cost-sensitive active learning has similar accuracy to the standard active learning approach, while reducing in about half the executed movement.
arXiv Detail & Related papers (2021-01-26T16:01:02Z) - Nothing But Geometric Constraints: A Model-Free Method for Articulated
Object Pose Estimation [89.82169646672872]
We propose an unsupervised vision-based system to estimate the joint configurations of the robot arm from a sequence of RGB or RGB-D images without knowing the model a priori.
We combine a classical geometric formulation with deep learning and extend the use of epipolar multi-rigid-body constraints to solve this task.
arXiv Detail & Related papers (2020-11-30T20:46:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.