Related papers: Efficient Surgical Robotic Instrument Pose Reconstruction in Real World Conditions Using Unified Feature Detection

URL: http://arxiv.org/abs/2510.03532v1
Date: Fri, 03 Oct 2025 22:03:28 GMT
Title: Efficient Surgical Robotic Instrument Pose Reconstruction in Real World Conditions Using Unified Feature Detection
Authors: Zekai Liang, Kazuya Miyata, Xiao Liang, Florian Richter, Michael C. Yip
Abstract summary: MIS robots have long kinematic chains and partial visibility of their degrees of freedom in the camera. We propose a novel framework that unifies the detection of geometric primitives through a shared encoding. This architecture detects both keypoints and edges in a single inference and is trained on large-scale synthetic data with projective labeling.
Score: 21.460727996614704
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Accurate camera-to-robot calibration is essential for any vision-based robotic control system and especially critical in minimally invasive surgical robots, where instruments conduct precise micro-manipulations. However, MIS robots have long kinematic chains and partial visibility of their degrees of freedom in the camera, which introduces challenges for conventional camera-to-robot calibration methods that assume stiff robots with good visibility. Previous works have investigated both keypoint-based and rendering-based approaches to address this challenge in real-world conditions; however, they often struggle with consistent feature detection or have long inference times, neither of which is ideal for online robot control. In this work, we propose a novel framework that unifies the detection of geometric primitives (keypoints and shaft edges) through a shared encoding, enabling efficient pose estimation via projection geometry. This architecture detects both keypoints and edges in a single inference and is trained on large-scale synthetic data with projective labeling. This method is evaluated across both feature detection and pose estimation, with qualitative and quantitative results demonstrating fast performance and state-of-the-art accuracy in challenging surgical environments.

Related papers

Sight Over Site: Perception-Aware Reinforcement Learning for Efficient Robotic Inspection [57.37596278863949]
In this work, we revisit inspection from a perception-aware perspective. We propose an end-to-end reinforcement learning framework that explicitly incorporates target visibility as the primary objective. We show that our method outperforms existing classical and learning-based navigation approaches.
arXiv Detail & Related papers (2025-09-22T15:14:02Z)
Is Single-View Mesh Reconstruction Ready for Robotics? [78.14584238127338]
We evaluate single-view mesh reconstruction models for their potential in enabling instant digital twin creation for real-time planning and dynamics prediction using physics simulators for robotic manipulation. Our findings highlight critical gaps between computer vision advances and robotics needs, guiding future research at this intersection.
arXiv Detail & Related papers (2025-05-23T14:35:56Z)
Taccel: Scaling Up Vision-based Tactile Robotics via High-performance GPU Simulation [34.47272224723296]
We present Taccel, a high-performance simulation platform that integrates IPC and ABD to model robots, tactile sensors, and objects with both accuracy and unprecedented speed. Unlike previous simulators that operate at sub-real-time speeds with limited parallelization, Taccel provides precise physics simulation and realistic tactile signals. These capabilities position Taccel as a powerful tool for scaling up tactile robotics research and development, potentially transforming how robots interact with and understand their physical environment.
arXiv Detail & Related papers (2025-04-17T12:57:11Z)
ARC-Calib: Autonomous Markerless Camera-to-Robot Calibration via Exploratory Robot Motions [15.004750210002152]
ARC-Calib is a model-based markerless camera-to-robot calibration framework. It is fully autonomous and generalizable across diverse robots.
arXiv Detail & Related papers (2025-03-18T20:03:32Z)
RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training [27.63332596592781]
Vision-based pose estimation of articulated robots with unknown joint angles has applications in collaborative robotics and human-robot interaction tasks. Current frameworks use neural network encoders to extract image features and downstream layers to predict joint angles and robot pose. We introduce RoboPEPP, a method that fuses information about the robot's physical model into the encoder using a masking-based self-supervised embedding-predictive architecture.
arXiv Detail & Related papers (2024-11-26T18:26:17Z)
CtRNet-X: Camera-to-Robot Pose Estimation in Real-world Conditions Using a Single Camera [18.971816395021488]
Markerless pose estimation methods have eliminated the need for time-consuming physical setups for camera-to-robot calibration. We propose a novel framework capable of estimating the robot pose with partially visible robot manipulators.
arXiv Detail & Related papers (2024-09-16T16:22:43Z)
Kalib: Easy Hand-Eye Calibration with Reference Point Tracking [52.4190876409222]
Kalib is an automatic hand-eye calibration method that leverages the generalizability of visual foundation models to overcome challenges. During calibration, a kinematic reference point is tracked in 3D camera coordinates in the space behind the robot. Kalib's user-friendly design and minimal setup requirements make it a possible solution for continuous operation in unstructured environments.
arXiv Detail & Related papers (2024-08-20T06:03:40Z)
GISR: Geometric Initialization and Silhouette-based Refinement for Single-View Robot Pose and Configuration Estimation [0.0]
GISR is a robot-to-camera pose estimation method that prioritizes execution in real-time. We evaluate GISR on publicly available data and show that it outperforms existing methods of the same class in terms of both speed and accuracy.
arXiv Detail & Related papers (2024-05-08T08:39:25Z)
Robust Surgical Tool Tracking with Pixel-based Probabilities for Projected Geometric Primitives [28.857732667640068]
Controlling robotic manipulators via visual feedback requires a known coordinate frame transformation between the robot and the camera. Uncertainties in mechanical systems as well as camera calibration create errors in this coordinate frame transformation. We estimate the camera-to-base transform and joint angle measurement errors for surgical robotic tools using an image based insertion-shaft detection algorithm and probabilistic models.
arXiv Detail & Related papers (2024-03-08T00:57:03Z)
EasyHeC: Accurate and Automatic Hand-eye Calibration via Differentiable Rendering and Space Exploration [49.90228618894857]
We introduce a new approach to hand-eye calibration called EasyHeC, which is markerless, white-box, and delivers superior accuracy and robustness. We propose to use two key technologies: differentiable rendering-based camera pose optimization and consistency-based joint space exploration. Our evaluation demonstrates superior performance in synthetic and real-world datasets.
arXiv Detail & Related papers (2023-05-02T03:49:54Z)
Online Body Schema Adaptation through Cost-Sensitive Active Learning [63.84207660737483]
The work was implemented in a simulation environment, using the 7DoF arm of the iCub robot simulator. A cost-sensitive active learning approach is used to select optimal joint configurations. The results show that cost-sensitive active learning has similar accuracy to the standard active learning approach, while reducing the executed movement by about half.
arXiv Detail & Related papers (2021-01-26T16:01:02Z)
Nothing But Geometric Constraints: A Model-Free Method for Articulated Object Pose Estimation [89.82169646672872]
We propose an unsupervised vision-based system to estimate the joint configurations of the robot arm from a sequence of RGB or RGB-D images without knowing the model a priori. We combine a classical geometric formulation with deep learning and extend the use of epipolar multi-rigid-body constraints to solve this task.
arXiv Detail & Related papers (2020-11-30T20:46:48Z)

This list is automatically generated from the titles and abstracts of the papers on this site.