Fiducial Exoskeletons: Image-Centric Robot State Estimation
- URL: http://arxiv.org/abs/2601.08034v1
- Date: Mon, 12 Jan 2026 22:04:25 GMT
- Title: Fiducial Exoskeletons: Image-Centric Robot State Estimation
- Authors: Cameron Smith, Basile Van Hoorick, Vitor Guizilini, Yue Wang,
- Abstract summary: We introduce Fiducial Exoskeletons, an image-based reformulation of 3D robot state estimation. Our key insight is twofold. First, we cast robot state estimation as 6D pose estimation of each link from a single RGB image. Second, we make per-link 6D pose estimation robust and simple - even without learning.
- Score: 21.491677821308688
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce Fiducial Exoskeletons, an image-based reformulation of 3D robot state estimation that replaces cumbersome procedures and motor-centric pipelines with single-image inference. Traditional approaches - especially robot-camera extrinsic estimation - often rely on high-precision actuators and require time-consuming routines such as hand-eye calibration. In contrast, modern learning-based robot control is increasingly trained and deployed from RGB observations on lower-cost hardware. Our key insight is twofold. First, we cast robot state estimation as 6D pose estimation of each link from a single RGB image: the robot-camera base transform is obtained directly as the estimated base-link pose, and the joint state is recovered via a lightweight global optimization that enforces kinematic consistency with the observed link poses (optionally warm-started with encoder readings). Second, we make per-link 6D pose estimation robust and simple - even without learning - by introducing the fiducial exoskeleton: a lightweight 3D-printed mount with a fiducial marker on each link and known marker-link geometry. This design yields robust camera-robot extrinsics, per-link SE(3) poses, and joint-angle state from a single image, enabling robust state estimation even on unplugged robots. Demonstrated on a low-cost robot arm, fiducial exoskeletons substantially simplify setup while improving calibration, state accuracy, and downstream 3D control performance. We release code and printable hardware designs to enable further algorithm-hardware co-design.
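The two-step recipe above is concrete enough to sketch. Below is a minimal illustration of the joint-state recovery, assuming a toy 3-DoF chain of revolute joints about z, and assuming the per-link camera-frame poses `T_cam_links` were already obtained by composing each detected marker's `cv2.solvePnP` pose with the known printed marker-to-link transform. All names and the toy kinematics are illustrative assumptions, not the authors' released API.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation as R

def rot_z(q):
    """4x4 homogeneous rotation about a joint's z-axis."""
    T = np.eye(4)
    T[:3, :3] = R.from_euler("z", q).as_matrix()
    return T

# Fixed link-to-link offsets of a toy 3-DoF serial chain (known kinematics);
# a real robot would load these from its URDF.
LINK_OFFSETS = []
for d in (0.10, 0.12, 0.08):
    T = np.eye(4)
    T[2, 3] = d                        # each link extends d metres along z
    LINK_OFFSETS.append(T)

def fk_link_poses(q):
    """Pose of every moving link frame in the base frame for joint vector q."""
    poses, T = [], np.eye(4)
    for offset, qi in zip(LINK_OFFSETS, q):
        T = T @ offset @ rot_z(qi)
        poses.append(T.copy())
    return poses

def pose_residual(T_a, T_b):
    """6-vector error (translation + rotation vector) between two SE(3) poses."""
    dT = np.linalg.inv(T_a) @ T_b
    return np.concatenate([dT[:3, 3], R.from_matrix(dT[:3, :3]).as_rotvec()])

def estimate_state(T_cam_links, q_init=None):
    """Joint state + extrinsics from observed per-link camera-frame poses.

    T_cam_links[0] is the base link, so the camera-robot extrinsics fall out
    directly as that pose; no hand-eye routine is needed. The remaining
    entries are the moving links, in kinematic order.
    """
    T_cam_base = T_cam_links[0]
    # Express every observed moving-link pose in the robot base frame.
    T_base_links = [np.linalg.inv(T_cam_base) @ T for T in T_cam_links[1:]]

    def residuals(q):
        pred = fk_link_poses(q)
        return np.concatenate(
            [pose_residual(p, o) for p, o in zip(pred, T_base_links)])

    q0 = np.zeros(len(LINK_OFFSETS)) if q_init is None else q_init
    sol = least_squares(residuals, q0)   # warm-startable with encoder readings
    return T_cam_base, sol.x
```

Since every link pose is observed directly from the image, the solve needs no motor feedback at all, consistent with the abstract's claim of state estimation on unplugged robots; encoder readings, when available, only warm-start `q_init`.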
Related papers
- RoboTAG: End-to-end Robot Configuration Estimation via Topological Alignment Graph [62.270763554624615]
Estimating robot pose from a monocular RGB image is a challenge in robotics and computer vision. Existing methods typically build networks on top of 2D visual backbones and depend heavily on labeled data for training. We propose Robot Topological Alignment Graph (RoboTAG), which incorporates a 3D branch to inject 3D priors while enabling co-evolution of the 2D and 3D representations.
arXiv Detail & Related papers (2025-11-11T00:49:15Z)
- RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training [27.63332596592781]
Vision-based pose estimation of articulated robots with unknown joint angles has applications in collaborative robotics and human-robot interaction tasks. Current frameworks use neural network encoders to extract image features and downstream layers to predict joint angles and robot pose. We introduce RoboPEPP, a method that fuses information about the robot's physical model into the encoder using a masking-based self-supervised embedding-predictive architecture.
arXiv Detail & Related papers (2024-11-26T18:26:17Z)
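The masking-based embedding-predictive idea in the entry above can be caricatured in a few lines. The sketch below is a generic JEPA-style objective, not RoboPEPP's actual architecture: the dimensions, modules, and the use of a shared no-gradient target encoder (rather than, e.g., an EMA copy) are all illustrative assumptions.

```python
import torch
import torch.nn as nn

D = 64                                         # toy embedding width
enc = nn.Sequential(nn.Linear(16 * 16 * 3, D), nn.ReLU(), nn.Linear(D, D))
pred = nn.Sequential(nn.Linear(D, D), nn.ReLU(), nn.Linear(D, D))

patches = torch.randn(8, 40, 16 * 16 * 3)      # batch of 40 flattened patches
masked = torch.zeros(40, dtype=torch.bool)
masked[10:20] = True                           # patches to hide (e.g., robot regions)

ctx = enc(patches[:, ~masked]).mean(dim=1)     # context from visible patches
with torch.no_grad():                          # targets receive no gradient
    tgt = enc(patches[:, masked]).mean(dim=1)
loss = nn.functional.mse_loss(pred(ctx), tgt)  # predict the hidden embeddings
loss.backward()
```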
- Kalib: Easy Hand-Eye Calibration with Reference Point Tracking [52.4190876409222]
Kalib is an automatic hand-eye calibration method that leverages the generalizability of visual foundation models to overcome the reliance on physical markers. During calibration, a kinematic reference point is tracked in camera space, while its corresponding 3D coordinates in the robot base frame are obtained from the kinematic chain. Kalib's user-friendly design and minimal setup requirements make it a possible solution for continuous operation in unstructured environments.
arXiv Detail & Related papers (2024-08-20T06:03:40Z)
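The Kalib entry above boils down to recovering a rigid transform from point correspondences. A minimal sketch of that closed-form geometric core, assuming the reference point's 3D coordinates are available in both frames (camera frame via tracking plus depth, base frame via forward kinematics); this is the standard Kabsch/Umeyama alignment, not Kalib's actual pipeline.

```python
import numpy as np

def rigid_align(pts_cam, pts_base):
    """Kabsch/Umeyama: find R, t with pts_base ~= R @ pts_cam + t (Nx3 each)."""
    mu_c, mu_b = pts_cam.mean(axis=0), pts_base.mean(axis=0)
    H = (pts_cam - mu_c).T @ (pts_base - mu_b)     # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflection
    Rm = Vt.T @ D @ U.T
    return Rm, mu_b - Rm @ mu_c
```

With three or more non-collinear point pairs this yields the extrinsics in closed form; the hard part such methods address is obtaining reliable correspondences without markers.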
- Unifying Scene Representation and Hand-Eye Calibration with 3D Foundation Models [13.58353565350936]
Representing the environment is a central challenge in robotics.
Traditionally, users need to calibrate the camera using a specific external marker, such as a checkerboard or AprilTag.
This paper advocates for the integration of 3D foundation representation into robotic systems equipped with manipulator-mounted RGB cameras.
arXiv Detail & Related papers (2024-04-17T18:29:32Z)
- EasyHeC: Accurate and Automatic Hand-eye Calibration via Differentiable Rendering and Space Exploration [49.90228618894857]
We introduce a new approach to hand-eye calibration called EasyHeC, which is markerless, white-box, and delivers superior accuracy and robustness.
We propose to use two key technologies: differentiable rendering-based camera pose optimization and consistency-based joint space exploration.
Our evaluation demonstrates superior performance on synthetic and real-world datasets.
arXiv Detail & Related papers (2023-05-02T03:49:54Z)
- External Camera-based Mobile Robot Pose Estimation for Collaborative Perception with Smart Edge Sensors [22.5939915003931]
We present an approach for estimating a mobile robot's pose w.r.t. the allocentric coordinates of a network of static cameras using multi-view RGB images.
The images are processed online, locally on smart edge sensors by deep neural networks to detect the robot.
With the robot's pose precisely estimated, its observations can be fused into the allocentric scene model.
arXiv Detail & Related papers (2023-03-07T11:03:33Z)
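The smart-edge-sensor entry above fuses detections from calibrated static cameras into one allocentric frame. A toy two-view sketch of the underlying triangulation step, assuming each camera's 3x4 projection matrix and the robot's pixel position are given; the names are illustrative, not the paper's API.

```python
import numpy as np
import cv2

def triangulate(P1, P2, px1, px2):
    """One 2D correspondence -> 3D point in the shared (allocentric) frame."""
    Xh = cv2.triangulatePoints(P1, P2,
                               np.asarray(px1, np.float64).reshape(2, 1),
                               np.asarray(px2, np.float64).reshape(2, 1))
    return (Xh[:3] / Xh[3]).ravel()                # dehomogenize
```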
- Markerless Camera-to-Robot Pose Estimation via Self-supervised Sim-to-Real Transfer [26.21320177775571]
We propose an end-to-end pose estimation framework capable of online camera-to-robot calibration, together with a self-supervised training method.
Our framework combines deep learning and geometric vision for solving the robot pose, and the pipeline is fully differentiable.
arXiv Detail & Related papers (2023-02-28T05:55:42Z)
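The markerless pipeline above combines learned detection with geometric vision. A hedged sketch of the geometric half: given 2D detections of known 3D points on the robot body (e.g., joint origins from forward kinematics), a PnP solve yields the camera-to-robot transform. The keypoint source and names here are assumptions, not the paper's exact pipeline.

```python
import numpy as np
import cv2

def camera_to_robot_pose(pts3d_base, pts2d_px, K):
    """Camera pose from 3D robot-frame points and their 2D detections."""
    ok, rvec, tvec = cv2.solvePnP(pts3d_base.astype(np.float64),  # Nx3, base frame
                                  pts2d_px.astype(np.float64),    # Nx2, pixels
                                  K, None)                        # no distortion
    assert ok, "PnP failed"
    Rm, _ = cv2.Rodrigues(rvec)             # rotation: base frame -> camera frame
    T_cam_base = np.eye(4)
    T_cam_base[:3, :3], T_cam_base[:3, 3] = Rm, tvec.ravel()
    return T_cam_base
```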
- Robot Self-Calibration Using Actuated 3D Sensors [0.0]
This paper treats robot calibration as an offline SLAM problem, where scanning poses are linked to a fixed point in space by a moving kinematic chain.
As such, the presented framework allows robot calibration using nothing but an arbitrary eye-in-hand depth sensor.
A detailed evaluation of the system is shown on a real robot with various attached 3D sensors.
arXiv Detail & Related papers (2022-06-07T16:35:08Z)
- Simple and Effective Synthesis of Indoor 3D Scenes [78.95697556834536]
We study the problem of synthesizing immersive 3D indoor scenes from one or more images.
Our aim is to generate high-resolution images and videos from novel viewpoints.
We propose an image-to-image GAN that maps directly from reprojections of incomplete point clouds to full high-resolution RGB-D images.
arXiv Detail & Related papers (2022-04-06T17:54:46Z)
- Monocular 3D Reconstruction of Interacting Hands via Collision-Aware Factorized Refinements [96.40125818594952]
We make the first attempt to reconstruct 3D interacting hands from single monocular RGB images.
Our method can generate 3D hand meshes with both precise 3D poses and minimal collisions.
arXiv Detail & Related papers (2021-11-01T08:24:10Z)
- Nothing But Geometric Constraints: A Model-Free Method for Articulated Object Pose Estimation [89.82169646672872]
We propose an unsupervised vision-based system to estimate the joint configurations of the robot arm from a sequence of RGB or RGB-D images without knowing the model a priori.
We combine a classical geometric formulation with deep learning and extend the use of epipolar multi-rigid-body constraints to solve this task.
arXiv Detail & Related papers (2020-11-30T20:46:48Z)
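As a small illustration of the epipolar building block named in the entry above: for matched, normalized image points x1, x2 of the same rigid body in two frames, the essential matrix E satisfies x2^T E x1 = 0. A toy OpenCV sketch on already-matched pixel points (an assumption; the paper's multi-rigid-body extension is not shown).

```python
import numpy as np
import cv2

def epipolar_residuals(pts1, pts2, K):
    """Essential matrix via RANSAC + per-match algebraic error |x2^T E x1|."""
    pts1 = np.asarray(pts1, np.float64)
    pts2 = np.asarray(pts2, np.float64)
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    x1 = cv2.undistortPoints(pts1.reshape(-1, 1, 2), K, None).reshape(-1, 2)
    x2 = cv2.undistortPoints(pts2.reshape(-1, 1, 2), K, None).reshape(-1, 2)
    h1 = np.hstack([x1, np.ones((len(x1), 1))])    # homogeneous normalized coords
    h2 = np.hstack([x2, np.ones((len(x2), 1))])
    return np.abs(np.sum(h2 * (E @ h1.T).T, axis=1)), inliers
```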