CtRNet-X: Camera-to-Robot Pose Estimation in Real-world Conditions Using a Single Camera
- URL: http://arxiv.org/abs/2409.10441v1
- Date: Mon, 16 Sep 2024 16:22:43 GMT
- Title: CtRNet-X: Camera-to-Robot Pose Estimation in Real-world Conditions Using a Single Camera
- Authors: Jingpei Lu, Zekai Liang, Tristin Xie, Florian Richter, Shan Lin, Sainan Liu, Michael C. Yip
- Abstract summary: Markerless pose estimation methods have eliminated the need for time-consuming physical setups for camera-to-robot calibration.
We propose a novel framework capable of estimating the robot pose with partially visible robot manipulators.
- Score: 18.971816395021488
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Camera-to-robot calibration is crucial for vision-based robot control and requires considerable effort to make accurate. Recent advancements in markerless pose estimation methods have eliminated the need for time-consuming physical setups for camera-to-robot calibration. While existing markerless pose estimation methods demonstrate impressive accuracy without cumbersome setups, they rely on the assumption that all the robot joints are visible within the camera's field of view. However, in practice, robots usually move in and out of view, and some portion of the robot may stay out-of-frame during the whole manipulation task due to real-world constraints, leading to a lack of sufficient visual features and subsequent failure of these approaches. To address this challenge and enhance the applicability to vision-based robot control, we propose a novel framework capable of estimating the robot pose with partially visible robot manipulators. Our approach leverages Vision-Language Models for fine-grained robot component detection and integrates them into a keypoint-based pose estimation network, which enables more robust performance in varied operational conditions. The framework is evaluated on both public robot datasets and self-collected partial-view datasets to demonstrate its robustness and generalizability. As a result, this method is effective for robot pose estimation in a wider range of real-world manipulation scenarios.
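As a rough illustration of the geometric back-end that keypoint-based estimators of this kind rely on (this is not the authors' implementation; the function name, visibility mask, and minimum-point threshold below are illustrative assumptions), the camera-to-robot transform can be recovered from only the keypoints detected on the visible robot links via an off-the-shelf PnP solver:

```python
import cv2
import numpy as np

def estimate_camera_to_robot_pose(kpts_2d, kpts_3d, visible, K, min_visible=4):
    """Recover the camera-to-robot-base transform from partially visible keypoints.

    kpts_2d : (N, 2) detected keypoint pixel coordinates
    kpts_3d : (N, 3) keypoint positions in the robot base frame (from forward kinematics)
    visible : (N,) boolean mask of which keypoints the detector actually found
    K       : (3, 3) camera intrinsic matrix
    """
    vis = np.asarray(visible, dtype=bool)
    if vis.sum() < min_visible:
        return None  # too few visible keypoints for a reliable PnP solution

    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(kpts_3d, dtype=np.float64)[vis],
        np.asarray(kpts_2d, dtype=np.float64)[vis],
        np.asarray(K, dtype=np.float64),
        distCoeffs=None,
        flags=cv2.SOLVEPNP_EPNP,  # EPnP needs at least 4 correspondences
    )
    if not ok:
        return None

    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = tvec.ravel()
    return T  # 4x4 transform mapping robot-base-frame points into the camera frame
```

This sketch only covers the geometric step with a standard solver; per the abstract, the paper's contribution lies in detecting which robot components are visible (via Vision-Language Models) and producing reliable keypoints under partial visibility.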
Related papers
- RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training [27.63332596592781]
Vision-based pose estimation of articulated robots with unknown joint angles has applications in collaborative robotics and human-robot interaction tasks.
Current frameworks use neural network encoders to extract image features and downstream layers to predict joint angles and robot pose.
We introduce RoboPEPP, a method that fuses information about the robot's physical model into the encoder using a masking-based self-supervised embedding-predictive architecture.
arXiv Detail & Related papers (2024-11-26T18:26:17Z)
- Unifying 3D Representation and Control of Diverse Robots with a Single Camera [48.279199537720714]
We introduce Neural Jacobian Fields, an architecture that autonomously learns to model and control robots from vision alone.
Our approach achieves accurate closed-loop control and recovers the causal dynamic structure of each robot.
arXiv Detail & Related papers (2024-07-11T17:55:49Z)
- Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation [65.46610405509338]
We seek to learn a generalizable goal-conditioned policy that enables zero-shot robot manipulation.
Our framework, Track2Act, predicts tracks of how points in an image should move in future time-steps based on a goal.
We show that this approach of combining scalably learned track prediction with a residual policy enables diverse generalizable robot manipulation.
arXiv Detail & Related papers (2024-05-02T17:56:55Z)
- Real-time Holistic Robot Pose Estimation with Unknown States [30.41806081818826]
Estimating robot pose from RGB images is a crucial problem in computer vision and robotics.
Previous methods presume full knowledge of robot internal states, e.g. ground-truth robot joint angles.
This work introduces an efficient framework for real-time robot pose estimation from RGB images without requiring known robot states.
arXiv Detail & Related papers (2024-02-08T13:12:50Z)
- Giving Robots a Hand: Learning Generalizable Manipulation with Eye-in-Hand Human Video Demonstrations [66.47064743686953]
Eye-in-hand cameras have shown promise in enabling greater sample efficiency and generalization in vision-based robotic manipulation.
Videos of humans performing tasks, on the other hand, are much cheaper to collect since they eliminate the need for expertise in robotic teleoperation.
In this work, we augment narrow robotic imitation datasets with broad unlabeled human video demonstrations to greatly enhance the generalization of eye-in-hand visuomotor policies.
arXiv Detail & Related papers (2023-07-12T07:04:53Z)
- Markerless Camera-to-Robot Pose Estimation via Self-supervised Sim-to-Real Transfer [26.21320177775571]
We propose an end-to-end pose estimation framework that is capable of online camera-to-robot calibration and a self-supervised training method.
Our framework combines deep learning and geometric vision for solving the robot pose, and the pipeline is fully differentiable.
arXiv Detail & Related papers (2023-02-28T05:55:42Z)
- Image-based Pose Estimation and Shape Reconstruction for Robot Manipulators and Soft, Continuum Robots via Differentiable Rendering [20.62295718847247]
State estimation from measured data is crucial for robotic applications as autonomous systems rely on sensors to capture the motion and localize in the 3D world.
In this work, we achieve image-based robot pose estimation and shape reconstruction from camera images.
We demonstrate that our method of using geometrical shape primitives can achieve high accuracy in shape reconstruction for a soft continuum robot and pose estimation for a robot manipulator.
arXiv Detail & Related papers (2023-02-27T18:51:29Z)
- Neural Scene Representation for Locomotion on Structured Terrain [56.48607865960868]
We propose a learning-based method to reconstruct the local terrain for a mobile robot traversing urban environments.
Using a stream of depth measurements from the onboard cameras and the robot's trajectory, the method estimates the topography in the robot's vicinity.
We propose a 3D reconstruction model that faithfully reconstructs the scene, despite the noisy measurements and large amounts of missing data coming from the blind spots of the camera arrangement.
arXiv Detail & Related papers (2022-06-16T10:45:17Z)
- Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation? [54.442692221567796]
Task specification is critical for engagement of non-expert end-users and adoption of personalized robots.
A widely studied approach to task specification is through goals, using either compact state vectors or goal images from the same robot scene.
In this work, we explore alternate and more general forms of goal specification that are expected to be easier for humans to specify and use.
arXiv Detail & Related papers (2022-04-23T19:39:49Z)
- Single-view robot pose and joint angle estimation via render & compare [40.05546237998603]
We introduce RoboPose, a method to estimate the joint angles and the 6D camera-to-robot pose of a known articulated robot from a single RGB image.
This is an important problem to grant mobile and itinerant autonomous systems the ability to interact with other robots.
arXiv Detail & Related papers (2021-04-19T14:48:29Z)
- Morphology-Agnostic Visual Robotic Control [76.44045983428701]
MAVRIC is an approach that works with minimal prior knowledge of the robot's morphology.
We demonstrate our method on visually-guided 3D point reaching, trajectory following, and robot-to-robot imitation.
arXiv Detail & Related papers (2019-12-31T15:45:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers or summaries (including all information) and is not responsible for any consequences of their use.