Real-time Holistic Robot Pose Estimation with Unknown States
- URL: http://arxiv.org/abs/2402.05655v4
- Date: Tue, 16 Jul 2024 04:57:48 GMT
- Title: Real-time Holistic Robot Pose Estimation with Unknown States
- Authors: Shikun Ban, Juling Fan, Xiaoxuan Ma, Wentao Zhu, Yu Qiao, Yizhou Wang
- Abstract summary: Estimating robot pose from RGB images is a crucial problem in computer vision and robotics.
Previous methods presume full knowledge of robot internal states, e.g. ground-truth robot joint angles.
This work introduces an efficient framework for real-time robot pose estimation from RGB images without requiring known robot states.
- Score: 30.41806081818826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating robot pose from RGB images is a crucial problem in computer vision and robotics. While previous methods have achieved promising performance, most of them presume full knowledge of robot internal states, e.g. ground-truth robot joint angles. However, this assumption is not always valid in practical situations. In real-world applications such as multi-robot collaboration or human-robot interaction, the robot joint states might not be shared or could be unreliable. On the other hand, existing approaches that estimate robot pose without joint state priors suffer from heavy computation burdens and thus cannot support real-time applications. This work introduces an efficient framework for real-time robot pose estimation from RGB images without requiring known robot states. Our method estimates camera-to-robot rotation, robot state parameters, keypoint locations, and root depth, employing a neural network module for each task to facilitate learning and sim-to-real transfer. Notably, it achieves inference in a single feed-forward pass without iterative optimization. Our approach offers a 12-fold speed-up with state-of-the-art accuracy, enabling real-time holistic robot pose estimation for the first time. Code and models are available at https://github.com/Oliverbansk/Holistic-Robot-Pose-Estimation.
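The abstract describes a decoupled design: one network module per sub-task (camera-to-robot rotation, robot joint state, keypoint locations, root depth), all evaluated in a single feed-forward pass. The sketch below illustrates that structure in PyTorch; the module names, backbone choice, and output dimensions are assumptions made for illustration, not the authors' implementation (see the linked repository for the actual code).

```python
# Minimal sketch of the four-module, single-feed-forward design described in the
# abstract. Backbone, head sizes, and output parameterizations are illustrative
# assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torchvision.models as models


class HolisticRobotPoseNet(nn.Module):
    def __init__(self, num_joints: int = 7, num_keypoints: int = 7):
        super().__init__()
        # Shared image backbone (assumption: ResNet-50 feature extractor).
        backbone = models.resnet50(weights=None)
        self.backbone = nn.Sequential(*list(backbone.children())[:-1])
        feat_dim = 2048
        self.num_keypoints = num_keypoints
        # One lightweight head per sub-task, as the abstract describes.
        self.rotation_head = nn.Linear(feat_dim, 6)                    # camera-to-robot rotation (6D repr., assumed)
        self.joint_state_head = nn.Linear(feat_dim, num_joints)        # robot joint angles
        self.keypoint_head = nn.Linear(feat_dim, num_keypoints * 2)    # 2D keypoint locations
        self.depth_head = nn.Linear(feat_dim, 1)                       # root depth

    def forward(self, image: torch.Tensor):
        # Single feed-forward pass: no iterative render-and-compare optimization.
        feat = self.backbone(image).flatten(1)
        return {
            "rotation": self.rotation_head(feat),
            "joint_angles": self.joint_state_head(feat),
            "keypoints_2d": self.keypoint_head(feat).view(-1, self.num_keypoints, 2),
            "root_depth": self.depth_head(feat),
        }


if __name__ == "__main__":
    net = HolisticRobotPoseNet()
    out = net(torch.randn(1, 3, 256, 256))
    print({k: v.shape for k, v in out.items()})
```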
Related papers
- RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training [27.63332596592781]
Vision-based pose estimation of articulated robots with unknown joint angles has applications in collaborative robotics and human-robot interaction tasks.
Current frameworks use neural network encoders to extract image features and downstream layers to predict joint angles and robot pose.
We introduce RoboPEPP, a method that fuses information about the robot's physical model into the encoder using a masking-based self-supervised embedding-predictive architecture.
arXiv Detail & Related papers (2024-11-26T18:26:17Z) - CtRNet-X: Camera-to-Robot Pose Estimation in Real-world Conditions Using a Single Camera [18.971816395021488]
Markerless pose estimation methods have eliminated the need for time-consuming physical setups for camera-to-robot calibration.
We propose a novel framework capable of estimating the robot pose with partially visible robot manipulators.
arXiv Detail & Related papers (2024-09-16T16:22:43Z) - RoboScript: Code Generation for Free-Form Manipulation Tasks across Real
and Simulation [77.41969287400977]
This paper presents RobotScript, a platform for a deployable robot manipulation pipeline powered by code generation.
We also present a benchmark for code generation for robot manipulation tasks specified in free-form natural language.
We demonstrate the adaptability of our code generation framework across multiple robot embodiments, including the Franka and UR5 robot arms.
arXiv Detail & Related papers (2024-02-22T15:12:00Z) - What Matters to You? Towards Visual Representation Alignment for Robot
Learning [81.30964736676103]
When operating in service of people, robots need to optimize rewards aligned with end-user preferences.
We propose Representation-Aligned Preference-based Learning (RAPL), a method for solving the visual representation alignment problem.
arXiv Detail & Related papers (2023-10-11T23:04:07Z) - ImitationNet: Unsupervised Human-to-Robot Motion Retargeting via Shared Latent Space [9.806227900768926]
This paper introduces a novel deep-learning approach for human-to-robot motion retargeting.
Our method does not require paired human-to-robot data, which facilitates its translation to new robots.
Our model outperforms existing works on human-to-robot similarity in both efficiency and precision.
arXiv Detail & Related papers (2023-09-11T08:55:04Z) - Giving Robots a Hand: Learning Generalizable Manipulation with
Eye-in-Hand Human Video Demonstrations [66.47064743686953]
Eye-in-hand cameras have shown promise in enabling greater sample efficiency and generalization in vision-based robotic manipulation.
Videos of humans performing tasks, on the other hand, are much cheaper to collect since they eliminate the need for expertise in robotic teleoperation.
In this work, we augment narrow robotic imitation datasets with broad unlabeled human video demonstrations to greatly enhance the generalization of eye-in-hand visuomotor policies.
arXiv Detail & Related papers (2023-07-12T07:04:53Z) - Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
arXiv Detail & Related papers (2023-06-16T17:58:10Z) - Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement
Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z) - Image-based Pose Estimation and Shape Reconstruction for Robot
Manipulators and Soft, Continuum Robots via Differentiable Rendering [20.62295718847247]
State estimation from measured data is crucial for robotic applications, as autonomous systems rely on sensors to capture their motion and localize themselves in the 3D world.
In this work, we achieve image-based robot pose estimation and shape reconstruction from camera images.
We demonstrate that our method of using geometrical shape primitives can achieve high accuracy in shape reconstruction for a soft continuum robot and pose estimation for a robot manipulator.
arXiv Detail & Related papers (2023-02-27T18:51:29Z) - Single-view robot pose and joint angle estimation via render & compare [40.05546237998603]
We introduce RoboPose, a method to estimate the joint angles and the 6D camera-to-robot pose of a known articulated robot from a single RGB image.
This is an important problem for granting mobile and itinerant autonomous systems the ability to interact with other robots.
arXiv Detail & Related papers (2021-04-19T14:48:29Z) - Where is my hand? Deep hand segmentation for visual self-recognition in
humanoid robots [129.46920552019247]
We propose the use of a Convolutional Neural Network (CNN) to segment the robot hand from an image in an egocentric view.
We fine-tune the Mask R-CNN network for the specific task of segmenting the hand of the humanoid robot Vizzy (a minimal fine-tuning sketch follows this list).
arXiv Detail & Related papers (2021-02-09T10:34:32Z)
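The last entry above fine-tunes Mask R-CNN for hand segmentation. Below is a minimal sketch of that kind of fine-tuning with the torchvision detection API; the class count, pretrained weights, and optimizer settings are illustrative assumptions, not the Vizzy setup from the paper.

```python
# Minimal sketch: adapt a pretrained torchvision Mask R-CNN to binary
# (background + hand) segmentation. Hyperparameters are illustrative assumptions.
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor


def build_hand_segmenter(num_classes: int = 2):  # background + hand
    model = maskrcnn_resnet50_fpn(weights="DEFAULT")
    # Replace the box and mask heads so the network predicts only the hand class.
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask, 256, num_classes)
    return model


model = build_hand_segmenter()
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad],
    lr=0.005, momentum=0.9, weight_decay=0.0005,
)
# Training would iterate over (image, target) pairs with boxes and masks, e.g.:
#   loss_dict = model(images, targets)
#   sum(loss_dict.values()).backward(); optimizer.step()
```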
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.