RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training
- URL: http://arxiv.org/abs/2411.17662v1
- Date: Tue, 26 Nov 2024 18:26:17 GMT
- Title: RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training
- Authors: Raktim Gautam Goswami, Prashanth Krishnamurthy, Yann LeCun, Farshad Khorrami
- Abstract summary: Vision-based pose estimation of articulated robots with unknown joint angles has applications in collaborative robotics and human-robot interaction tasks.
Current frameworks use neural network encoders to extract image features and downstream layers to predict joint angles and robot pose.
We introduce RoboPEPP, a method that fuses information about the robot's physical model into the encoder using a masking-based self-supervised embedding-predictive architecture.
- Score: 27.63332596592781
- Abstract: Vision-based pose estimation of articulated robots with unknown joint angles has applications in collaborative robotics and human-robot interaction tasks. Current frameworks use neural network encoders to extract image features and downstream layers to predict joint angles and robot pose. While images of robots inherently contain rich information about the robot's physical structures, existing methods often fail to leverage it fully, thereby limiting performance under occlusions and truncations. To address this, we introduce RoboPEPP, a method that fuses information about the robot's physical model into the encoder using a masking-based self-supervised embedding-predictive architecture. Specifically, we mask the robot's joints and pre-train an encoder-predictor model to infer the joints' embeddings from surrounding unmasked regions, enhancing the encoder's understanding of the robot's physical model. The pre-trained encoder-predictor pair, along with joint angle and keypoint prediction networks, is then fine-tuned for pose and joint angle estimation. Random masking of input during fine-tuning and keypoint filtering during evaluation further improves robustness. Our method, evaluated on several datasets, achieves the best results in robot pose and joint angle estimation while being the least sensitive to occlusions and requiring the lowest execution time.
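A minimal PyTorch sketch of the masking-based embedding-predictive pre-training described above (JEPA-style). The module sizes, the patch-level masking, and the frozen target encoder are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn

# Illustrative stand-in: any patch-based vision encoder fits this pattern.
class Encoder(nn.Module):
    def __init__(self, patch_dim=768, embed_dim=256):
        super().__init__()
        self.proj = nn.Linear(patch_dim, embed_dim)

    def forward(self, patches):            # patches: (B, N, patch_dim)
        return self.proj(patches)          # (B, N, embed_dim)

predictor = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True),
    num_layers=2,
)
context_enc = Encoder()
target_enc = Encoder()  # frozen or EMA-updated in JEPA-style setups

def pretrain_step(patches, joint_mask):
    """joint_mask: (B, N) bool, True where a patch overlaps a robot joint."""
    with torch.no_grad():
        targets = target_enc(patches)                # embeddings of all patches
    visible = patches * (~joint_mask).unsqueeze(-1)  # hide the joint regions
    context = context_enc(visible)
    preds = predictor(context)                       # predict masked joint embeddings
    return (preds - targets)[joint_mask].pow(2).mean()
```

During fine-tuning, the pre-trained encoder-predictor pair would feed the joint angle and keypoint heads, with random input masking retained for robustness.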
Related papers
- CtRNet-X: Camera-to-Robot Pose Estimation in Real-world Conditions Using a Single Camera [18.971816395021488]
Markerless pose estimation methods have eliminated the need for time-consuming physical setups for camera-to-robot calibration.
We propose a novel framework capable of estimating the robot pose with partially visible robot manipulators.
arXiv Detail & Related papers (2024-09-16T16:22:43Z)
- Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models [53.22792173053473]
We introduce an interactive robotic manipulation framework called Polaris.
Polaris integrates perception and interaction by utilizing GPT-4 alongside grounded vision models.
We propose a novel Synthetic-to-Real (Syn2Real) pose estimation pipeline.
arXiv Detail & Related papers (2024-08-15T06:40:38Z)
- Real-time Holistic Robot Pose Estimation with Unknown States [30.41806081818826]
Estimating robot pose from RGB images is a crucial problem in computer vision and robotics.
Previous methods presume full knowledge of robot internal states, e.g., ground-truth robot joint angles.
This work introduces an efficient framework for real-time robot pose estimation from RGB images without requiring known robot states.
arXiv Detail & Related papers (2024-02-08T13:12:50Z)
- Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
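A hedged sketch of what a Transformer over sensorimotor tokens with a masked prediction objective can look like; the projection dimensions, masking scheme, and token layout are assumptions, not RPT's actual design:

```python
import torch
import torch.nn as nn

# Hypothetical dimensions: image features, proprioception, and actions are
# each projected into a shared token space and interleaved per time step.
D = 256
img_proj, prop_proj, act_proj = nn.Linear(512, D), nn.Linear(7, D), nn.Linear(7, D)
mask_token = nn.Parameter(torch.zeros(D))
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D, nhead=8, batch_first=True),
    num_layers=4,
)

def masked_prediction_loss(img_feat, proprio, actions, mask_ratio=0.25):
    """img_feat: (B,T,512); proprio/actions: (B,T,7). Mask random tokens
    and train the Transformer to reconstruct them."""
    tokens = torch.stack(
        [img_proj(img_feat), prop_proj(proprio), act_proj(actions)], dim=2
    ).flatten(1, 2)                                   # (B, 3T, D)
    target = tokens.detach()
    mask = torch.rand(tokens.shape[:2]) < mask_ratio  # (B, 3T) bool
    tokens = torch.where(mask.unsqueeze(-1), mask_token.expand_as(tokens), tokens)
    pred = backbone(tokens)
    return (pred - target)[mask].pow(2).mean()
```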
arXiv Detail & Related papers (2023-06-16T17:58:10Z)
- Markerless Camera-to-Robot Pose Estimation via Self-supervised Sim-to-Real Transfer [26.21320177775571]
We propose an end-to-end pose estimation framework that is capable of online camera-to-robot calibration and a self-supervised training method.
Our framework combines deep learning and geometric vision for solving the robot pose, and the pipeline is fully differentiable.
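The geometric half of such a pipeline typically amounts to a Perspective-n-Point solve from 2D keypoints detected by a network. A minimal sketch using OpenCV's (non-differentiable) solver, with the keypoint detector assumed given:

```python
import numpy as np
import cv2

def camera_to_robot_pose(kps_2d, kps_3d, K):
    """Recover the camera-to-robot transform from detected keypoints.

    kps_2d: (N,2) keypoints detected in the image (assumed network output).
    kps_3d: (N,3) same keypoints on the robot model, in robot-base frame.
    K: (3,3) camera intrinsics.
    """
    ok, rvec, tvec = cv2.solvePnP(
        kps_3d.astype(np.float64), kps_2d.astype(np.float64), K, distCoeffs=None
    )
    if not ok:
        raise RuntimeError("PnP failed")
    R, _ = cv2.Rodrigues(rvec)       # rotation matrix from axis-angle
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, tvec.ravel()
    return T                         # robot-base -> camera transform
```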
arXiv Detail & Related papers (2023-02-28T05:55:42Z)
- Neural Scene Representation for Locomotion on Structured Terrain [56.48607865960868]
We propose a learning-based method to reconstruct the local terrain for a mobile robot traversing urban environments.
Using a stream of depth measurements from the onboard cameras and the robot's trajectory, the method estimates the topography in the robot's vicinity.
We propose a 3D reconstruction model that faithfully reconstructs the scene, despite the noisy measurements and large amounts of missing data coming from the blind spots of the camera arrangement.
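As a simple point of reference for what the learned model replaces, depth points back-projected along the trajectory can be fused into a robot-centred elevation grid; the max-fusion rule and grid parameters below are assumptions:

```python
import numpy as np

def update_height_map(height_map, points_world, robot_xy, cell=0.05, size=128):
    """Fuse world-frame points (from depth back-projection) into a
    robot-centred elevation grid. A naive stand-in for the learned
    reconstruction model described above."""
    half = size * cell / 2.0
    local = points_world[:, :2] - robot_xy            # (N,2) xy offsets
    idx = ((local + half) / cell).astype(int)         # grid indices
    keep = (idx >= 0).all(1) & (idx < size).all(1)    # drop out-of-grid points
    idx, z = idx[keep], points_world[keep, 2]
    np.maximum.at(height_map, (idx[:, 0], idx[:, 1]), z)
    return height_map
```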
arXiv Detail & Related papers (2022-06-16T10:45:17Z)
- Single-view robot pose and joint angle estimation via render & compare [40.05546237998603]
We introduce RoboPose, a method to estimate the joint angles and the 6D camera-to-robot pose of a known articulated robot from a single RGB image.
Solving this problem is important for granting mobile and itinerant autonomous systems the ability to interact with other robots.
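A generic render-and-compare loop looks roughly like the sketch below; note that RoboPose uses a learned iterative refiner rather than raw gradient descent, and the differentiable `renderer` here is an assumed stand-in:

```python
import torch

def render_and_compare(image, init_pose, init_angles, renderer, steps=50, lr=1e-2):
    """Refine pose and joint angles by differentiably rendering the robot
    at the current estimate and descending on an image discrepancy.
    `renderer` is an assumed differentiable renderer (e.g. a PyTorch3D-style
    module) mapping (pose, angles) to an image tensor."""
    pose = init_pose.clone().requires_grad_(True)       # 6D camera-to-robot pose
    angles = init_angles.clone().requires_grad_(True)   # joint angles
    opt = torch.optim.Adam([pose, angles], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        rendered = renderer(pose, angles)               # same shape as `image`
        loss = (rendered - image).pow(2).mean()
        loss.backward()
        opt.step()
    return pose.detach(), angles.detach()
```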
arXiv Detail & Related papers (2021-04-19T14:48:29Z)
- Where is my hand? Deep hand segmentation for visual self-recognition in humanoid robots [129.46920552019247]
We propose the use of a Convolutional Neural Network (CNN) to segment the robot hand from an image in an egocentric view.
We fine-tuned the Mask R-CNN network for the specific task of segmenting the hand of the humanoid robot Vizzy.
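Fine-tuning Mask R-CNN for a single foreground class follows the standard torchvision recipe sketched below; the paper's exact training configuration may differ:

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def build_hand_segmenter(num_classes=2):  # background + hand
    """Load a pre-trained Mask R-CNN and swap its heads for a new class set."""
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    # Replace the box head for the new number of classes.
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    # Replace the mask head likewise.
    in_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_mask, 256, num_classes)
    return model
```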
arXiv Detail & Related papers (2021-02-09T10:34:32Z)
- Online Body Schema Adaptation through Cost-Sensitive Active Learning [63.84207660737483]
A cost-sensitive active learning approach is used to select optimal joint configurations.
The work was implemented in a simulation environment, using the 7-DoF arm of the iCub robot simulator.
The results show that cost-sensitive active learning achieves accuracy similar to the standard active learning approach while roughly halving the executed movement.
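In sketch form, cost-sensitive selection trades expected information gain against movement cost; `info_gain` and the weight `beta` below are hypothetical placeholders, not the paper's criterion:

```python
import numpy as np

def select_next_configuration(candidates, current_q, info_gain, beta=0.5):
    """Pick the next joint configuration by trading expected information
    gain against movement cost (joint-space distance). `info_gain` is an
    assumed callable, e.g. posterior variance of the body-schema estimate."""
    scores = [info_gain(q) - beta * np.linalg.norm(q - current_q)
              for q in candidates]
    return candidates[int(np.argmax(scores))]
```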
arXiv Detail & Related papers (2021-01-26T16:01:02Z)
- Nothing But Geometric Constraints: A Model-Free Method for Articulated Object Pose Estimation [89.82169646672872]
We propose an unsupervised vision-based system to estimate the joint configurations of the robot arm from a sequence of RGB or RGB-D images without knowing the model a priori.
We combine a classical geometric formulation with deep learning and extend the use of epipolar multi-rigid-body constraints to solve this task.
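The epipolar machinery for one rigid link between two frames can be sketched with OpenCV, assuming correspondences have already been grouped per rigid body:

```python
import cv2

def rigid_body_motion(pts1, pts2, K):
    """Relative rotation/translation of one rigid link between two frames
    from point correspondences, via the epipolar constraint x2^T E x1 = 0.
    pts1, pts2: (N,2) matched points; K: (3,3) camera intrinsics."""
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t  # joint angles follow from relative motion between links
```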
arXiv Detail & Related papers (2020-11-30T20:46:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.