Real-time Pose Estimation from Images for Multiple Humanoid Robots
- URL: http://arxiv.org/abs/2107.02675v1
- Date: Tue, 6 Jul 2021 15:33:57 GMT
- Title: Real-time Pose Estimation from Images for Multiple Humanoid Robots
- Authors: Arash Amini, Hafez Farazi, Sven Behnke
- Abstract summary: We present a lightweight pose estimation model that can work in real-time on humanoid robots in the RoboCup Humanoid League environment.
The results of this work have the potential to enable many advanced behaviors for soccer-playing robots.
- Score: 45.182157261640675
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pose estimation commonly refers to computer vision methods that recognize
people's body postures in images or videos. With recent advancements in deep
learning, we now have compelling models to tackle the problem in real-time.
Since these models are usually designed for human images, one needs to adapt
existing models to work on other creatures, including robots. This paper
examines different state-of-the-art pose estimation models and proposes a
lightweight model that can work in real-time on humanoid robots in the RoboCup
Humanoid League environment. Additionally, we present a novel dataset called
the HumanoidRobotPose dataset. The results of this work have the potential to
enable many advanced behaviors for soccer-playing robots.
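As an illustration of the kind of processing such a lightweight model performs, below is a minimal sketch of decoding per-keypoint heatmaps into 2D joint coordinates for a single detected robot. The keypoint names, heatmap resolution, stride, and threshold are assumptions for illustration only and are not taken from the paper.

# Minimal sketch: decoding per-keypoint heatmaps into 2D joint locations.
# Assumptions: the network outputs one heatmap per keypoint at reduced
# resolution; the keypoint set, stride, and threshold below are illustrative.
import numpy as np

KEYPOINTS = ["head", "shoulder_l", "shoulder_r", "hip_l", "hip_r", "foot_l", "foot_r"]
STRIDE = 4            # heatmap-to-image scale factor (assumed)
CONF_THRESHOLD = 0.3  # minimum peak value to accept a keypoint (assumed)

def decode_heatmaps(heatmaps):
    # heatmaps: array of shape (num_keypoints, H, W) with values in [0, 1].
    # Returns keypoint name -> (x, y, confidence) in image pixels, or None
    # when the peak response falls below the confidence threshold.
    pose = {}
    for name, hm in zip(KEYPOINTS, heatmaps):
        y, x = np.unravel_index(np.argmax(hm), hm.shape)
        conf = float(hm[y, x])
        pose[name] = (int(x) * STRIDE, int(y) * STRIDE, conf) if conf >= CONF_THRESHOLD else None
    return pose

# Example with random arrays standing in for network output:
dummy = np.random.rand(len(KEYPOINTS), 96, 128)
print(decode_heatmaps(dummy))

For multiple robots, the same decoding step is typically applied per detected instance or combined with a grouping stage; that logic is omitted here.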
Related papers
- HumanRAM: Feed-forward Human Reconstruction and Animation Model using Transformers [60.86393841247567]
HumanRAM is a novel feed-forward approach for generalizable human reconstruction and animation from monocular or sparse human images. Our approach integrates human reconstruction and animation into a unified framework by introducing explicit pose conditions. Experiments show that HumanRAM significantly surpasses previous methods in terms of reconstruction accuracy, animation fidelity, and generalization performance on real-world datasets.
arXiv Detail & Related papers (2025-06-03T17:50:05Z) - Differentiable Robot Rendering [45.23538293501457]
We introduce differentiable robot rendering, a method allowing the visual appearance of a robot body to be directly differentiable with respect to its control parameters.
We demonstrate its capability and usage in applications including reconstruction of robot poses from images and controlling robots through vision language models.
arXiv Detail & Related papers (2024-10-17T17:59:02Z) - HumanoidBench: Simulated Humanoid Benchmark for Whole-Body Locomotion and Manipulation [50.616995671367704]
We present a high-dimensional, simulated robot learning benchmark, HumanoidBench, featuring a humanoid robot equipped with dexterous hands.
Our findings reveal that state-of-the-art reinforcement learning algorithms struggle with most tasks, whereas a hierarchical learning approach achieves superior performance when supported by robust low-level policies.
arXiv Detail & Related papers (2024-03-15T17:45:44Z) - Robot at the Mirror: Learning to Imitate via Associating Self-supervised Models [0.0]
We introduce an approach to building a custom model from ready-made self-supervised models by associating them, rather than by training and fine-tuning.
We demonstrate it with an example of a humanoid robot looking at the mirror and learning to detect the 3D pose of its own body from the image it perceives.
arXiv Detail & Related papers (2023-11-22T08:30:20Z) - High-Degrees-of-Freedom Dynamic Neural Fields for Robot Self-Modeling and Motion Planning [6.229216953398305]
A robot self-model is a representation of the robot's physical morphology that can be used for motion planning tasks.
We propose a new encoder-based neural density field architecture for dynamic object-centric scenes conditioned on high numbers of degrees of freedom.
In a 7-DOF robot test setup, the learned self-model achieves a Chamfer-L2 distance of 2% of the robot's workspace dimension (a short sketch of this metric follows the related-papers list below).
arXiv Detail & Related papers (2023-10-05T16:01:29Z) - Giving Robots a Hand: Learning Generalizable Manipulation with Eye-in-Hand Human Video Demonstrations [66.47064743686953]
Eye-in-hand cameras have shown promise in enabling greater sample efficiency and generalization in vision-based robotic manipulation.
Videos of humans performing tasks, on the other hand, are much cheaper to collect since they eliminate the need for expertise in robotic teleoperation.
In this work, we augment narrow robotic imitation datasets with broad unlabeled human video demonstrations to greatly enhance the generalization of eye-in-hand visuomotor policies.
arXiv Detail & Related papers (2023-07-12T07:04:53Z) - Affordances from Human Videos as a Versatile Representation for Robotics [31.248842798600606]
We train a visual affordance model that estimates where and how in the scene a human is likely to interact.
The structure of these behavioral affordances directly enables the robot to perform many complex tasks.
We show the efficacy of our approach, which we call VRB, across 4 real world environments, over 10 different tasks, and 2 robotic platforms operating in the wild.
arXiv Detail & Related papers (2023-04-17T17:59:34Z) - RT-1: Robotics Transformer for Real-World Control at Scale [98.09428483862165]
We present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties.
We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks.
arXiv Detail & Related papers (2022-12-13T18:55:15Z) - Neural Novel Actor: Learning a Generalized Animatable Neural Representation for Human Actors [98.24047528960406]
We propose a new method for learning a generalized animatable neural representation from a sparse set of multi-view imagery of multiple persons.
The learned representation can be used to synthesize novel view images of an arbitrary person from a sparse set of cameras, and further animate them with the user's pose control.
arXiv Detail & Related papers (2022-08-25T07:36:46Z) - Full-Body Visual Self-Modeling of Robot Morphologies [29.76701883250049]
Internal computational models of physical bodies are fundamental to the ability of robots and animals alike to plan and control their actions.
Recent progress in fully data-driven self-modeling has enabled machines to learn their own forward kinematics directly from task-agnostic interaction data.
Here, we propose that instead of directly modeling forward-kinematics, a more useful form of self-modeling is one that could answer space occupancy queries.
arXiv Detail & Related papers (2021-11-11T18:58:07Z) - S3: Neural Shape, Skeleton, and Skinning Fields for 3D Human Modeling [103.65625425020129]
We represent the pedestrian's shape, pose and skinning weights as neural implicit functions that are directly learned from data.
We demonstrate the effectiveness of our approach on various datasets and show that our reconstructions outperform existing state-of-the-art methods.
arXiv Detail & Related papers (2021-01-17T02:16:56Z)
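Referring back to the self-modeling entry above that reports a Chamfer-L2 distance of 2% of the robot's workspace dimension, the sketch below shows how such a metric is commonly computed between a predicted and a reference point cloud. The point clouds, normalization, and averaging convention are illustrative assumptions; exact conventions (squared vs. unsquared distances, weighting of the two directions) vary between papers.

# Minimal sketch: symmetric Chamfer distance with L2 norms between two
# point clouds, reported relative to an assumed workspace dimension.
import numpy as np

def chamfer_l2(a, b):
    # a, b: arrays of shape (N, 3) and (M, 3) holding 3D surface samples.
    d = np.sqrt(np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1))  # (N, M) pairwise distances
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

# Example: predicted vs. reference surface samples of a robot self-model.
rng = np.random.default_rng(0)
predicted = rng.uniform(-0.5, 0.5, size=(1024, 3))
reference = predicted + rng.normal(scale=0.01, size=predicted.shape)

workspace_dim = 1.0  # assumed workspace extent in meters
print(f"Chamfer-L2 relative to workspace dimension: {chamfer_l2(predicted, reference) / workspace_dim:.2%}")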
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.