Robot at the Mirror: Learning to Imitate via Associating Self-supervised
Models
- URL: http://arxiv.org/abs/2311.13226v2
- Date: Mon, 26 Feb 2024 14:02:09 GMT
- Title: Robot at the Mirror: Learning to Imitate via Associating Self-supervised
Models
- Authors: Andrej Lucny, Kristina Malinovska, and Igor Farkas
- Abstract summary: We introduce an approach to building a custom model from ready-made self-supervised models by associating them instead of training and fine-tuning.
We demonstrate it with an example of a humanoid robot looking at the mirror and learning to detect the 3D pose of its own body from the image it perceives.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce an approach to building a custom model from ready-made
self-supervised models by associating them instead of training and fine-tuning
them. We demonstrate it with an example of a humanoid robot looking at the
mirror and learning to detect the 3D pose of its own body from the image it
perceives. To build our model, we first obtain features from the visual input
and from the postures of the robot's body via models prepared before the robot's
operation. Then, we map between their latent spaces through the robot's
sample-efficient self-exploration at the mirror. In this way, the robot builds
the desired 3D pose detector, whose quality is immediately perfect on the
acquired samples rather than improving gradually. The mapping, which associates
pairs of feature vectors, is implemented in the same way as the key-value
mechanism of the well-known transformer models. Finally, deploying our model for
imitation on a simulated robot allows us to study, tune, and systematically
evaluate its hyperparameters without involving a human counterpart, advancing
our previous research.
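The abstract describes the mapping as a transformer-style key-value association over pairs of stored feature vectors. The paper's own implementation is not reproduced here; the following is a minimal sketch under stated assumptions: `keys` holds visual feature vectors collected during self-exploration, `values` holds the matching posture feature vectors, and retrieval uses softmax attention with an illustrative `temperature` parameter. All names and dimensions are hypothetical.

```python
import numpy as np

# Sketch (not the authors' code): the robot stores pairs of feature vectors
# gathered while watching itself in the mirror.
#   keys   : visual features of its own body image    (N x d_v)
#   values : features of the corresponding 3D posture (N x d_p)
# A new image feature is mapped to a posture feature by transformer-style
# key-value (attention) lookup over the stored pairs.

def associate(query, keys, values, temperature=0.1):
    """Return a posture feature for `query` via softmax attention over stored pairs."""
    # cosine similarity between the query and every stored key
    q = query / np.linalg.norm(query)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    scores = k @ q                        # (N,) similarity of query to each key
    weights = np.exp(scores / temperature)
    weights /= weights.sum()              # softmax over stored samples
    return weights @ values               # weighted blend of stored posture features

# Toy usage with random features (dimensions are illustrative).
rng = np.random.default_rng(0)
keys = rng.normal(size=(50, 128))         # 50 self-exploration samples, 128-d visual features
values = rng.normal(size=(50, 32))        # matching 32-d posture features
query = keys[3] + 0.01 * rng.normal(size=128)
posture = associate(query, keys, values)
```

Because the stored pairs themselves define the detector, the sketch reflects the abstract's claim that quality on the acquired samples is immediate: a query identical to a stored key retrieves (essentially) its paired posture feature, with no gradual training.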
Related papers
- Differentiable Robot Rendering [45.23538293501457]
We introduce differentiable robot rendering, a method allowing the visual appearance of a robot body to be directly differentiable with respect to its control parameters.
We demonstrate its capability and usage in applications including reconstruction of robot poses from images and controlling robots through vision language models.
arXiv Detail & Related papers (2024-10-17T17:59:02Z) - Robot See Robot Do: Imitating Articulated Object Manipulation with Monocular 4D Reconstruction [51.49400490437258]
This work develops a method for imitating articulated object manipulation from a single monocular RGB human demonstration.
We first propose 4D Differentiable Part Models (4D-DPM), a method for recovering 3D part motion from a monocular video.
Given this 4D reconstruction, the robot replicates object trajectories by planning bimanual arm motions that induce the demonstrated object part motion.
We evaluate 4D-DPM's 3D tracking accuracy on ground truth annotated 3D part trajectories and RSRD's physical execution performance on 9 objects across 10 trials each on a bimanual YuMi robot.
arXiv Detail & Related papers (2024-09-26T17:57:16Z) - Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
arXiv Detail & Related papers (2023-06-16T17:58:10Z) - Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement
Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning both to do and to undo it, while simultaneously inferring the reward function from demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z) - STPOTR: Simultaneous Human Trajectory and Pose Prediction Using a
Non-Autoregressive Transformer for Robot Following Ahead [8.227864212055035]
We develop a neural network model to predict future human motion from an observed human motion history.
We propose a non-autoregressive transformer architecture to leverage its parallel nature for easier training and fast, accurate predictions at test time.
Our model compares favorably with state-of-the-art methods in test accuracy and speed, making it well-suited for robotic applications.
arXiv Detail & Related papers (2022-09-15T20:27:54Z) - On the Origins of Self-Modeling [27.888203008100113]
Self-Modeling is the process by which an agent, such as an animal or machine, learns to create a predictive model of its own dynamics.
Here, we quantify the benefits of such self-modeling against the complexity of the robot.
arXiv Detail & Related papers (2022-09-05T15:27:04Z) - Neural Scene Representation for Locomotion on Structured Terrain [56.48607865960868]
We propose a learning-based method to reconstruct the local terrain for a mobile robot traversing urban environments.
Using a stream of depth measurements from the onboard cameras and the robot's trajectory, the method estimates the topography in the robot's vicinity.
We propose a 3D reconstruction model that faithfully reconstructs the scene, despite the noisy measurements and large amounts of missing data coming from the blind spots of the camera arrangement.
arXiv Detail & Related papers (2022-06-16T10:45:17Z) - Full-Body Visual Self-Modeling of Robot Morphologies [29.76701883250049]
Internal computational models of physical bodies are fundamental to the ability of robots and animals alike to plan and control their actions.
Recent progress in fully data-driven self-modeling has enabled machines to learn their own forward kinematics directly from task-agnostic interaction data.
Here, we propose that instead of directly modeling forward-kinematics, a more useful form of self-modeling is one that could answer space occupancy queries.
arXiv Detail & Related papers (2021-11-11T18:58:07Z) - Learning a generative model for robot control using visual feedback [7.171234436165255]
We introduce a novel formulation for incorporating visual feedback in controlling robots.
Inference in the model allows us to infer the robot state corresponding to target locations of the features.
We demonstrate the effectiveness of our method by executing grasping and tight-fit insertions on robots with inaccurate controllers.
arXiv Detail & Related papers (2020-03-10T00:34:01Z) - Morphology-Agnostic Visual Robotic Control [76.44045983428701]
MAVRIC is an approach that works with minimal prior knowledge of the robot's morphology.
We demonstrate our method on visually-guided 3D point reaching, trajectory following, and robot-to-robot imitation.
arXiv Detail & Related papers (2019-12-31T15:45:10Z) - Learning Predictive Models From Observation and Interaction [137.77887825854768]
Learning predictive models from interaction with the world allows an agent, such as a robot, to learn about how the world works.
However, learning a model that captures the dynamics of complex skills represents a major challenge.
We propose a method to augment the training set with observational data of other agents, such as humans.
arXiv Detail & Related papers (2019-12-30T01:10:41Z)