Graph Neural Networks for Relational Inductive Bias in Vision-based Deep
Reinforcement Learning of Robot Control
- URL: http://arxiv.org/abs/2203.05985v1
- Date: Fri, 11 Mar 2022 15:11:54 GMT
- Title: Graph Neural Networks for Relational Inductive Bias in Vision-based Deep
Reinforcement Learning of Robot Control
- Authors: Marco Oliva, Soubarna Banik, Josip Josifovski, Alois Knoll (Technical University of Munich, Germany)
- Abstract summary: This work introduces a neural network architecture that combines relational inductive bias and visual feedback to learn an efficient position control policy.
We derive a graph representation that models the physical structure of the manipulator and combines the robot's internal state with a low-dimensional description of the visual scene generated by an image encoding network.
We show the ability of the model to improve sample efficiency for a 6-DoF robot arm in a visually realistic 3D environment.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art reinforcement learning algorithms predominantly learn a
policy from either a numerical state vector or images. Both approaches
generally do not take structural knowledge of the task into account, which is
especially prevalent in robotic applications and can benefit learning if
exploited. This work introduces a neural network architecture that combines
relational inductive bias and visual feedback to learn an efficient position
control policy for robotic manipulation. We derive a graph representation that
models the physical structure of the manipulator and combines the robot's
internal state with a low-dimensional description of the visual scene generated
by an image encoding network. On this basis, a graph neural network trained
with reinforcement learning predicts joint velocities to control the robot. We
further introduce an asymmetric approach of training the image encoder
separately from the policy using supervised learning. Experimental results
demonstrate that, for a 2-DoF planar robot in a geometrically simplistic 2D
environment, a learned representation of the visual scene can replace access to
the explicit coordinates of the reaching target without compromising on the
quality and sample efficiency of the policy. We further show the ability of the
model to improve sample efficiency for a 6-DoF robot arm in a visually
realistic 3D environment.
Related papers
- Learning Manipulation by Predicting Interaction [85.57297574510507]
We propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction.
The experimental results demonstrate that MPI achieves remarkable improvements of 10% to 64% over the previous state-of-the-art on real-world robot platforms.
arXiv Detail & Related papers (2024-06-01T13:28:31Z)
- Graphical Object-Centric Actor-Critic [55.2480439325792]
We propose a novel object-centric reinforcement learning algorithm combining actor-critic and model-based approaches.
We use a transformer encoder to extract object representations and graph neural networks to approximate the dynamics of an environment.
Our algorithm outperforms the state-of-the-art model-free actor-critic algorithm in a visually complex 3D robotic environment and in a 2D environment with compositional structure.
arXiv Detail & Related papers (2023-10-26T06:05:12Z)
- Human-oriented Representation Learning for Robotic Manipulation [64.59499047836637]
Humans inherently possess generalizable visual representations that empower them to efficiently explore and interact with their environments in manipulation tasks.
We formalize this idea through the lens of human-oriented multi-task fine-tuning on top of pre-trained visual encoders.
Our Task Fusion Decoder consistently improves the representation of three state-of-the-art visual encoders for downstream manipulation policy-learning.
arXiv Detail & Related papers (2023-10-04T17:59:38Z)
- Enhancing Robot Learning through Learned Human-Attention Feature Maps [6.724036710994883]
We posit that embedding auxiliary information about human focus points into robot learning would enhance the efficiency and robustness of the learning process.
In this paper, we propose a novel approach to model and emulate human attention with an approximate prediction model.
We test our approach on two learning tasks - object detection and imitation learning.
arXiv Detail & Related papers (2023-08-29T14:23:44Z)
- RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control [140.48218261864153]
We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control.
Our approach leads to performant robotic policies and enables RT-2 to obtain a range of emergent capabilities from Internet-scale training.
arXiv Detail & Related papers (2023-07-28T21:18:02Z)
- Visual Reinforcement Learning with Self-Supervised 3D Representations [15.991546692872841]
We present a unified framework for self-supervised learning of 3D representations for motor control.
Our method enjoys improved sample efficiency in simulated manipulation tasks compared to 2D representation learning methods.
arXiv Detail & Related papers (2022-10-13T17:59:55Z)
- Masked World Models for Visual Control [90.13638482124567]
We introduce a visual model-based RL framework that decouples visual representation learning and dynamics learning.
We demonstrate that our approach achieves state-of-the-art performance on a variety of visual robotic tasks.
arXiv Detail & Related papers (2022-06-28T18:42:27Z)
- 3D Neural Scene Representations for Visuomotor Control [78.79583457239836]
We learn models for dynamic 3D scenes purely from 2D visual observations.
A dynamics model, constructed over the learned representation space, enables visuomotor control for challenging manipulation tasks.
arXiv Detail & Related papers (2021-07-08T17:49:37Z)
- KOVIS: Keypoint-based Visual Servoing with Zero-Shot Sim-to-Real Transfer for Robotics Manipulation [8.81267687440119]
KOVIS is a learning-based, calibration-free visual servoing method for fine robotic manipulation tasks with an eye-in-hand stereo camera system.
We train the deep neural network only in the simulated environment.
We demonstrate the effectiveness of the proposed method in both simulated environments and real-world experiments with different robotic manipulation tasks.
arXiv Detail & Related papers (2020-07-28T02:53:28Z)
- Understanding Contexts Inside Robot and Human Manipulation Tasks through a Vision-Language Model and Ontology System in a Video Stream [4.450615100675747]
We present a vision dataset under a strictly constrained knowledge domain for both robot and human manipulations.
We propose a scheme to generate a combination of visual attentions and an evolving knowledge graph filled with commonsense knowledge.
The proposed scheme allows the robot to mimic human-like intentional behaviors by watching real-time videos.
arXiv Detail & Related papers (2020-03-02T19:48:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.