Graph Neural Networks for Relational Inductive Bias in Vision-based Deep
Reinforcement Learning of Robot Control
- URL: http://arxiv.org/abs/2203.05985v1
- Date: Fri, 11 Mar 2022 15:11:54 GMT
- Title: Graph Neural Networks for Relational Inductive Bias in Vision-based Deep
Reinforcement Learning of Robot Control
- Authors: Marco Oliva, Soubarna Banik, Josip Josifovski, Alois Knoll (Technical University of Munich, Germany)
- Abstract summary: This work introduces a neural network architecture that combines relational inductive bias and visual feedback to learn an efficient position control policy.
We derive a graph representation that models the physical structure of the manipulator and combines the robot's internal state with a low-dimensional description of the visual scene generated by an image encoding network.
We show the ability of the model to improve sample efficiency for a 6-DoF robot arm in a visually realistic 3D environment.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art reinforcement learning algorithms predominantly learn a
policy from either a numerical state vector or images. Both approaches
generally do not take structural knowledge of the task into account, which is
especially prevalent in robotic applications and can benefit learning if
exploited. This work introduces a neural network architecture that combines
relational inductive bias and visual feedback to learn an efficient position
control policy for robotic manipulation. We derive a graph representation that
models the physical structure of the manipulator and combines the robot's
internal state with a low-dimensional description of the visual scene generated
by an image encoding network. On this basis, a graph neural network trained
with reinforcement learning predicts joint velocities to control the robot. We
further introduce an asymmetric approach of training the image encoder
separately from the policy using supervised learning. Experimental results
demonstrate that, for a 2-DoF planar robot in a geometrically simplistic 2D
environment, a learned representation of the visual scene can replace access to
the explicit coordinates of the reaching target without compromising on the
quality and sample efficiency of the policy. We further show the ability of the
model to improve sample efficiency for a 6-DoF robot arm in a visually
realistic 3D environment.
Related papers
- Learning Manipulation by Predicting Interaction [85.57297574510507]
We propose a general pre-training pipeline that learns Manipulation by Predicting the Interaction.
The experimental results demonstrate that MPI achieves remarkable improvements of 10% to 64% over the previous state-of-the-art on real-world robot platforms.
arXiv Detail & Related papers (2024-06-01T13:28:31Z)
- Graphical Object-Centric Actor-Critic [55.2480439325792]
We propose a novel object-centric reinforcement learning algorithm combining actor-critic and model-based approaches.
We use a transformer encoder to extract object representations and graph neural networks to approximate the dynamics of an environment.
Our algorithm outperforms the state-of-the-art model-free actor-critic algorithm in a visually complex 3D robotic environment and in a 2D environment with compositional structure.
arXiv Detail & Related papers (2023-10-26T06:05:12Z)
- Human-oriented Representation Learning for Robotic Manipulation [64.59499047836637]
Humans inherently possess generalizable visual representations that empower them to efficiently explore and interact with their environments in manipulation tasks.
We formalize this idea through the lens of human-oriented multi-task fine-tuning on top of pre-trained visual encoders.
Our Task Fusion Decoder consistently improves the representation of three state-of-the-art visual encoders for downstream manipulation policy-learning.
arXiv Detail & Related papers (2023-10-04T17:59:38Z)
- Enhancing Robot Learning through Learned Human-Attention Feature Maps [6.724036710994883]
We posit that embedding auxiliary information about human focus points into robot learning would enhance the efficiency and robustness of the learning process.
In this paper, we propose a novel approach to model and emulate human attention with an approximate prediction model.
We test our approach on two learning tasks - object detection and imitation learning.
arXiv Detail & Related papers (2023-08-29T14:23:44Z)
- RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control [140.48218261864153]
We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control.
Our approach leads to performant robotic policies and enables RT-2 to obtain a range of emergent capabilities from Internet-scale training.
arXiv Detail & Related papers (2023-07-28T21:18:02Z)
- Visual Reinforcement Learning with Self-Supervised 3D Representations [15.991546692872841]
We present a unified framework for self-supervised learning of 3D representations for motor control.
Our method enjoys improved sample efficiency in simulated manipulation tasks compared to 2D representation learning methods.
arXiv Detail & Related papers (2022-10-13T17:59:55Z)
- Masked World Models for Visual Control [90.13638482124567]
We introduce a visual model-based RL framework that decouples visual representation learning and dynamics learning.
We demonstrate that our approach achieves state-of-the-art performance on a variety of visual robotic tasks.
arXiv Detail & Related papers (2022-06-28T18:42:27Z)
- 3D Neural Scene Representations for Visuomotor Control [78.79583457239836]
We learn models for dynamic 3D scenes purely from 2D visual observations.
A dynamics model, constructed over the learned representation space, enables visuomotor control for challenging manipulation tasks.
arXiv Detail & Related papers (2021-07-08T17:49:37Z)
- KOVIS: Keypoint-based Visual Servoing with Zero-Shot Sim-to-Real Transfer for Robotics Manipulation [8.81267687440119]
KOVIS is a learning-based, calibration-free visual servoing method for fine robotic manipulation tasks with an eye-in-hand stereo camera system.
We train the deep neural network only in the simulated environment.
We demonstrate the effectiveness of the proposed method in both simulated environments and real-world experiments with different robotic manipulation tasks.
arXiv Detail & Related papers (2020-07-28T02:53:28Z)
- Understanding Contexts Inside Robot and Human Manipulation Tasks through a Vision-Language Model and Ontology System in a Video Stream [4.450615100675747]
We present a vision dataset under a strictly constrained knowledge domain for both robot and human manipulations.
We propose a scheme to generate a combination of visual attentions and an evolving knowledge graph filled with commonsense knowledge.
The proposed scheme allows the robot to mimic human-like intentional behaviors by watching real-time videos.
arXiv Detail & Related papers (2020-03-02T19:48:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.