Plan-Space State Embeddings for Improved Reinforcement Learning
- URL: http://arxiv.org/abs/2004.14567v1
- Date: Thu, 30 Apr 2020 03:38:14 GMT
- Title: Plan-Space State Embeddings for Improved Reinforcement Learning
- Authors: Max Pflueger and Gaurav S. Sukhatme
- Abstract summary: We present a new method for learning state embeddings from plans or other forms of demonstrations.
We show how these embeddings can then be used as an augmentation to the robot state in reinforcement learning problems.
- Score: 12.340412143459869
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robot control problems are often structured with a policy function that maps
state values into control values, but in many dynamic problems the observed
state can have a difficult-to-characterize relationship with useful policy
actions. In this paper we present a new method for learning state embeddings
from plans or other forms of demonstrations such that the embedding space has a
specified geometric relationship with the demonstrations. We present a novel
variational framework for learning these embeddings that attempts to optimize
trajectory linearity in the learned embedding space. We show how these
embedding spaces can then be used as an augmentation to the robot state in
reinforcement learning problems. We use kinodynamic planning to generate
training trajectories for some example environments, and then train embedding
spaces for these environments. We show empirically that observing a system in
the learned embedding space improves the performance of policy gradient
reinforcement learning algorithms, particularly by reducing the variance
between training runs. Our technique is limited to environments where
demonstration data is available, but places no limits on how that data is
collected. Our embedding technique provides a way to transfer domain knowledge
from existing technologies, such as planning and control algorithms, into more
flexible policy learning algorithms, by creating an abstract representation of
the robot state with meaningful geometry.
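To make the idea concrete, below is a minimal sketch (in PyTorch) of the two ingredients described in the abstract: an encoder trained on demonstration trajectories with a loss that pushes each embedded state toward the line segment between its neighbours (a crude stand-in for the paper's variational trajectory-linearity objective), and a state-augmentation step that concatenates the learned embedding onto the raw state before it reaches the RL policy. All module names, sizes, and the exact loss form are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch, NOT the paper's exact variational objective: an MLP encoder is
# trained so that states along a demonstration embed close to a straight line
# (approximated here by a midpoint-consistency loss over (s_{t-1}, s_t, s_{t+1})
# triples), and the resulting embedding is concatenated onto the raw state before
# it is passed to an RL policy. All names, sizes, and the loss form are assumptions.
import torch
import torch.nn as nn

class StateEncoder(nn.Module):
    def __init__(self, state_dim: int, embed_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, embed_dim),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)

def linearity_loss(encoder: StateEncoder, traj: torch.Tensor) -> torch.Tensor:
    """traj: (T, state_dim) states from one plan/demonstration."""
    z = encoder(traj)                                  # (T, embed_dim)
    z_prev, z_mid, z_next = z[:-2], z[1:-1], z[2:]
    # Push each embedded state toward the midpoint of its neighbours, a crude
    # stand-in for the trajectory-linearity geometry described in the abstract.
    return ((z_mid - 0.5 * (z_prev + z_next)) ** 2).mean()

def augment_state(encoder: StateEncoder, s: torch.Tensor) -> torch.Tensor:
    """Concatenate the learned embedding onto the raw robot state for the policy."""
    with torch.no_grad():
        return torch.cat([s, encoder(s)], dim=-1)

if __name__ == "__main__":
    state_dim, embed_dim = 8, 4
    enc = StateEncoder(state_dim, embed_dim)
    opt = torch.optim.Adam(enc.parameters(), lr=1e-3)
    demos = [torch.randn(50, state_dim) for _ in range(16)]   # placeholder "plans"
    for _ in range(200):
        loss = torch.stack([linearity_loss(enc, t) for t in demos]).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    print(augment_state(enc, torch.randn(1, state_dim)).shape)   # torch.Size([1, 12])
```

The paper's variational framework and the kinodynamic planner that produces the training trajectories are not reproduced here; the sketch only shows where a learned embedding plugs into the RL state.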
Related papers
- AI planning in the imagination: High-level planning on learned abstract search spaces [68.75684174531962]
We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space that the agent learns during training.
We evaluate our method on multiple domains, including the traveling salesman problem, Sokoban, 2048, the facility location problem, and Pacman.
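As a purely illustrative sketch of acting by search in a learned abstract space (not PiZero's actual architecture or search), the snippet below encodes an observation into a latent vector, applies a learned latent dynamics model for each discrete action, and picks the action whose predicted next latent scores highest under a value head; a real planner would search many steps deep.

```python
# Illustrative one-step lookahead in a learned latent space; networks are
# untrained stand-ins and all names/sizes are assumptions.
import torch
import torch.nn as nn

class LatentPlanner(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, latent_dim: int = 32):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
        self.dynamics = nn.Sequential(nn.Linear(latent_dim + n_actions, 64), nn.ReLU(),
                                      nn.Linear(64, latent_dim))
        self.value = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 1))
        self.n_actions = n_actions

    def plan(self, obs: torch.Tensor) -> int:
        z = self.encode(obs)
        best_a, best_v = 0, float("-inf")
        for a in range(self.n_actions):          # one-step lookahead in latent space
            one_hot = torch.zeros(self.n_actions)
            one_hot[a] = 1.0
            v = self.value(self.dynamics(torch.cat([z, one_hot]))).item()
            if v > best_v:
                best_a, best_v = a, v
        return best_a

print(LatentPlanner(obs_dim=6, n_actions=4).plan(torch.randn(6)))
```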
arXiv Detail & Related papers (2023-08-16T22:47:16Z)
- Learning-based Motion Planning in Dynamic Environments Using GNNs and Temporal Encoding [15.58317292680615]
We propose a GNN-based approach that uses temporal encoding and imitation learning with data aggregation for learning both the embeddings and the edge prioritization policies.
Experiments show that the proposed methods can significantly accelerate online planning over state-of-the-art complete dynamic planning algorithms.
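One small ingredient mentioned above, temporal encoding, can be sketched as follows: a sinusoidal encoding of the planning time is appended to each node's geometric features so a downstream (graph) network can reason about when obstacles occupy a region. The GNN, imitation learning, and data-aggregation components are omitted, and all dimensions are placeholders.

```python
# Sketch of appending a sinusoidal time encoding to roadmap node features;
# everything here is a placeholder, not the paper's pipeline.
import numpy as np

def temporal_encoding(t: float, dim: int = 8) -> np.ndarray:
    """Sinusoidal encoding of a scalar time, in the style of transformer position encodings."""
    freqs = 1.0 / (10000.0 ** (np.arange(dim // 2) * 2.0 / dim))
    return np.concatenate([np.sin(t * freqs), np.cos(t * freqs)])

node_xyz = np.random.rand(5, 3)                                   # 5 sampled roadmap nodes
t = 0.75                                                          # query time along the plan
node_features = np.hstack([node_xyz, np.tile(temporal_encoding(t), (5, 1))])
print(node_features.shape)                                        # (5, 11)
```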
arXiv Detail & Related papers (2022-10-16T01:27:16Z)
- Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization [63.75188254377202]
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to the discrepancy between source and target environments.
We propose State-Conservative Policy Optimization (SCPO), a novel model-free actor-critic algorithm that learns robust policies without modeling the disturbance in advance.
Experiments in several robot control tasks demonstrate that SCPO learns robust policies against the disturbance in transition dynamics.
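A hedged sketch of one way a state-conservative update can be realized (a PGD-style inner loop, not necessarily SCPO's exact gradient-based solver): before the actor update, the observed state is perturbed within a small ball in the direction that most lowers the critic's value estimate, so the policy is trained on locally worst-case states. The actor/critic below are toy stand-ins that exist only to make the sketch runnable.

```python
# PGD-style approximation of a locally worst-case state; an illustration of the
# state-conservative idea, not the paper's algorithm.
import torch
import torch.nn as nn

def worst_case_state(actor, critic, s, eps=0.05, steps=3, step_size=0.02):
    delta = torch.zeros_like(s, requires_grad=True)
    for _ in range(steps):
        q = critic(torch.cat([s + delta, actor(s + delta)], dim=-1)).sum()
        grad, = torch.autograd.grad(q, delta)
        with torch.no_grad():
            delta -= step_size * grad.sign()   # move toward lower estimated value
            delta.clamp_(-eps, eps)            # stay inside the epsilon-ball
    return (s + delta).detach()

actor = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2), nn.Tanh())
critic = nn.Sequential(nn.Linear(4 + 2, 32), nn.Tanh(), nn.Linear(32, 1))
s = torch.randn(8, 4)                          # a batch of observed states
s_adv = worst_case_state(actor, critic, s)
print((s_adv - s).abs().max())                 # bounded by eps = 0.05
# The actor update (not shown) would then be evaluated at s_adv rather than s.
```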
arXiv Detail & Related papers (2021-12-20T13:13:05Z)
- Dream to Explore: Adaptive Simulations for Autonomous Systems [3.0664963196464448]
We tackle the problem of learning to control dynamical systems by applying Bayesian nonparametric methods.
By employing Gaussian processes to discover latent world dynamics, we mitigate common data efficiency issues observed in reinforcement learning.
Our algorithm jointly learns a world model and policy by optimizing a variational lower bound of a log-likelihood.
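The Gaussian-process world-model idea can be illustrated, in a much-reduced form, by fitting a GP to one-step transitions and reading out its predictive mean and uncertainty for imagined rollouts; the sketch below uses scikit-learn as a stand-in and omits the variational lower bound and the policy entirely.

```python
# Much-reduced sketch of a GP one-step dynamics model; toy data and library
# choices are assumptions, not the paper's method.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
S = rng.uniform(-1, 1, size=(200, 2))                         # toy states
A = rng.uniform(-1, 1, size=(200, 1))                         # toy actions
S_next = S + 0.1 * A + 0.01 * rng.standard_normal(S.shape)    # smooth toy dynamics

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(np.hstack([S, A]), S_next)

mean, std = gp.predict(np.array([[0.2, -0.3, 0.5]]), return_std=True)
print(mean, std)                                   # predicted next state and its uncertainty
```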
arXiv Detail & Related papers (2021-10-27T04:27:28Z)
- Composable Learning with Sparse Kernel Representations [110.19179439773578]
We present a reinforcement learning algorithm for learning sparse non-parametric controllers in a Reproducing Kernel Hilbert Space.
We improve the sample complexity of this approach by imposing structure on the state-action value function through a normalized advantage function.
We demonstrate the performance of this algorithm on learning obstacle-avoidance policies in multiple simulations of a robot equipped with a laser scanner while navigating in a 2D environment.
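As an illustration of what a sparse kernel controller can look like (a simplification, not the paper's RKHS algorithm or its normalized advantage function), the sketch below fits kernel ridge regression from states to actions, keeps only the support states with the largest dual coefficients, and refits on that reduced set so the controller stays cheap to evaluate.

```python
# Illustrative sparse kernel regression controller; data, kernel, and pruning
# rule are assumptions made for the sketch.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(1)
S = rng.uniform(-1, 1, size=(300, 4))                   # visited states
A = np.tanh(S @ rng.standard_normal((4, 2)))            # toy target actions

K = rbf_kernel(S, S, gamma=1.0)
alpha = np.linalg.solve(K + 1e-2 * np.eye(len(S)), A)   # dense dual coefficients

norms = np.linalg.norm(alpha, axis=1)
keep = norms > np.quantile(norms, 0.8)                  # retain the top 20% of support states
S_sparse = S[keep]
K_ss = rbf_kernel(S_sparse, S_sparse, gamma=1.0)
alpha_sparse = np.linalg.solve(K_ss + 1e-2 * np.eye(len(S_sparse)), A[keep])

def controller(s: np.ndarray) -> np.ndarray:
    """Evaluate the sparse kernel controller at a new state s of shape (4,)."""
    k = rbf_kernel(s.reshape(1, -1), S_sparse, gamma=1.0)
    return (k @ alpha_sparse).ravel()

print(controller(np.zeros(4)))
```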
arXiv Detail & Related papers (2021-03-26T13:58:23Z)
- Neural Dynamic Policies for End-to-End Sensorimotor Learning [51.24542903398335]
The current dominant paradigm in sensorimotor control, whether imitation or reinforcement learning, is to train policies directly in raw action spaces.
We propose Neural Dynamic Policies (NDPs) that make predictions in trajectory distribution space.
NDPs outperform the prior state-of-the-art in terms of either efficiency or performance across several robotic control tasks.
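The "predict in trajectory distribution space" idea behind dynamic-movement-primitive-style policies can be sketched as follows: instead of emitting a raw action each step, the policy head outputs DMP parameters (a goal g and basis-function weights w), and a second-order dynamical system is integrated to produce a smooth trajectory. The DMP form below is a standard point-attractor DMP, not necessarily the exact parameterization used by NDPs.

```python
# Standard point-attractor DMP rollout; the "policy output" (g, w) is hand-set
# here just to show the trajectory-space interface.
import numpy as np

def dmp_rollout(y0, g, w, T=100, dt=0.01, alpha=25.0, beta=6.25, tau=1.0):
    n_basis = len(w)
    centers = np.exp(-np.linspace(0.0, 1.0, n_basis) * 3.0)   # basis centres in phase space
    widths = n_basis / (centers ** 2)
    y, dy, x = float(y0), 0.0, 1.0
    traj = []
    for _ in range(T):
        psi = np.exp(-widths * (x - centers) ** 2)
        forcing = (psi @ w) / (psi.sum() + 1e-8) * x * (g - y0)
        ddy = (alpha * (beta * (g - y) - dy) + forcing) / tau
        dy += ddy * dt
        y += dy * dt
        x += (-3.0 * x / tau) * dt           # canonical system: phase decays over time
        traj.append(y)
    return np.array(traj)

print(dmp_rollout(y0=0.0, g=1.0, w=np.zeros(10))[-1])   # converges toward the goal g
```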
arXiv Detail & Related papers (2020-12-04T18:59:32Z)
- PLAS: Latent Action Space for Offline Reinforcement Learning [18.63424441772675]
The goal of offline reinforcement learning is to learn a policy from a fixed dataset, without further interactions with the environment.
Existing off-policy algorithms have limited performance on static datasets due to extrapolation errors from out-of-distribution actions.
We demonstrate that our method provides competitive performance consistently across various continuous control tasks and different types of datasets.
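A hedged sketch of the latent-action-space idea: a conditional VAE is trained on (state, action) pairs from the fixed dataset, and the policy then outputs a bounded latent vector that the decoder maps back to an action, which keeps the chosen actions close to the dataset's action distribution. Network sizes, names, and the latent bound are illustrative assumptions, and the VAE/policy training loops are omitted.

```python
# Illustrative conditional-VAE decoder plus a policy that acts in its latent space;
# a sketch of the latent-action interface, not the paper's full training procedure.
import torch
import torch.nn as nn

class ActionCVAE(nn.Module):
    def __init__(self, s_dim, a_dim, z_dim=2, hidden=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(s_dim + a_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * z_dim))
        self.dec = nn.Sequential(nn.Linear(s_dim + z_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, a_dim), nn.Tanh())

    def decode(self, s, z):
        return self.dec(torch.cat([s, z], dim=-1))

    def forward(self, s, a):                         # used when training on the dataset
        mu, log_std = self.enc(torch.cat([s, a], dim=-1)).chunk(2, dim=-1)
        z = mu + log_std.exp() * torch.randn_like(mu)
        return self.decode(s, z), mu, log_std

class LatentPolicy(nn.Module):
    """The policy acts in the CVAE's latent space instead of the raw action space."""
    def __init__(self, s_dim, z_dim=2, hidden=64, z_max=2.0):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, z_dim), nn.Tanh())
        self.z_max = z_max

    def act(self, cvae, s):
        z = self.z_max * self.net(s)                 # bounded latent "action"
        return cvae.decode(s, z)                     # decoded into an in-distribution action

cvae, pi = ActionCVAE(s_dim=6, a_dim=2), LatentPolicy(s_dim=6)
print(pi.act(cvae, torch.randn(1, 6)))
```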
arXiv Detail & Related papers (2020-11-14T03:38:38Z)
- Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
Reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but they are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z)
- Planning from Images with Deep Latent Gaussian Process Dynamics [2.924868086534434]
Planning is a powerful approach to control problems with known environment dynamics.
In unknown environments the agent needs to learn a model of the system dynamics to make planning applicable.
We propose a deep latent Gaussian process dynamics (DLGPD) model that learns low-dimensional system dynamics from environment interactions with visual observations.
arXiv Detail & Related papers (2020-05-07T21:29:45Z)
- Never Stop Learning: The Effectiveness of Fine-Tuning in Robotic Reinforcement Learning [109.77163932886413]
We show how to adapt vision-based robotic manipulation policies to new variations by fine-tuning via off-policy reinforcement learning.
This adaptation uses less than 0.2% of the data necessary to learn the task from scratch.
We find that our approach of adapting pre-trained policies leads to substantial performance gains over the course of fine-tuning.
arXiv Detail & Related papers (2020-04-21T17:57:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.