Guided Uncertainty-Aware Policy Optimization: Combining Learning and
Model-Based Strategies for Sample-Efficient Policy Learning
- URL: http://arxiv.org/abs/2005.10872v2
- Date: Tue, 26 May 2020 16:34:43 GMT
- Title: Guided Uncertainty-Aware Policy Optimization: Combining Learning and
Model-Based Strategies for Sample-Efficient Policy Learning
- Authors: Michelle A. Lee, Carlos Florensa, Jonathan Tremblay, Nathan Ratliff,
Animesh Garg, Fabio Ramos, Dieter Fox
- Abstract summary: Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
Reinforcement learning approaches, by contrast, can operate directly from raw sensory inputs with only a reward signal to describe the task, but they are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
- Score: 75.56839075060819
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional robotic approaches rely on an accurate model of the environment,
a detailed description of how to perform the task, and a robust perception
system to keep track of the current state. On the other hand, reinforcement
learning approaches can operate directly from raw sensory inputs with only a
reward signal to describe the task, but are extremely sample-inefficient and
brittle. In this work, we combine the strengths of model-based methods with the
flexibility of learning-based methods to obtain a general method that is able
to overcome inaccuracies in the robotics perception/actuation pipeline, while
requiring minimal interactions with the environment. This is achieved by
leveraging uncertainty estimates to divide the space into regions where the given
model-based policy is reliable, and regions where it may have flaws or not be
well defined. In these uncertain regions, we show that a locally learned policy
can be used directly with raw sensory inputs. We test our algorithm, Guided
Uncertainty-Aware Policy Optimization (GUAPO), on a real-world robot performing
peg insertion. Videos are available at https://sites.google.com/view/guapo-rl
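The abstract describes an uncertainty-gated switch: use the model-based policy where perception is trustworthy, and hand control to a locally learned policy inside the uncertain region. The sketch below is a minimal illustration of that idea under assumptions of our own (a Gaussian goal-pose estimate, a Mahalanobis-distance gate, and placeholder controllers); it is not GUAPO's exact formulation.
```python
import numpy as np

# Illustrative stand-ins for the two controllers; names are assumptions for this sketch.
def model_based_action(ee_pos, goal_mean, gain=2.0):
    """Simple position servo toward the estimated goal (placeholder for a planner/controller)."""
    return gain * (goal_mean - ee_pos)

class LearnedPolicy:
    """Placeholder for an RL policy acting on raw observations (e.g., images)."""
    def act(self, raw_obs):
        return np.zeros(3)  # a trained network's output would go here

def uncertainty_gated_action(ee_pos, raw_obs, goal_mean, goal_cov,
                             learned_policy, gate=3.0):
    # Distance of the end-effector to the estimated goal, scaled by the perception
    # covariance: large values mean we are far from the region where the pose
    # estimate's uncertainty matters, so the model-based policy can be trusted.
    diff = ee_pos - goal_mean
    d2 = float(diff @ np.linalg.solve(goal_cov, diff))
    if d2 > gate ** 2:
        return model_based_action(ee_pos, goal_mean)  # reliable region
    return learned_policy.act(raw_obs)                # uncertain region: learned policy on raw inputs

# Example call with a 3-D end-effector position and a diagonal pose covariance.
action = uncertainty_gated_action(
    ee_pos=np.array([0.4, 0.0, 0.2]),
    raw_obs=None,
    goal_mean=np.array([0.5, 0.0, 0.15]),
    goal_cov=np.diag([1e-4, 1e-4, 1e-4]),
    learned_policy=LearnedPolicy(),
)
```
In a full system the gate would come from the perception model's own uncertainty estimate, and the learned policy would only need to act inside the (small) uncertain region, which is consistent with the abstract's claim of requiring minimal environment interaction.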
Related papers
- No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery [53.08822154199948]
Unsupervised Environment Design (UED) methods have gained recent attention as their adaptive curricula promise to enable agents to be robust to in- and out-of-distribution tasks.
This work investigates how existing UED methods select training environments, focusing on task prioritisation metrics.
We develop a method that directly trains on scenarios with high learnability.
arXiv Detail & Related papers (2024-08-27T14:31:54Z)
- Efficient Imitation Learning with Conservative World Models [54.52140201148341]
We tackle the problem of policy learning from expert demonstrations without a reward function.
We re-frame imitation learning as a fine-tuning problem, rather than a pure reinforcement learning one.
arXiv Detail & Related papers (2024-05-21T20:53:18Z)
- Evaluating Real-World Robot Manipulation Policies in Simulation [91.55267186958892]
Control and visual disparities between real and simulated environments are key challenges for reliable simulated evaluation.
We propose approaches for mitigating these gaps without needing to craft full-fidelity digital twins of real-world environments.
We create SIMPLER, a collection of simulated environments for manipulation policy evaluation on common real robot setups.
arXiv Detail & Related papers (2024-05-09T17:30:16Z)
- Robust Learning from Observation with Model Misspecification [33.92371002674386]
Imitation learning (IL) is a popular paradigm for training policies in robotic systems.
We propose a robust IL algorithm to learn policies that can effectively transfer to the real environment without fine-tuning.
arXiv Detail & Related papers (2022-02-12T07:04:06Z)
- Verified Probabilistic Policies for Deep Reinforcement Learning [6.85316573653194]
We tackle the problem of verifying probabilistic policies for deep reinforcement learning.
We propose an abstraction approach, based on interval Markov decision processes, that yields guarantees on a policy's execution.
We present techniques to build and solve these models using abstract interpretation, mixed-integer linear programming, entropy-based refinement and probabilistic model checking (an illustrative interval-MDP sketch appears after this list).
arXiv Detail & Related papers (2022-01-10T23:55:04Z)
- Learning Robust Controllers Via Probabilistic Model-Based Policy Search [2.886634516775814]
We investigate whether controllers learned in such a way are robust and able to generalize under small perturbations of the environment.
We show that enforcing a lower bound on the likelihood noise in the Gaussian Process dynamics model regularizes the policy updates and yields more robust controllers (a minimal noise-bound sketch appears after this list).
arXiv Detail & Related papers (2021-10-26T11:17:31Z)
- Zero-Shot Reinforcement Learning on Graphs for Autonomous Exploration Under Uncertainty [6.42522897323111]
We present a framework for self-learning a high-performance exploration policy in a single simulation environment.
We propose a novel approach that uses graph neural networks in conjunction with deep reinforcement learning.
arXiv Detail & Related papers (2021-05-11T02:42:17Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment.
One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting.
But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient.
We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z)
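For the "Verified Probabilistic Policies for Deep Reinforcement Learning" entry above, the sketch below illustrates how interval transition bounds can yield a guaranteed lower bound on a reachability probability via robust value iteration. The toy interval Markov chain (a policy already applied to an interval MDP), the state numbering, and the transition bounds are assumptions for illustration, not the paper's construction.
```python
# Toy interval Markov chain: states 0 and 1 are transient, 2 is the goal, 3 is an
# absorbing failure state. intervals[s] maps successor -> (lower, upper) probability bounds.
intervals = {
    0: {1: (0.6, 0.9), 3: (0.1, 0.4)},
    1: {2: (0.5, 0.9), 3: (0.1, 0.5)},
}
GOAL, FAIL = 2, 3

def worst_case_expectation(bounds, values):
    """Minimize sum(p_i * values_i) over distributions with lo_i <= p_i <= hi_i, sum(p) = 1.
    Greedy: start every successor at its lower bound, then push the remaining mass
    onto the lowest-valued successors first."""
    p = [lo for lo, _ in bounds]
    remaining = 1.0 - sum(p)
    for i in sorted(range(len(values)), key=lambda j: values[j]):
        extra = min(bounds[i][1] - bounds[i][0], remaining)
        p[i] += extra
        remaining -= extra
    return sum(pi * vi for pi, vi in zip(p, values))

# Robust value iteration: V[s] converges to a lower bound on the probability of
# reaching the goal from s, whatever the true transition probabilities are
# within the stated intervals.
V = {0: 0.0, 1: 0.0, GOAL: 1.0, FAIL: 0.0}
for _ in range(100):
    new_V = dict(V)
    for s, bounds in intervals.items():
        succs = list(bounds)
        new_V[s] = worst_case_expectation([bounds[t] for t in succs],
                                          [V[t] for t in succs])
    V = new_V

print(V[0])  # guaranteed lower bound on reach-goal probability from state 0 (about 0.3 here)
```
The paper builds such abstractions from a neural policy and refines them; the point of this sketch is only how interval bounds turn into a certified worst-case probability.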
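For the "Learning Robust Controllers Via Probabilistic Model-Based Policy Search" entry above, the following is a minimal sketch of one way to enforce a lower bound on the GP likelihood noise, here via scikit-learn's WhiteKernel bounds on toy data. The data, kernel choices, and the specific bound value are assumptions, not the paper's setup.
```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy 1-D "dynamics" data: next state as a noisy function of the current state.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 1))
y = np.sin(3.0 * X[:, 0]) + 0.05 * rng.standard_normal(50)

# The WhiteKernel models the likelihood noise; its lower bound (1e-2 here) keeps the
# optimizer from shrinking the noise toward zero, so the learned dynamics model
# cannot become arbitrarily overconfident.
kernel = RBF(length_scale=0.5) + WhiteKernel(noise_level=0.1,
                                             noise_level_bounds=(1e-2, 1e1))
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

print(gp.kernel_)  # the fitted noise level respects the lower bound
mean, std = gp.predict(np.array([[0.2]]), return_std=True)
print(float(mean[0]), float(std[0]))
```
In a model-based policy-search loop, this noise floor feeds through the GP's predictive variance into the policy update, which is one way to read the regularizing effect described in the entry above.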
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences arising from its use.