Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment
- URL: http://arxiv.org/abs/2104.05632v1
- Date: Mon, 12 Apr 2021 16:53:55 GMT
- Title: Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment
- Authors: Philip J. Ball, Cong Lu, Jack Parker-Holder, Stephen Roberts
- Abstract summary: Reinforcement learning from large-scale offline datasets provides us with the ability to learn policies without potentially unsafe or impractical exploration.
Little attention has been paid to potentially changing dynamics when transferring a policy to the online setting.
We augment a learned dynamics model with simple transformations that seek to capture potential changes in physical properties of the robot.
- Score: 10.04587045407742
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning from large-scale offline datasets provides us with the ability to learn policies without potentially unsafe or impractical exploration. Significant progress has been made in the past few years in dealing with the challenge of correcting for differing behavior between the data collection and learned policies. However, little attention has been paid to potentially changing dynamics when transferring a policy to the online setting, where performance can be up to 90% reduced for existing methods. In this paper we address this problem with Augmented World Models (AugWM). We augment a learned dynamics model with simple transformations that seek to capture potential changes in physical properties of the robot, leading to more robust policies. We not only train our policy in this new setting, but also provide it with the sampled augmentation as a context, allowing it to adapt to changes in the environment. At test time we learn the context in a self-supervised fashion by approximating the augmentation which corresponds to the new environment. We rigorously evaluate our approach on over 100 different changed dynamics settings, and show that this simple approach can significantly improve the zero-shot generalization of a recent state-of-the-art baseline, often achieving successful policies where the baseline fails.
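The abstract describes three ingredients: a learned dynamics model perturbed by simple transformations, a policy conditioned on the sampled augmentation as context, and a self-supervised test-time estimate of that context. The sketch below illustrates this shape only; the per-dimension scale-and-shift applied to the model's predicted state delta, the least-squares context estimator, and the names and dimensions used are illustrative assumptions rather than the paper's exact design.

```python
# Minimal sketch of an AugWM-style augmented world model. The scale-and-shift
# transformation, the least-squares context estimator, and the dimensions below
# are illustrative assumptions, not the paper's exact design.
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM = 4  # hypothetical state dimensionality

def sample_augmentation(scale_range=(0.5, 1.5), shift_range=0.1):
    """Sample a random dynamics augmentation: per-dimension scale a and shift b."""
    a = rng.uniform(*scale_range, size=STATE_DIM)
    b = rng.uniform(-shift_range, shift_range, size=STATE_DIM)
    return a, b

def augmented_step(model, state, action, aug):
    """Perturb the learned model's predicted state change with the augmentation."""
    a, b = aug
    delta = model(state, action) - state  # predicted next state minus current state
    return state + a * delta + b

def training_rollout(model, policy, state, horizon=100):
    """Train-time rollout in the augmented world model; the sampled augmentation
    is also passed to the policy as context so it can adapt to dynamics changes."""
    aug = sample_augmentation()
    context = np.concatenate(aug)
    transitions = []
    for _ in range(horizon):
        action = policy(np.concatenate([state, context]))
        next_state = augmented_step(model, state, action, aug)
        transitions.append((state, action, next_state))
        state = next_state
    return transitions

def estimate_context(model, online_transitions):
    """Test-time, self-supervised context: fit the (a, b) that best maps the
    model's predicted deltas onto the deltas observed in the new environment."""
    a_hat, b_hat = np.ones(STATE_DIM), np.zeros(STATE_DIM)
    for d in range(STATE_DIM):
        pred = np.array([model(s, u)[d] - s[d] for s, u, _ in online_transitions])
        real = np.array([s2[d] - s[d] for s, _, s2 in online_transitions])
        A = np.stack([pred, np.ones_like(pred)], axis=1)
        a_hat[d], b_hat[d] = np.linalg.lstsq(A, real, rcond=None)[0]
    return np.concatenate([a_hat, b_hat])  # fed to the policy as its context
```

In this sketch, zero-shot adaptation amounts to feeding the estimated context into the same policy interface used during training.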
Related papers
- Efficient Imitation Learning with Conservative World Models [54.52140201148341]
We tackle the problem of policy learning from expert demonstrations without a reward function.
We re-frame imitation learning as a fine-tuning problem, rather than a pure reinforcement learning one.
arXiv Detail & Related papers (2024-05-21T20:53:18Z)
- Equivariant Data Augmentation for Generalization in Offline Reinforcement Learning [10.00979536266327]
We present a novel approach to address the challenge of generalization in offline reinforcement learning (RL).
Specifically, we aim to improve the agent's ability to generalize to out-of-distribution goals.
We learn a new policy offline based on the augmented dataset, with an off-the-shelf offline RL algorithm.
arXiv Detail & Related papers (2023-09-14T10:22:33Z)
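As a rough illustration of the idea in the entry above, the sketch below adds symmetry-transformed copies of logged, goal-conditioned transitions to an offline dataset before handing it to an off-the-shelf offline RL algorithm. The reflection symmetry, the transition layout, and the reward-invariance assumption are illustrative; the abstract does not specify the transformation group.

```python
# Rough illustration of equivariant data augmentation for an offline,
# goal-conditioned dataset. The reflection symmetry, the transition layout, and
# reward invariance are illustrative assumptions, not the paper's exact design.
import numpy as np

def reflect(v):
    """Reflect a 2-D vector about the y-axis (negate the x component)."""
    out = np.array(v, dtype=float).copy()
    out[0] = -out[0]
    return out

def augment_dataset(transitions):
    """Append a symmetry-transformed copy of every logged transition so that an
    off-the-shelf offline RL algorithm also sees transformed (out-of-distribution) goals."""
    augmented = list(transitions)
    for state, action, goal, next_state, reward in transitions:
        augmented.append((reflect(state), reflect(action), reflect(goal),
                          reflect(next_state), reward))  # reward assumed invariant
    return augmented

# Example: one logged transition becomes two after augmentation.
logged = [(np.array([1.0, 2.0]), np.array([0.5, 0.0]),
           np.array([3.0, 2.0]), np.array([1.5, 2.0]), 0.0)]
print(len(augment_dataset(logged)))  # 2
```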
- Model Generation with Provable Coverability for Offline Reinforcement Learning [14.333861814143718]
Offline optimization with a dynamics-aware policy provides a new perspective on policy learning and out-of-distribution generalization.
However, due to the limitations of the offline setting, the learned model cannot mimic the real dynamics well enough to support reliable out-of-distribution exploration.
We propose an algorithm that generates models optimized for coverage of the real dynamics.
arXiv Detail & Related papers (2022-06-01T08:34:09Z)
- Transfer learning with causal counterfactual reasoning in Decision Transformers [5.672132510411465]
We study the problem of transfer learning under changes in the environment dynamics.
Specifically, we use the Decision Transformer architecture to distill a new policy on the new environment.
We show that this mechanism can bootstrap a successful policy on the target environment while retaining most of the reward.
arXiv Detail & Related papers (2021-10-27T11:23:27Z)
- Learning to Continuously Optimize Wireless Resource in a Dynamic Environment: A Bilevel Optimization Perspective [52.497514255040514]
This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment.
We propose to build the notion of continual learning into wireless system design, so that the learning model can incrementally adapt to the new episodes.
Our design is based on a novel bilevel optimization formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2021-05-03T07:23:39Z)
- Learning to Continuously Optimize Wireless Resource In Episodically Dynamic Environment [55.91291559442884]
This work develops a methodology that enables data-driven methods to continuously learn and optimize in a dynamic environment.
We propose to build the notion of continual learning into the modeling process of learning wireless systems.
Our design is based on a novel min-max formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2020-11-16T08:24:34Z)
- ADAIL: Adaptive Adversarial Imitation Learning [11.270858993502705]
We present the ADaptive Adversarial Imitation Learning (ADAIL) algorithm for learning adaptive policies that can be transferred between environments of varying dynamics.
This is an important problem in robotic learning because in real world scenarios 1) reward functions are hard to obtain, 2) learned policies from one domain are difficult to deploy in another due to varying source to target domain statistics, and 3) collecting expert demonstrations in multiple environments where the dynamics are known and controlled is often infeasible.
arXiv Detail & Related papers (2020-08-23T06:11:00Z)
- Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z)
- Non-Stationary Off-Policy Optimization [50.41335279896062]
We study the novel problem of off-policy optimization in piecewise-stationary contextual bandits.
In the offline learning phase, we partition logged data into categorical latent states and learn a near-optimal sub-policy for each state.
In the online deployment phase, we adaptively switch between the learned sub-policies based on their performance.
arXiv Detail & Related papers (2020-06-15T09:16:09Z)
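The two-phase approach in the Non-Stationary Off-Policy Optimization entry above (offline partitioning into latent states, then online switching between the learned sub-policies) can be pictured with the minimal sketch below; the epsilon-greedy, running-mean selection rule is an illustrative assumption, not the paper's estimator.

```python
# Minimal sketch of the online deployment phase: adaptively switching between
# sub-policies learned offline for different latent states. The epsilon-greedy,
# running-mean selection rule is an illustrative assumption.
import numpy as np

class SubPolicySwitcher:
    def __init__(self, sub_policies, epsilon=0.1, seed=0):
        self.sub_policies = sub_policies           # one near-optimal policy per latent state
        self.returns = [[] for _ in sub_policies]  # observed returns per sub-policy
        self.epsilon = epsilon
        self.rng = np.random.default_rng(seed)

    def select(self):
        """Pick a sub-policy: explore occasionally, otherwise exploit the best running mean."""
        if self.rng.random() < self.epsilon or not all(self.returns):
            return int(self.rng.integers(len(self.sub_policies)))
        return int(np.argmax([np.mean(r) for r in self.returns]))

    def update(self, idx, episode_return):
        """Record the return observed while deploying sub-policy idx."""
        self.returns[idx].append(episode_return)
```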
- Provably Efficient Model-based Policy Adaptation [22.752774605277555]
A promising approach is to quickly adapt pre-trained policies to new environments.
Existing methods for this policy adaptation problem typically rely on domain randomization and meta-learning.
We propose new model-based mechanisms that are able to make online adaptation in unseen target environments.
arXiv Detail & Related papers (2020-06-14T23:16:20Z)
- Never Stop Learning: The Effectiveness of Fine-Tuning in Robotic Reinforcement Learning [109.77163932886413]
We show how to adapt vision-based robotic manipulation policies to new variations by fine-tuning via off-policy reinforcement learning.
This adaptation uses less than 0.2% of the data necessary to learn the task from scratch.
We find that our approach of adapting pre-trained policies leads to substantial performance gains over the course of fine-tuning.
arXiv Detail & Related papers (2020-04-21T17:57:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.