Continual Model-Based Reinforcement Learning with Hypernetworks
- URL: http://arxiv.org/abs/2009.11997v2
- Date: Tue, 30 Mar 2021 02:46:27 GMT
- Title: Continual Model-Based Reinforcement Learning with Hypernetworks
- Authors: Yizhou Huang, Kevin Xie, Homanga Bharadhwaj and Florian Shkurti
- Abstract summary: We propose a method that continually learns the encountered dynamics in a sequence of tasks using task-conditional hypernetworks.
Our method has three main attributes: first, its dynamics learning sessions do not revisit training data from previous tasks, so only the most recent fixed-size portion of the state transition experience needs to be stored; second, it uses fixed-capacity hypernetworks to represent non-stationary, task-aware dynamics; third, it outperforms continual-learning alternatives that rely on fixed-capacity networks and competes with baselines that store an ever-growing coreset of past experience.
We show that HyperCRL is effective in continual model-based reinforcement learning in robot locomotion and manipulation scenarios.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Effective planning in model-based reinforcement learning (MBRL) and
model-predictive control (MPC) relies on the accuracy of the learned dynamics
model. In many instances of MBRL and MPC, this model is assumed to be
stationary and is periodically re-trained from scratch on state transition
experience collected from the beginning of environment interactions. This
implies that the time required to train the dynamics model - and the pause
required between plan executions - grows linearly with the size of the
collected experience. We argue that this is too slow for lifelong robot
learning and propose HyperCRL, a method that continually learns the encountered
dynamics in a sequence of tasks using task-conditional hypernetworks. Our
method has three main attributes: first, it includes dynamics learning sessions
that do not revisit training data from previous tasks, so it only needs to
store the most recent fixed-size portion of the state transition experience;
second, it uses fixed-capacity hypernetworks to represent non-stationary and
task-aware dynamics; third, it outperforms existing continual learning
alternatives that rely on fixed-capacity networks, and performs competitively with
baselines that remember an ever increasing coreset of past experience. We show
that HyperCRL is effective in continual model-based reinforcement learning in
robot locomotion and manipulation scenarios, such as tasks involving pushing
and door opening. Our project website with videos is at
https://rvl.cs.toronto.edu/blog/2020/hypercrl
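To make the mechanism concrete, below is a minimal PyTorch sketch of a task-conditional hypernetwork for dynamics learning in the spirit of HyperCRL; the layer sizes, embedding dimension, and one-hidden-layer target network are illustrative choices, not the paper's exact architecture. A learned task embedding is mapped to the full weight vector of a small dynamics network that predicts the next state:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperDynamics(nn.Module):
    """A hypernetwork emits the weights of a small per-task dynamics net."""
    def __init__(self, n_tasks, emb_dim=8, state_dim=4, act_dim=2, hidden=64):
        super().__init__()
        self.task_emb = nn.Embedding(n_tasks, emb_dim)
        in_dim = state_dim + act_dim
        # Shapes of the target dynamics network: two linear layers.
        self.shapes = [(hidden, in_dim), (hidden,), (state_dim, hidden), (state_dim,)]
        n_params = sum(torch.Size(s).numel() for s in self.shapes)
        self.hnet = nn.Sequential(
            nn.Linear(emb_dim, 128), nn.ReLU(), nn.Linear(128, n_params))

    def forward(self, task_id, state, action):
        flat = self.hnet(self.task_emb(task_id))       # task-conditional weights
        params, i = [], 0
        for s in self.shapes:
            n = torch.Size(s).numel()
            params.append(flat[i:i + n].view(s)); i += n
        w1, b1, w2, b2 = params
        h = F.relu(F.linear(torch.cat([state, action], -1), w1, b1))
        return state + F.linear(h, w2, b2)             # residual next-state prediction

model = HyperDynamics(n_tasks=5)
next_state = model(torch.tensor(0), torch.randn(32, 4), torch.randn(32, 2))
```

Continual learning then reduces to a regularizer on the hypernetwork outputs: before a task switch, the generated weight vectors for earlier task embeddings are snapshotted, and training on the new task penalizes deviation from those snapshots, so only the current task's transition buffer has to be kept.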
Related papers
- Transferable Post-training via Inverse Value Learning (arXiv, 2024-10-28)
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network)
After training this network on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
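One plausible reading of the logits-level idea, as a hedged sketch: a small "value network" is trained to output a logit correction, which at inference is added to any pre-trained model's logits over the shared vocabulary. Feeding the base model's logits into the value network, and all dimensions below, are simplifications of mine, not the paper's exact interface:

```python
import torch
import torch.nn as nn

vocab, d_model = 1000, 256

# Stand-in for any frozen pre-trained LM head producing vocabulary logits.
pretrained_head = nn.Linear(d_model, vocab)

# The "value network" models post-training changes at the logits level.
value_net = nn.Sequential(nn.Linear(vocab, 128), nn.ReLU(), nn.Linear(128, vocab))

def post_trained_logits(base_logits):
    # The correction lives in vocabulary space, so it can be plugged into
    # pre-trained models of different parameter sizes at inference time.
    return base_logits + value_net(base_logits)

logits = pretrained_head(torch.randn(2, d_model))
adjusted = post_trained_logits(logits)
```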
- Model-Based Reinforcement Learning with Multi-Task Offline Pretraining (arXiv, 2023-06-06)
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure the task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
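One way to read "world models as tools to measure task relevance" is sketched below; scoring a pretrained model by its one-step prediction error on target-task transitions is my illustrative choice, not necessarily the paper's exact metric:

```python
import torch

def task_relevance(world_model, target_transitions):
    """Score a pretrained world model by how well it predicts the new task.

    world_model(s, a) -> predicted next state; lower error suggests the
    source task's dynamics are more transferable to the target task.
    """
    errs = []
    with torch.no_grad():
        for s, a, s_next in target_transitions:
            errs.append(((world_model(s, a) - s_next) ** 2).mean())
    return -torch.stack(errs).mean()       # higher score = more relevant

# Hypothetical usage: rank offline-pretrained world models on a new task.
wm = lambda s, a: 0.95 * s + 0.1 * a       # toy stand-in for a pretrained model
buffer = [(torch.randn(4), torch.randn(4), torch.randn(4)) for _ in range(10)]
score = task_relevance(wm, buffer)
```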
- Predictive Experience Replay for Continual Visual Control and Forecasting (arXiv, 2023-03-12)
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
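A minimal sketch of the mixture-world-model idea; the component count, network sizes, and the hard task-indexed gating are assumptions of this sketch (the paper may infer the mixture assignment rather than index it):

```python
import torch
import torch.nn as nn

class MixtureWorldModel(nn.Module):
    """Each mixture component holds one task's Gaussian dynamics prior."""
    def __init__(self, n_components, state_dim, act_dim, hidden=128):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(state_dim + act_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, 2 * state_dim))   # mean and log-var
            for _ in range(n_components))

    def forward(self, task_id, state, action):
        mu, log_var = self.heads[task_id](torch.cat([state, action], -1)).chunk(2, -1)
        return torch.distributions.Normal(mu, log_var.exp().sqrt())

model = MixtureWorldModel(n_components=3, state_dim=8, act_dim=2)
dist = model(0, torch.randn(4, 8), torch.randn(4, 2))
nll = -dist.log_prob(torch.randn(4, 8)).mean()   # train by maximum likelihood
```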
- Learning a model is paramount for sample efficiency in reinforcement learning control of PDEs (arXiv, 2023-02-14)
We show that learning an actuated model in parallel to training the RL agent significantly reduces the total amount of required data sampled from the real system.
We also show that iteratively updating the model is of major importance to avoid biases in the RL training.
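The loop structure these two findings suggest can be sketched on a toy scalar system; the linear model, controller, and sample counts below are placeholders, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def real_step(x, u):                       # stand-in for the real (PDE) system
    return 0.9 * x + 0.5 * u + 0.01 * rng.standard_normal()

def fit_model(data):                       # refit x' ~ a*x + b*u by least squares
    X = np.array([[x, u] for x, u, _ in data])
    y = np.array([xn for _, _, xn in data])
    return np.linalg.lstsq(X, y, rcond=None)[0]

data = []
for _ in range(5):                         # iterative model updates, not one-shot
    x = 1.0
    for _ in range(20):                    # only a few real samples per round
        u = -0.5 * x                       # placeholder for the RL policy
        xn = real_step(x, u)
        data.append((x, u, xn))
        x = xn
    theta = fit_model(data)                # model learned in parallel with control
print("learned [a, b]:", theta.round(3))
```

In the full method, the RL agent would additionally train on cheap rollouts of the fitted model between rounds, which is where the data savings come from.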
- Contrastive Value Learning: Implicit Models for Simple Offline RL (arXiv, 2022-11-03)
We propose Contrastive Value Learning (CVL), which learns an implicit, multi-step model of the environment dynamics.
CVL can be learned without access to reward functions, but nonetheless can be used to directly estimate the value of each action.
Our experiments demonstrate that CVL outperforms prior offline RL methods on complex continuous control benchmarks.
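A hedged sketch of the contrastive objective such an implicit model can use: a critic scores (state, action) pairs against candidate future states with an InfoNCE-style loss, so no reward labels enter the training signal. The embedding sizes and the bilinear score are my choices:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

state_dim, act_dim, emb = 6, 2, 32
phi = nn.Linear(state_dim + act_dim, emb)   # embeds (state, action)
psi = nn.Linear(state_dim, emb)             # embeds a candidate future state

def info_nce(s, a, s_future):
    # Pairwise scores: row i should prefer its own observed future (diagonal);
    # the other futures in the batch act as negatives.
    logits = phi(torch.cat([s, a], -1)) @ psi(s_future).T
    labels = torch.arange(len(s))
    return F.cross_entropy(logits, labels)

s = torch.randn(16, state_dim)
a = torch.randn(16, act_dim)
s_fut = torch.randn(16, state_dim)          # futures sampled from the dataset
loss = info_nce(s, a, s_fut)
```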
- GEM: Group Enhanced Model for Learning Dynamical Control Systems (arXiv, 2021-04-07)
We build effective dynamical models that are amenable to sample-based learning.
We show that learning the dynamics on a Lie algebra vector space is more effective than learning a direct state transition model.
This work sheds light on a connection between learning of dynamics and Lie group properties, which opens doors for new research directions.
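A minimal sketch of what "learning dynamics on a Lie algebra vector space" can look like for rotations: the network predicts an so(3) increment, and the exponential map turns it into a valid rotation, so predictions never leave the group. The control dimension and network sizes are illustrative:

```python
import torch
import torch.nn as nn

def hat(w):                                 # so(3) vector -> skew-symmetric matrix
    wx, wy, wz = w.unbind(-1)
    O = torch.zeros_like(wx)
    return torch.stack([torch.stack([O, -wz, wy], -1),
                        torch.stack([wz, O, -wx], -1),
                        torch.stack([-wy, wx, O], -1)], -2)

# Predict a tangent-space increment instead of the next 3x3 matrix directly.
net = nn.Sequential(nn.Linear(9 + 3, 64), nn.ReLU(), nn.Linear(64, 3))

def step(R, u):                             # R: (B,3,3) rotations, u: (B,3) controls
    w = net(torch.cat([R.flatten(1), u], -1))        # predicted so(3) element
    return torch.matrix_exp(hat(w)) @ R              # exponential map back to SO(3)

R = torch.eye(3).expand(4, 3, 3)
R_next = step(R, torch.randn(4, 3))
```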
- Neural Dynamic Policies for End-to-End Sensorimotor Learning (arXiv, 2020-12-04)
The current dominant paradigm in sensorimotor control, whether imitation or reinforcement learning, is to train policies directly in raw action spaces.
We propose Neural Dynamic Policies (NDPs) that make predictions in trajectory distribution space.
NDPs outperform the prior state-of-the-art in terms of either efficiency or performance across several robotic control tasks.
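A toy sketch of predicting in trajectory space rather than raw action space: the policy outputs the parameters of a simple second-order system (a goal and a stiffness here; real NDPs use dynamic movement primitives with learned forcing terms), and the executed motion is the rollout of that system:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

policy = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 2))

def rollout(obs, y0, steps=50, dt=0.02):
    g, k_raw = policy(obs).unbind(-1)       # goal and raw stiffness per sample
    k = F.softplus(k_raw) + 1.0             # keep stiffness positive
    y, yd, traj = y0, torch.zeros_like(y0), []
    for _ in range(steps):                  # integrate y'' = k(g - y) - d*y'
        ydd = k * (g - y) - 2 * k.sqrt() * yd   # critically damped spring
        yd = yd + dt * ydd
        y = y + dt * yd
        traj.append(y)
    return torch.stack(traj, -1)            # a smooth trajectory to track

traj = rollout(torch.randn(4, 8), torch.zeros(4))
```

Because the rollout is differentiable, the whole pipeline can still be trained end-to-end by imitation or reinforcement learning.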
- Iterative Semi-parametric Dynamics Model Learning For Autonomous Racing (arXiv, 2020-11-17)
We develop and apply an iterative learning semi-parametric model, with a neural network, to the task of autonomous racing.
We show that our model can learn more accurately than a purely parametric model and generalize better than a purely non-parametric model.
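The semi-parametric structure is straightforward to sketch: a parametric physics core plus a neural residual for unmodeled effects. The linear core below is a stand-in for the actual vehicle model:

```python
import torch
import torch.nn as nn

class SemiParametricModel(nn.Module):
    """Parametric physics core plus a neural residual for unmodeled effects."""
    def __init__(self, state_dim=4, act_dim=2):
        super().__init__()
        self.A = nn.Parameter(torch.eye(state_dim))            # parametric part
        self.B = nn.Parameter(torch.zeros(state_dim, act_dim))
        self.residual = nn.Sequential(                         # non-parametric part
            nn.Linear(state_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, state_dim))

    def forward(self, x, u):
        physics = x @ self.A.T + u @ self.B.T
        return physics + self.residual(torch.cat([x, u], -1))

model = SemiParametricModel()
x_next = model(torch.randn(8, 4), torch.randn(8, 2))
```

The physics core extrapolates where data is scarce, while the residual absorbs effects the parametric model misses, which matches the accuracy/generalization trade-off described above.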
- Learning to Continuously Optimize Wireless Resource In Episodically Dynamic Environment (arXiv, 2020-11-16)
This work develops a methodology that enables data-driven methods to continuously learn and optimize in a dynamic environment.
We propose to build the notion of continual learning into the modeling process of learning wireless systems.
Our design is based on a novel min-max formulation which ensures a certain "fairness" across different data samples.
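A hedged sketch of a min-max training step of this kind: descend on the worst per-sample loss instead of the mean, so no slice of the data is neglected. The model and loss below are placeholders:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
mse = nn.MSELoss(reduction="none")

def minmax_step(x, y):
    losses = mse(model(x), y).mean(-1)   # per-sample losses, shape (B,)
    loss = losses.max()                  # the inner "max" picks the worst sample
    opt.zero_grad()
    loss.backward()
    opt.step()                           # the outer "min" updates the model
    return loss.item()

worst = minmax_step(torch.randn(32, 10), torch.randn(32, 1))
```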
- Online Constrained Model-based Reinforcement Learning (arXiv, 2020-04-07)
A key requirement is the ability to handle continuous state and action spaces while remaining within a limited time and resource budget.
We propose a model-based approach that combines Gaussian Process regression and Receding Horizon Control.
We test our approach on a cart pole swing-up environment and demonstrate the benefits of online learning on an autonomous racing task.
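A compact sketch of the GP-plus-receding-horizon combination on a toy scalar system (the kernel defaults, cost function, and random-shooting optimizer are illustrative, not the paper's exact design):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

# Fit a GP dynamics model x' = f(x, u) from observed transitions.
X = rng.uniform(-1, 1, size=(50, 2))              # columns: state, action
y = 0.9 * X[:, 0] + 0.3 * np.sin(X[:, 1])         # toy true dynamics
gp = GaussianProcessRegressor().fit(X, y)

def mpc_action(x0, horizon=5, n_samples=64):
    """Receding-horizon control by random shooting through the GP model."""
    best_u, best_cost = 0.0, np.inf
    for _ in range(n_samples):
        u_seq = rng.uniform(-1, 1, horizon)
        x, cost = x0, 0.0
        for u in u_seq:
            x = gp.predict(np.array([[x, u]]))[0]  # roll the model forward
            cost += x ** 2 + 0.1 * u ** 2          # drive the state to zero
        if cost < best_cost:
            best_u, best_cost = u_seq[0], cost     # keep only the first action
    return best_u

u0 = mpc_action(0.8)
```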