MoDem: Accelerating Visual Model-Based Reinforcement Learning with
Demonstrations
- URL: http://arxiv.org/abs/2212.05698v1
- Date: Mon, 12 Dec 2022 04:28:50 GMT
- Title: MoDem: Accelerating Visual Model-Based Reinforcement Learning with
Demonstrations
- Authors: Nicklas Hansen, Yixin Lin, Hao Su, Xiaolong Wang, Vikash Kumar,
Aravind Rajeswaran
- Abstract summary: Poor sample efficiency continues to be the primary challenge for deployment of deep Reinforcement Learning (RL) algorithms for real-world applications.
We find that leveraging just a handful of demonstrations can dramatically improve the sample-efficiency of model-based RL.
We empirically study three complex visuo-motor control domains and find that our method is 150%-250% more successful in completing sparse reward tasks.
- Score: 36.44386146801296
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Poor sample efficiency continues to be the primary challenge for deployment
of deep Reinforcement Learning (RL) algorithms for real-world applications, and
in particular for visuo-motor control. Model-based RL has the potential to be
highly sample efficient by concurrently learning a world model and using
synthetic rollouts for planning and policy improvement. However, in practice,
sample-efficient learning with model-based RL is bottlenecked by the
exploration challenge. In this work, we find that leveraging just a handful of
demonstrations can dramatically improve the sample-efficiency of model-based
RL. Simply appending demonstrations to the interaction dataset, however, does
not suffice. We identify key ingredients for leveraging demonstrations in model
learning -- policy pretraining, targeted exploration, and oversampling of
demonstration data -- which forms the three phases of our model-based RL
framework. We empirically study three complex visuo-motor control domains and
find that our method is 150%-250% more successful in completing sparse reward
tasks compared to prior approaches in the low data regime (100K interaction
steps, 5 demonstrations). Code and videos are available at:
https://nicklashansen.github.io/modemrl
Related papers
- Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment [65.15914284008973]
We propose to leverage an Inverse Reinforcement Learning (IRL) technique to simultaneously build an reward model and a policy model.
We show that the proposed algorithms converge to the stationary solutions of the IRL problem.
Our results indicate that it is beneficial to leverage reward learning throughout the entire alignment process.
arXiv Detail & Related papers (2024-05-28T07:11:05Z) - Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z) - ReCoRe: Regularized Contrastive Representation Learning of World Model [21.29132219042405]
We present a world model that learns invariant features using contrastive unsupervised learning and an intervention-invariant regularizer.
Our method outperforms current state-of-the-art model-based and model-free RL methods and significantly improves on out-of-distribution point navigation tasks evaluated on the iGibson benchmark.
arXiv Detail & Related papers (2023-12-14T15:53:07Z) - HarmonyDream: Task Harmonization Inside World Models [93.07314830304193]
Model-based reinforcement learning (MBRL) holds the promise of sample-efficient learning.
We propose a simple yet effective approach, HarmonyDream, which automatically adjusts loss coefficients to maintain task harmonization.
arXiv Detail & Related papers (2023-09-30T11:38:13Z) - Simplified Temporal Consistency Reinforcement Learning [19.814047499837084]
We show that a simple representation learning approach relying on a latent dynamics model trained by latent temporal consistency is sufficient for high-performance RL.
Our approach outperforms model-free methods by a large margin and matches model-based methods' sample efficiency while training 2.4 times faster.
arXiv Detail & Related papers (2023-06-15T19:37:43Z) - Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure the task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
arXiv Detail & Related papers (2023-06-06T02:24:41Z) - Learning a model is paramount for sample efficiency in reinforcement
learning control of PDEs [5.488334211013093]
We show that learning an actuated model in parallel to training the RL agent significantly reduces the total amount of required data sampled from the real system.
We also show that iteratively updating the model is of major importance to avoid biases in the RL training.
arXiv Detail & Related papers (2023-02-14T16:14:39Z) - Simplifying Model-based RL: Learning Representations, Latent-space
Models, and Policies with One Objective [142.36200080384145]
We propose a single objective which jointly optimize a latent-space model and policy to achieve high returns while remaining self-consistent.
We demonstrate that the resulting algorithm matches or improves the sample-efficiency of the best prior model-based and model-free RL methods.
arXiv Detail & Related papers (2022-09-18T03:51:58Z) - PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided
Exploration [15.173628100049129]
This work studies a model-based algorithm for both Kernelized Regulators (KNR) and linear Markov Decision Processes (MDPs)
For both models, our algorithm guarantees sample complexity and only uses access to a planning oracle.
Our method can also perform reward-free exploration efficiently.
arXiv Detail & Related papers (2021-07-15T15:49:30Z) - DEALIO: Data-Efficient Adversarial Learning for Imitation from
Observation [57.358212277226315]
In imitation learning from observation IfO, a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm
arXiv Detail & Related papers (2021-03-31T23:46:32Z) - Domain Knowledge Integration By Gradient Matching For Sample-Efficient
Reinforcement Learning [0.0]
We propose a gradient matching algorithm to improve sample efficiency by utilizing target slope information from the dynamics to aid the model-free learner.
We demonstrate this by presenting a technique for matching the gradient information from the model-based learner with the model-free component in an abstract low-dimensional space.
arXiv Detail & Related papers (2020-05-28T05:02:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.