Towards Evaluating Adaptivity of Model-Based Reinforcement Learning Methods
- URL: http://arxiv.org/abs/2204.11464v1
- Date: Mon, 25 Apr 2022 06:45:16 GMT
- Title: Towards Evaluating Adaptivity of Model-Based Reinforcement Learning Methods
- Authors: Yi Wan, Ali Rahimi-Kalahroudi, Janarthanan Rajendran, Ida Momennejad,
Sarath Chandar, Harm van Seijen
- Abstract summary: We show that well-known model-based methods perform poorly in their ability to adapt to local environmental changes.
We identify elements that hurt adaptive behavior and link these to underlying techniques frequently used in deep model-based RL.
We provide insights into the challenges of building an adaptive nonlinear model-based method.
- Score: 25.05409184943328
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, a growing number of deep model-based reinforcement learning
(RL) methods have been introduced. The interest in deep model-based RL is not
surprising, given its many potential benefits, such as higher sample efficiency
and the potential for fast adaptation to changes in the environment. However, we
demonstrate, using an improved version of the recently introduced Local Change
Adaptation (LoCA) setup, that well-known model-based methods such as PlaNet and
DreamerV2 perform poorly in their ability to adapt to local environmental
changes. Combined with prior work that made a similar observation about another
popular model-based method, MuZero, a trend appears to emerge, suggesting
that current deep model-based methods have serious limitations. We dive deeper
into the causes of this poor performance, by identifying elements that hurt
adaptive behavior and linking these to underlying techniques frequently used in
deep model-based RL. We empirically validate these insights in the case of
linear function approximation by demonstrating that a modified version of
linear Dyna achieves effective adaptation to local changes. Furthermore, we
provide detailed insights into the challenges of building an adaptive nonlinear
model-based method, by experimenting with a nonlinear version of Dyna.
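To make the linear Dyna reference above concrete, here is a minimal sketch of the classic linear Dyna planning loop: value weights learned by TD, plus a learned linear feature-transition and reward model used to generate imagined updates. The feature map, step sizes, and class interface are illustrative assumptions, and this is the vanilla algorithm on which the paper's modified variant builds, not the authors' modification itself.

```python
import numpy as np

class LinearDyna:
    """Vanilla linear Dyna: TD learning plus planning with a learned linear model."""

    def __init__(self, n_features, alpha=0.1, beta=0.1, gamma=0.95):
        self.theta = np.zeros(n_features)             # value weights: v(s) = theta . phi(s)
        self.F = np.zeros((n_features, n_features))   # feature-transition model: phi' ~ F phi
        self.b = np.zeros(n_features)                 # reward model: r ~ b . phi
        self.alpha, self.beta, self.gamma = alpha, beta, gamma

    def model_update(self, phi, r, phi_next):
        # Move the linear model toward the observed transition (LMS-style updates).
        self.F += self.beta * np.outer(phi_next - self.F @ phi, phi)
        self.b += self.beta * (r - self.b @ phi) * phi

    def td_update(self, phi, r, phi_next):
        # Direct TD(0) update from the real transition.
        delta = r + self.gamma * self.theta @ phi_next - self.theta @ phi
        self.theta += self.alpha * delta * phi

    def planning_step(self, phi):
        # Imagined transition generated by the learned model; a stale model here
        # is exactly what slows adaptation after a local change in the environment.
        r_hat = self.b @ phi
        phi_next_hat = self.F @ phi
        delta = r_hat + self.gamma * self.theta @ phi_next_hat - self.theta @ phi
        self.theta += self.alpha * delta * phi
```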
Related papers
- Knowledge Editing in Language Models via Adapted Direct Preference Optimization [50.616875565173274]
Large Language Models (LLMs) can become outdated over time.
Knowledge Editing aims to overcome this challenge using weight updates that do not require expensive retraining.
arXiv Detail & Related papers (2024-06-14T11:02:21Z)
- ReCoRe: Regularized Contrastive Representation Learning of World Model [21.29132219042405]
We present a world model that learns invariant features using contrastive unsupervised learning and an intervention-invariant regularizer.
Our method outperforms current state-of-the-art model-based and model-free RL methods and significantly improves on out-of-distribution point navigation tasks evaluated on the iGibson benchmark.
arXiv Detail & Related papers (2023-12-14T15:53:07Z)
- How to Fine-tune the Model: Unified Model Shift and Model Bias Policy Optimization [13.440645736306267]
This paper develops an algorithm for model-based reinforcement learning.
It unifies model shift and model bias and then formulates a fine-tuning process.
It achieves state-of-the-art performance on several challenging benchmark tasks.
arXiv Detail & Related papers (2023-09-22T07:27:32Z)
- Replay Buffer with Local Forgetting for Adapting to Local Environment Changes in Deep Model-Based Reinforcement Learning [20.92599229976769]
We show that a simple variation of the first-in-first-out replay buffer is able to overcome the limitation that a standard replay buffer places on adapting to local environment changes.
We demonstrate this by applying our replay-buffer variation to a deep version of the classical Dyna method (a rough sketch of such a local-forgetting buffer appears after this list).
arXiv Detail & Related papers (2023-03-15T15:21:26Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- Adapting the Linearised Laplace Model Evidence for Modern Deep Learning [3.459382629188014]
The linearised Laplace method for estimating model uncertainty has received renewed attention in the deep learning community.
We show that the assumptions behind this method interact poorly with some now-standard tools of deep learning.
We make recommendations for how to better adapt this classic method to the modern setting.
arXiv Detail & Related papers (2022-06-17T17:18:31Z)
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
- Learning to Reweight Imaginary Transitions for Model-Based Reinforcement Learning [58.66067369294337]
When the model is inaccurate or biased, imaginary trajectories may be deleterious for training the action-value and policy functions.
We adaptively reweight the imaginary transitions, so as to reduce the negative effects of poorly generated trajectories.
Our method outperforms state-of-the-art model-based and model-free RL algorithms on multiple tasks.
arXiv Detail & Related papers (2021-04-09T03:13:35Z)
- COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
arXiv Detail & Related papers (2021-02-16T18:50:32Z)
- The LoCA Regret: A Consistent Metric to Evaluate Model-Based Behavior in Reinforcement Learning [21.967763416902265]
We introduce an experimental setup to evaluate model-based behavior of RL methods.
Our metric can identify model-based behavior, even if the method uses a poor representation.
We use our setup to evaluate the model-based behavior of MuZero on a variation of the classic Mountain Car task.
arXiv Detail & Related papers (2020-07-07T01:34:55Z)
- Model-Augmented Actor-Critic: Backpropagating through Paths [81.86992776864729]
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator.
We show how to make more effective use of the model by exploiting its differentiability.
arXiv Detail & Related papers (2020-05-16T19:18:10Z)
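Below is a hypothetical sketch of the "local forgetting" replay-buffer idea mentioned in the Replay Buffer with Local Forgetting entry above: a first-in-first-out buffer that, on insertion, evicts stored transitions whose states fall within a neighborhood of the newly visited state, so stale experience from a locally changed region is flushed. The distance metric, the radius parameter, and the class interface are illustrative assumptions, not the cited paper's exact mechanism.

```python
from collections import deque

import numpy as np

class LocalForgettingReplayBuffer:
    """FIFO replay buffer that forgets old transitions near newly visited states."""

    def __init__(self, capacity, radius):
        self.capacity = capacity
        self.radius = radius               # assumed neighborhood size for forgetting
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        # Evict transitions whose stored state lies within `radius` of the new state,
        # so experience from a locally changed region does not linger in the buffer.
        state = np.asarray(state, dtype=float)
        kept = [t for t in self.buffer
                if np.linalg.norm(np.asarray(t[0]) - state) > self.radius]
        self.buffer = deque(kept, maxlen=self.capacity)
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=None):
        # Uniform sampling without replacement from whatever remains in the buffer.
        rng = rng or np.random.default_rng()
        idx = rng.choice(len(self.buffer),
                         size=min(batch_size, len(self.buffer)),
                         replace=False)
        return [self.buffer[i] for i in idx]
```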