The LoCA Regret: A Consistent Metric to Evaluate Model-Based Behavior in Reinforcement Learning
- URL: http://arxiv.org/abs/2007.03158v2
- Date: Thu, 3 Dec 2020 12:18:07 GMT
- Title: The LoCA Regret: A Consistent Metric to Evaluate Model-Based Behavior in Reinforcement Learning
- Authors: Harm van Seijen and Hadi Nekoei and Evan Racah and Sarath Chandar
- Abstract summary: We introduce an experimental setup to evaluate model-based behavior of RL methods.
Our metric can identify model-based behavior, even if the method uses a poor representation.
We use our setup to evaluate the model-based behavior of MuZero on a variation of the classic Mountain Car task.
- Score: 21.967763416902265
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep model-based Reinforcement Learning (RL) has the potential to
substantially improve the sample-efficiency of deep RL. While various
challenges have long held it back, a number of papers have recently come out
reporting success with deep model-based methods. This is a great development,
but the lack of a consistent metric to evaluate such methods makes it difficult
to compare various approaches. For example, the common single-task
sample-efficiency metric conflates improvements due to model-based learning
with various other aspects, such as representation learning, making it
difficult to assess true progress on model-based RL. To address this, we
introduce an experimental setup to evaluate model-based behavior of RL methods,
inspired by work from neuroscience on detecting model-based behavior in humans
and animals. Our metric based on this setup, the Local Change Adaptation (LoCA)
regret, measures how quickly an RL method adapts to a local change in the
environment. Our metric can identify model-based behavior even if the method
uses a poor representation, and it provides insight into how close a method's
behavior is to optimal model-based behavior. We use our setup to evaluate the
model-based behavior of MuZero on a variation of the classic Mountain Car task.
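The sketch below is only a rough, hedged illustration of the kind of evaluation the abstract describes: an agent is trained on a task, the reward is then changed in a small, local part of the environment, and the return lost relative to an optimal policy while the agent re-adapts is accumulated as a regret. Everything in it (the toy ChainEnv, the tabular Q-learning agent, the episode counts, and the regret bookkeeping) is an assumption made for illustration; it is not the paper's actual LoCA protocol, its training regime, or its Mountain Car variant.

```python
# Illustrative sketch only: a generic "adapt to a local change" evaluation in the
# spirit of the LoCA setup. Environment, agent, and constants are placeholders.
import random

class ChainEnv:
    """Tiny chain with a terminal at each end; rewards are given only at the terminals."""
    def __init__(self, n=7, r_left=1.0, r_right=4.0):
        self.n, self.r_left, self.r_right = n, r_left, r_right

    def reset(self):
        self.s = self.n // 2
        return self.s

    def step(self, a):  # a in {0: move left, 1: move right}
        self.s += -1 if a == 0 else 1
        if self.s == 0:
            return self.s, self.r_left, True
        if self.s == self.n - 1:
            return self.s, self.r_right, True
        return self.s, 0.0, False

def run_episode(env, Q, eps=0.1, alpha=0.5, gamma=0.99):
    """One online Q-learning episode; returns the undiscounted episode return."""
    s, done, ret = env.reset(), False, 0.0
    while not done:
        a = random.randrange(2) if random.random() < eps else max((0, 1), key=lambda x: Q[s][x])
        s2, r, done = env.step(a)
        target = r if done else r + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])
        s, ret = s2, ret + r
    return ret

random.seed(0)
Q = [[0.0, 0.0] for _ in range(7)]

# Phase 1: train to convergence on the original task (the right terminal pays more).
env_A = ChainEnv(r_left=1.0, r_right=4.0)
for _ in range(500):
    run_episode(env_A, Q)

# Local change: only the left terminal's reward is modified; the dynamics are untouched.
env_B = ChainEnv(r_left=10.0, r_right=4.0)
optimal_return_B = 10.0  # known in closed form for this toy task

# Phase 2: accumulate the return lost, relative to optimal, while the agent re-adapts.
regret = sum(optimal_return_B - run_episode(env_B, Q, eps=0.2) for _ in range(100))
print(f"accumulated adaptation regret over 100 episodes: {regret:.1f}")
```

A method with genuinely model-based behavior would be expected to accumulate far less regret in this kind of test than the model-free learner above, since only a small piece of its knowledge needs updating; surfacing that difference, independent of representation quality, is what the LoCA regret is designed to do.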
Related papers
- Enhancing Training Data Attribution for Large Language Models with Fitting Error Consideration [74.09687562334682]
We introduce a novel training data attribution method called Debias and Denoise Attribution (DDA)
Our method significantly outperforms existing approaches, achieving an average AUC of 91.64%.
DDA exhibits strong generality and scalability across various sources and different-scale models like LLaMA2, QWEN2, and Mistral.
arXiv Detail & Related papers (2024-10-02T07:14:26Z)
- ReCoRe: Regularized Contrastive Representation Learning of World Model [21.29132219042405]
We present a world model that learns invariant features using contrastive unsupervised learning and an intervention-invariant regularizer.
Our method outperforms current state-of-the-art model-based and model-free RL methods and significantly improves on out-of-distribution point navigation tasks evaluated on the iGibson benchmark.
arXiv Detail & Related papers (2023-12-14T15:53:07Z)
- QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate, for example, that leveraging its insights improves the absolute performance of the Llama 2 model by up to 15 percentage points.
arXiv Detail & Related papers (2023-11-06T00:21:44Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective [142.36200080384145]
We propose a single objective that jointly optimizes a latent-space model and policy to achieve high returns while remaining self-consistent.
We demonstrate that the resulting algorithm matches or improves the sample-efficiency of the best prior model-based and model-free RL methods.
arXiv Detail & Related papers (2022-09-18T03:51:58Z)
- Towards Evaluating Adaptivity of Model-Based Reinforcement Learning Methods [25.05409184943328]
We show that well-known model-based methods perform poorly in their ability to adapt to local environmental changes.
We identify elements that hurt adaptive behavior and link these to underlying techniques frequently used in deep model-based RL.
We provide insights into the challenges of building an adaptive nonlinear model-based method.
arXiv Detail & Related papers (2022-04-25T06:45:16Z)
- Value Gradient weighted Model-Based Reinforcement Learning [28.366157882991565]
Model-based reinforcement learning (MBRL) is a sample efficient technique to obtain control policies.
VaGraM is a novel method for value-aware model learning.
arXiv Detail & Related papers (2022-04-04T13:28:31Z)
- Sample-Efficient Reinforcement Learning via Conservative Model-Based Actor-Critic [67.00475077281212]
Model-based reinforcement learning algorithms are more sample efficient than their model-free counterparts.
We propose a novel approach that achieves high sample efficiency without the strong reliance on accurate learned models.
We show that CMBAC significantly outperforms state-of-the-art approaches in terms of sample efficiency on several challenging tasks.
arXiv Detail & Related papers (2021-12-16T15:33:11Z)
- Sample Efficient Reinforcement Learning via Model-Ensemble Exploration and Exploitation [3.728946517493471]
MEEE is a model-ensemble method that consists of optimistic exploration and weighted exploitation.
Our approach outperforms other model-free and model-based state-of-the-art methods, especially in sample complexity.
arXiv Detail & Related papers (2021-07-05T07:18:20Z)
- Learning to Reweight Imaginary Transitions for Model-Based Reinforcement Learning [58.66067369294337]
When the model is inaccurate or biased, imaginary trajectories may be deleterious for training the action-value and policy functions.
We adaptively reweight the imaginary transitions, so as to reduce the negative effects of poorly generated trajectories.
Our method outperforms state-of-the-art model-based and model-free RL algorithms on multiple tasks.
arXiv Detail & Related papers (2021-04-09T03:13:35Z)
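The last entry above describes downweighting imagined transitions when the learned model is unreliable. A generic way to realize that idea, not necessarily the cited paper's estimator, is to weight each imagined sample by the disagreement of an ensemble of dynamics models; the sketch below assumes random linear maps as stand-ins for learned models, and its ensemble size, weighting function, and tensor shapes are illustrative choices only.

```python
# Hedged sketch: weight imagined transitions by ensemble disagreement so that poorly
# modeled regions contribute less to value/policy updates. Generic illustration only.
import numpy as np

rng = np.random.default_rng(0)

def ensemble_predict(models, states, actions):
    """Each 'model' is a random linear map standing in for a learned dynamics network."""
    return np.stack([states @ W_s + actions @ W_a for (W_s, W_a) in models])

state_dim, action_dim, batch, n_models = 4, 2, 8, 5
models = [(rng.normal(size=(state_dim, state_dim)),
           rng.normal(size=(action_dim, state_dim))) for _ in range(n_models)]

states = rng.normal(size=(batch, state_dim))       # imagined rollout inputs (placeholders)
actions = rng.normal(size=(batch, action_dim))

preds = ensemble_predict(models, states, actions)  # shape: (n_models, batch, state_dim)
disagreement = preds.std(axis=0).mean(axis=-1)     # per-sample uncertainty proxy
weights = np.exp(-disagreement)                    # low weight where the models disagree
weights /= weights.sum()

td_errors = rng.normal(size=batch)                 # placeholder TD errors from a learner
weighted_loss = float(np.sum(weights * td_errors ** 2))
print("per-sample weights:", np.round(weights, 3))
print("weighted imaginary loss:", round(weighted_loss, 3))
```

In a full agent, these weights would scale the contribution of each imagined transition to the critic and policy losses, so trajectories generated in poorly modeled regions influence training less.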
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences of its use.