Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual
Model-Based Reinforcement Learning
- URL: http://arxiv.org/abs/2012.04603v1
- Date: Tue, 8 Dec 2020 18:03:21 GMT
- Title: Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual
Model-Based Reinforcement Learning
- Authors: Mohammad Babaeizadeh, Mohammad Taghi Saffar, Danijar Hafner, Harini
Kannan, Chelsea Finn, Sergey Levine, Dumitru Erhan
- Abstract summary: We study a number of design decisions for the predictive model in visual MBRL algorithms.
We find that a range of design decisions that are often considered crucial, such as the use of latent spaces, have little effect on task performance.
We show how this phenomenon is related to exploration and how some of the lower-scoring models on standard benchmarks will perform the same as the best-performing models when trained on the same training data.
- Score: 109.74041512359476
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model-based reinforcement learning (MBRL) methods have shown strong sample
efficiency and performance across a variety of tasks, including when faced with
high-dimensional visual observations. These methods learn to predict the
environment dynamics and expected reward from interaction and use this
predictive model to plan and perform the task. However, MBRL methods vary in
their fundamental design choices, and there is no strong consensus in the
literature on how these design decisions affect performance. In this paper, we
study a number of design decisions for the predictive model in visual MBRL
algorithms, focusing specifically on methods that use a predictive model for
planning. We find that a range of design decisions that are often considered
crucial, such as the use of latent spaces, have little effect on task
performance. A notable exception is that predicting future observations
(i.e., images) leads to significantly better task performance than predicting
rewards alone. We also empirically find that image
prediction accuracy, somewhat surprisingly, correlates more strongly with
downstream task performance than reward prediction accuracy. We show how this
phenomenon is related to exploration and how some of the lower-scoring models
on standard benchmarks (that require exploration) will perform the same as the
best-performing models when trained on the same training data. Simultaneously,
in the absence of exploration, models that fit the data better usually perform
better on the downstream task as well, but surprisingly, these are often not
the same models that perform the best when learning and exploring from scratch.
These findings suggest that performance and exploration place important and
potentially contradictory requirements on the model.
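To make the central design axis concrete, below is a minimal, hypothetical sketch (in PyTorch; all module names, dimensions, and the planner are illustrative and not taken from the paper) of a latent world model with a reward head and an optional image-decoder head, together with a random-shooting planner standing in for the planning component. Toggling predict_images switches between the reward-only and image-prediction variants whose trade-off the abstract describes.
```python
import torch
import torch.nn as nn

class LatentWorldModel(nn.Module):
    """Illustrative latent world model: encode a pixel frame, roll the
    latent state forward through actions, predict per-step rewards, and
    (optionally) reconstruct future frames."""

    def __init__(self, latent_dim=128, action_dim=4, predict_images=True):
        super().__init__()
        self.predict_images = predict_images
        # Pixel encoder: (B, 3, 64, 64) frame -> (B, latent_dim) vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(latent_dim),
        )
        # Latent transition: (z_t, a_t) -> z_{t+1}.
        self.dynamics = nn.Sequential(
            nn.Linear(latent_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # Reward head: z_{t+1} -> scalar reward.
        self.reward_head = nn.Linear(latent_dim, 1)
        # Optional image head: z_{t+1} -> reconstructed (3, 64, 64) frame.
        self.image_head = nn.Sequential(
            nn.Linear(latent_dim, 64 * 8 * 8), nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1),
        ) if predict_images else None

    def rollout(self, frame, actions):
        """Roll forward through a list of (B, action_dim) action tensors."""
        z = self.encoder(frame)
        rewards, frames = [], []
        for a in actions:
            z = self.dynamics(torch.cat([z, a], dim=-1))
            rewards.append(self.reward_head(z))
            if self.predict_images:
                frames.append(self.image_head(z))
        return torch.stack(rewards), frames

def plan(model, frame, horizon=12, candidates=256, action_dim=4):
    """Random-shooting planner: sample candidate action sequences and
    return the first action of the sequence the model scores highest."""
    seqs = torch.randn(horizon, candidates, action_dim)
    rewards, _ = model.rollout(frame.repeat(candidates, 1, 1, 1), list(seqs))
    best = rewards.sum(dim=0).squeeze(-1).argmax()
    return seqs[0, best]
```
In this sketch, the reward-only ablation corresponds to predict_images=False, which also drops the image-reconstruction term from the training loss; the abstract's finding is that such variants perform markedly worse even though the planner only ever consumes predicted rewards.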
Related papers
- Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning [78.72226641279863]
Sparse Mixture of Expert (SMoE) models have emerged as a scalable alternative to dense models in language modeling.
Our research explores task-specific model pruning to inform decisions about designing SMoE architectures.
We introduce UNCURL, an adaptive task-aware pruning technique that reduces the number of experts per MoE layer offline, after training.
arXiv Detail & Related papers (2024-09-02T22:35:03Z)
- Causal Estimation of Memorisation Profiles [58.20086589761273]
Understanding memorisation in language models has practical and societal implications.
Memorisation is the causal effect of training with an instance on the model's ability to predict that instance.
This paper proposes a new, principled, and efficient method to estimate memorisation based on the difference-in-differences design from econometrics (a toy numerical sketch appears after this list).
arXiv Detail & Related papers (2024-06-06T17:59:09Z)
- Secrets of RLHF in Large Language Models Part II: Reward Modeling [134.97964938009588]
We introduce a series of novel methods to mitigate the influence of incorrect and ambiguous preferences in the dataset.
We also introduce contrastive learning to enhance the ability of reward models to distinguish between chosen and rejected responses (a generic pairwise-loss sketch follows this list).
arXiv Detail & Related papers (2024-01-11T17:56:59Z)
- Think Twice: Measuring the Efficiency of Eliminating Prediction Shortcuts of Question Answering Models [3.9052860539161918]
We propose a simple method for measuring the scale of a model's reliance on any identified spurious feature.
We assess robustness to a large set of known and newly found prediction biases across various pre-trained models and debiasing methods in Question Answering (QA).
We find that while existing debiasing methods can mitigate reliance on a chosen spurious feature, the OOD performance gains of these methods cannot be explained by the mitigated reliance on biased features.
arXiv Detail & Related papers (2023-05-11T14:35:00Z)
- A Control-Centric Benchmark for Video Prediction [69.22614362800692]
We propose a benchmark for action-conditioned video prediction in the form of a control benchmark.
Our benchmark includes simulated environments with 11 task categories and 310 task instance definitions.
We then leverage our benchmark to study the effects of scaling model size, quantity of training data, and model ensembling.
arXiv Detail & Related papers (2023-04-26T17:59:45Z)
- Exploring the Impact of Instruction Data Scaling on Large Language Models: An Empirical Study on Real-World Use Cases [17.431381376675432]
In this paper we explore the performance of large language models based on instruction tuning across different scales of instruction data.
With Bloomz-7B1-mt as the base model, the results show that merely increasing the amount of instruction data leads to continuous improvement in tasks such as open-ended generation.
We propose potential future research directions such as effectively selecting high-quality training data, scaling base models and training methods specialized for hard tasks.
arXiv Detail & Related papers (2023-03-26T14:49:37Z)
- Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective [142.36200080384145]
We propose a single objective that jointly optimizes a latent-space model and a policy to achieve high returns while remaining self-consistent.
We demonstrate that the resulting algorithm matches or improves upon the sample efficiency of the best prior model-based and model-free RL methods.
arXiv Detail & Related papers (2022-09-18T03:51:58Z)
- Model-based micro-data reinforcement learning: what are the crucial model properties and which model to choose? [0.2836066255205732]
We contribute to micro-data model-based reinforcement learning (MBRL) by rigorously comparing popular generative models.
We find that on an environment that requires multimodal posterior predictives, mixture density nets outperform all other models by a large margin.
We also find that deterministic models are on par; in fact, they consistently (though not significantly) outperform their probabilistic counterparts.
arXiv Detail & Related papers (2021-07-24T11:38:25Z)
- Sample Efficient Reinforcement Learning via Model-Ensemble Exploration and Exploitation [3.728946517493471]
MEEE is a model-ensemble method that combines optimistic exploration with weighted exploitation (a generic sketch of ensemble-based optimism follows this list).
Our approach outperforms other state-of-the-art model-free and model-based methods, especially in terms of sample complexity.
arXiv Detail & Related papers (2021-07-05T07:18:20Z)
- Objective Mismatch in Model-based Reinforcement Learning [14.92062504466269]
Model-based reinforcement learning (MBRL) has been shown to be a powerful framework for learning control of continuous tasks in a data-efficient manner.
We identify a fundamental issue of the standard MBRL framework -- what we call the objective mismatch issue.
We propose an initial method to mitigate the mismatch issue by re-weighting dynamics model training.
arXiv Detail & Related papers (2020-02-11T16:26:07Z)
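For the memorisation-profiles entry above, here is the forward-referenced toy sketch: a difference-in-differences estimate, assuming per-instance log-likelihoods measured before and after a training step that includes the treated instances but not the controls. All names and data here are synthetic and purely illustrative.
```python
import numpy as np

def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Toy difference-in-differences estimate of memorisation: the extra
    improvement treated instances gain over the trend that control
    instances exhibit anyway."""
    treated_change = np.mean(treated_after) - np.mean(treated_before)
    control_change = np.mean(control_after) - np.mean(control_before)
    return treated_change - control_change

# Hypothetical per-instance log-likelihoods (higher is better).
rng = np.random.default_rng(0)
control_before = rng.normal(-3.0, 0.1, 100)
control_after = control_before + 0.2        # generic training progress
treated_before = rng.normal(-3.0, 0.1, 100)
treated_after = treated_before + 0.2 + 0.5  # progress + memorisation effect
est = diff_in_diff(treated_before, treated_after, control_before, control_after)
print(f"memorisation estimate: {est:.2f}")    # -> roughly 0.50
```
The control group's change captures the generic training trend, so subtracting it isolates the effect attributable to actually training on the instance.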
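For the reward-modeling entry: a sketch of the standard pairwise (Bradley-Terry) objective that trains a reward model to score chosen responses above rejected ones; this is the common baseline loss, not the paper's specific contrastive extension.
```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(r_chosen, r_rejected):
    """Bradley-Terry pairwise loss: push the reward assigned to the
    chosen response above that of the rejected response."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Hypothetical scalar rewards for a batch of preference pairs.
r_chosen = torch.tensor([1.2, 0.3, 2.0])
r_rejected = torch.tensor([0.5, 0.9, 1.0])
print(pairwise_reward_loss(r_chosen, r_rejected))  # lower when chosen > rejected
```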
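And for the model-ensemble entry: a generic sketch of optimistic exploration via ensemble disagreement, where an uncertainty bonus (the standard deviation across ensemble predictions) is added to the mean prediction; this is a common pattern and not necessarily MEEE's exact formulation.
```python
import numpy as np

def optimistic_action(ensemble, state, candidate_actions, beta=1.0):
    """Pick the action maximising mean predicted return plus an
    uncertainty bonus -- optimism in the face of uncertainty."""
    scores = []
    for a in candidate_actions:
        preds = np.array([model(state, a) for model in ensemble])
        scores.append(preds.mean() + beta * preds.std())
    return candidate_actions[int(np.argmax(scores))]

# Hypothetical ensemble: three reward models that disagree on action 1.
ensemble = [
    lambda s, a: 1.0 if a == 0 else 0.8,
    lambda s, a: 1.0 if a == 0 else 1.1,
    lambda s, a: 1.0 if a == 0 else 1.7,
]
# Action 1's disagreement bonus outweighs the certain action 0.
print(optimistic_action(ensemble, state=None, candidate_actions=[0, 1]))  # -> 1
```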
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.