Mind the Model, Not the Agent: The Primacy Bias in Model-based RL
- URL: http://arxiv.org/abs/2310.15017v3
- Date: Sun, 18 Aug 2024 12:35:55 GMT
- Title: Mind the Model, Not the Agent: The Primacy Bias in Model-based RL
- Authors: Zhongjian Qiao, Jiafei Lyu, Xiu Li,
- Abstract summary: primacy bias in model-free reinforcement learning (MFRL) refers to the agent's tendency to overfit early data and lose the ability to learn from new data.
Previous studies have shown that employing simple techniques, such as resetting the agent's parameters, can substantially alleviate the primacy bias in MFRL.
In this work, we focus on investigating the primacy bias in model-based reinforcement learning (MBRL)
- Score: 27.41568030157649
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The primacy bias in model-free reinforcement learning (MFRL), which refers to the agent's tendency to overfit early data and lose the ability to learn from new data, can significantly decrease the performance of MFRL algorithms. Previous studies have shown that employing simple techniques, such as resetting the agent's parameters, can substantially alleviate the primacy bias in MFRL. However, the primacy bias in model-based reinforcement learning (MBRL) remains unexplored. In this work, we focus on investigating the primacy bias in MBRL. We begin by observing that resetting the agent's parameters harms its performance in the context of MBRL. We further find that the primacy bias in MBRL is more closely related to the primacy bias of the world model instead of the primacy bias of the agent. Based on this finding, we propose \textit{world model resetting}, a simple yet effective technique to alleviate the primacy bias in MBRL. We apply our method to two different MBRL algorithms, MBPO and DreamerV2. We validate the effectiveness of our method on multiple continuous control tasks on MuJoCo and DeepMind Control Suite, as well as discrete control tasks on Atari 100k benchmark. The experimental results show that \textit{world model resetting} can significantly alleviate the primacy bias in the model-based setting and improve the algorithm's performance. We also give a guide on how to perform \textit{world model resetting} effectively.
Related papers
- BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning [39.090104460303415]
offline model-based reinforcement learning (MBRL) enhances data efficiency by utilizing pre-collected datasets to learn models and policies.
This paper first identifies the primary source of this mismatch comes from the underlying confounders present in offline data.
We introduce textbfBilintextbfEar textbfCAUSal rtextbfEpresentation(BECAUSE), an algorithm to capture causal representation for both states.
arXiv Detail & Related papers (2024-07-15T17:59:23Z) - Towards Robust Model-Based Reinforcement Learning Against Adversarial Corruption [60.958746600254884]
This study tackles the challenges of adversarial corruption in model-based reinforcement learning (RL)
We introduce an algorithm called corruption-robust optimistic MLE (CR-OMLE), which leverages total-variation (TV)-based information ratios as uncertainty weights for MLE.
We extend our weighting technique to the offline setting, and propose an algorithm named corruption-robust pessimistic MLE (CR-PMLE)
arXiv Detail & Related papers (2024-02-14T07:27:30Z) - Direct Preference Optimization: Your Language Model is Secretly a Reward Model [119.65409513119963]
We introduce a new parameterization of the reward model in RLHF that enables extraction of the corresponding optimal policy in closed form.
The resulting algorithm, which we call Direct Preference Optimization (DPO), is stable, performant, and computationally lightweight.
Our experiments show that DPO can fine-tune LMs to align with human preferences as well as or better than existing methods.
arXiv Detail & Related papers (2023-05-29T17:57:46Z) - Predictable MDP Abstraction for Unsupervised Model-Based RL [93.91375268580806]
We propose predictable MDP abstraction (PMA)
Instead of training a predictive model on the original MDP, we train a model on a transformed MDP with a learned action space.
We theoretically analyze PMA and empirically demonstrate that PMA leads to significant improvements over prior unsupervised model-based RL approaches.
arXiv Detail & Related papers (2023-02-08T07:37:51Z) - When to Update Your Model: Constrained Model-based Reinforcement
Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL)
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefit the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z) - Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interactions between the agent and the environment.
We propose a new method to solve it, using unsupervised model-based RL, for pre-training the agent.
We show robust performance on the Real-Word RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z) - An Analysis of Model-Based Reinforcement Learning From Abstracted
Observations [24.964038353043918]
We show that abstraction can introduce a dependence between samples collected online (e.g., in the real world) and results for Model-based Reinforcement learning (MBRL)
We show that we can use concentration inequalities for martingales to overcome this problem.
We illustrate this by combining R-MAX, a prototypical MBRL algorithm, with abstraction, thus producing the first performance guarantees for model-based 'RL from Abstracted Observations'
arXiv Detail & Related papers (2022-08-30T17:19:26Z) - Human-in-the-loop: Provably Efficient Preference-based Reinforcement
Learning with General Function Approximation [107.54516740713969]
We study human-in-the-loop reinforcement learning (RL) with trajectory preferences.
Instead of receiving a numeric reward at each step, the agent only receives preferences over trajectory pairs from a human overseer.
We propose the first optimistic model-based algorithm for PbRL with general function approximation.
arXiv Detail & Related papers (2022-05-23T09:03:24Z) - Physics-informed Dyna-Style Model-Based Deep Reinforcement Learning for
Dynamic Control [1.8275108630751844]
We propose to leverage the prior knowledge of underlying physics of the environment, where the governing laws are (partially) known.
By incorporating the prior information of the environment, the quality of the learned model can be notably improved.
arXiv Detail & Related papers (2021-07-31T02:19:36Z) - Model-based micro-data reinforcement learning: what are the crucial
model properties and which model to choose? [0.2836066255205732]
We contribute to micro-data model-based reinforcement learning (MBRL) by rigorously comparing popular generative models.
We find that on an environment that requires multimodal posterior predictives, mixture density nets outperform all other models by a large margin.
We also found that deterministic models are on par, in fact they consistently (although non-significantly) outperform their probabilistic counterparts.
arXiv Detail & Related papers (2021-07-24T11:38:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.