How to Fine-tune the Model: Unified Model Shift and Model Bias Policy
Optimization
- URL: http://arxiv.org/abs/2309.12671v2
- Date: Tue, 24 Oct 2023 06:09:44 GMT
- Title: How to Fine-tune the Model: Unified Model Shift and Model Bias Policy
Optimization
- Authors: Hai Zhang, Hang Yu, Junqiao Zhao, Di Zhang, Chang Huang, Hongtu Zhou,
Xiao Zhang, Chen Ye
- Abstract summary: This paper develops an algorithm for model-based reinforcement learning.
It unifies model shift and model bias and then formulates a fine-tuning process.
It achieves state-of-the-art performance on several challenging benchmark tasks.
- Score: 13.440645736306267
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Designing and deriving effective model-based reinforcement learning (MBRL)
algorithms with a performance improvement guarantee is challenging, mainly
attributed to the high coupling between model learning and policy optimization.
Many prior methods that rely on return discrepancy to guide model learning
ignore the impacts of model shift, which can lead to performance deterioration
due to excessive model updates. Other methods use a performance difference bound
to explicitly consider model shift. However, these methods rely on a fixed
threshold to constrain model shift, resulting in a heavy dependence on the
threshold and a lack of adaptability during the training process. In this
paper, we theoretically derive an optimization objective that can unify model
shift and model bias and then formulate a fine-tuning process. This process
adaptively adjusts the model updates to get a performance improvement guarantee
while avoiding model overfitting. Building on this, we develop a straightforward
algorithm, USB-PO (Unified model Shift and model Bias Policy Optimization).
Empirical results show that USB-PO achieves state-of-the-art performance on
several challenging benchmark tasks.
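The abstract describes the fine-tuning process only at a high level. As a rough illustration of the idea (keep taking model-update steps only while the reduction in model bias outweighs a penalty on model shift), below is a minimal NumPy sketch; the toy linear dynamics model, the shift/bias proxies, and all function names are assumptions made for illustration, not the authors' USB-PO implementation.

```python
# Illustrative sketch only (not the paper's code): a generic model fine-tuning
# loop that accepts a candidate update only if the unified shift + bias score
# improves, so updating stops adaptively instead of at a fixed threshold.
import copy
import numpy as np

rng = np.random.default_rng(0)

class ToyDynamicsModel:
    """Linear toy model s' = W s, standing in for a learned dynamics network."""
    def __init__(self, dim=4):
        self.W = np.eye(dim) + 0.01 * rng.standard_normal((dim, dim))

    def predict(self, states):
        return states @ self.W.T

def model_shift(old, new):
    # Proxy for divergence between consecutive models (e.g. KL over rollouts);
    # here simply the Frobenius distance between parameters.
    return float(np.linalg.norm(new.W - old.W))

def model_bias(model, states, next_states):
    # Proxy for model error on real transitions: one-step prediction MSE.
    return float(np.mean((model.predict(states) - next_states) ** 2))

def fine_tune_step(model, states, next_states, lr=0.05):
    # One gradient step on the one-step prediction loss.
    new = copy.deepcopy(model)
    grad = 2.0 * (new.predict(states) - next_states).T @ states / len(states)
    new.W -= lr * grad
    return new

def fine_tune_model(model, states, next_states,
                    c_shift=0.01, c_bias=1.0, max_steps=50):
    """Keep updating while the combined (bias + penalized shift) score improves;
    stop before excessive updates outweigh the bias reduction."""
    for _ in range(max_steps):
        candidate = fine_tune_step(model, states, next_states)
        old_score = c_bias * model_bias(model, states, next_states)
        new_score = (c_bias * model_bias(candidate, states, next_states)
                     + c_shift * model_shift(model, candidate))
        if new_score >= old_score:  # further updating no longer pays off
            break
        model = candidate
    return model

# Toy usage: fit the model to transitions from an unknown linear system.
true_W = np.eye(4) * 0.9
states = rng.standard_normal((256, 4))
next_states = states @ true_W.T + 0.01 * rng.standard_normal((256, 4))
model = fine_tune_model(ToyDynamicsModel(), states, next_states)
print("final bias:", model_bias(model, states, next_states))
```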
Related papers
- Plan To Predict: Learning an Uncertainty-Foreseeing Model for Model-Based Reinforcement Learning [32.24146877835396]
We propose Plan To Predict (P2P), a framework that treats the model rollout process as a sequential decision making problem.
We show that P2P achieves state-of-the-art performance on several challenging benchmark tasks.
arXiv Detail & Related papers (2023-01-20T10:17:22Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- Which Model To Trust: Assessing the Influence of Models on the Performance of Reinforcement Learning Algorithms for Continuous Control Tasks [0.0]
It is not clear how much of the recent progress is due to improved algorithms or due to improved models.
A set of commonly adopted models is established for the purpose of model comparison.
Results reveal that significant differences in model performance do exist.
arXiv Detail & Related papers (2021-10-25T16:17:26Z)
- Mismatched No More: Joint Model-Policy Optimization for Model-Based RL [172.37829823752364]
We propose a single objective for jointly training the model and the policy, such that updates to either component increase a lower bound on expected return.
Our objective is a global lower bound on expected return, and this bound becomes tight under certain assumptions.
The resulting algorithm (MnM) is conceptually similar to a GAN.
arXiv Detail & Related papers (2021-10-06T13:43:27Z)
- Model-Invariant State Abstractions for Model-Based Reinforcement Learning [54.616645151708994]
We introduce a new type of state abstraction called model-invariance.
This allows for generalization to novel combinations of unseen values of state variables.
We prove that an optimal policy can be learned over this model-invariance state abstraction.
arXiv Detail & Related papers (2021-02-19T10:37:54Z)
- COMBO: Conservative Offline Model-Based Policy Optimization [120.55713363569845]
Uncertainty estimation with complex models, such as deep neural networks, can be difficult and unreliable.
We develop a new model-based offline RL algorithm, COMBO, that regularizes the value function on out-of-support state-actions.
We find that COMBO consistently performs as well as or better than prior offline model-free and model-based methods.
arXiv Detail & Related papers (2021-02-16T18:50:32Z)
- Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)
- Variational Model-based Policy Optimization [34.80171122943031]
Model-based reinforcement learning (RL) algorithms allow us to combine model-generated data with those collected from interaction with the real system in order to alleviate the data efficiency problem in RL.
We propose an objective function as a variational lower bound of a log-likelihood to jointly learn and improve model and policy.
Our experiments on a number of continuous control tasks show that despite being more complex, our model-based (E-step) algorithm, called variational model-based policy optimization (VMBPO), is more sample-efficient and robust than its model-free (M-step) counterpart.
arXiv Detail & Related papers (2020-06-09T18:30:15Z)
- Model-Augmented Actor-Critic: Backpropagating through Paths [81.86992776864729]
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator.
We show how to make more effective use of the model by exploiting its differentiability.
arXiv Detail & Related papers (2020-05-16T19:18:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.