Maximum Entropy Model Correction in Reinforcement Learning
- URL: http://arxiv.org/abs/2311.17855v1
- Date: Wed, 29 Nov 2023 18:00:41 GMT
- Title: Maximum Entropy Model Correction in Reinforcement Learning
- Authors: Amin Rakhsha, Mete Kemertas, Mohammad Ghavamzadeh, Amir-massoud
Farahmand
- Abstract summary: We propose and theoretically analyze an approach for planning with an approximate model in reinforcement learning.
We introduce the Model Correcting Value Iteration (MoCoVI) algorithm and its sample-based variant, MoCoDyna.
Unlike traditional model-based algorithms, MoCoVI and MoCoDyna effectively utilize an approximate model and still converge to the correct value function.
- Score: 29.577846986302518
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose and theoretically analyze an approach for planning with an
approximate model in reinforcement learning that can reduce the adverse impact
of model error. If the model is accurate enough, it also accelerates convergence
to the true value function. One of its key components is the MaxEnt Model
Correction (MoCo) procedure that corrects the model's next-state distributions
based on a Maximum Entropy density estimation formulation. Based on MoCo, we
introduce the Model Correcting Value Iteration (MoCoVI) algorithm and its
sample-based variant, MoCoDyna. We show that the convergence of MoCoVI and
MoCoDyna can be much faster than that of conventional model-free algorithms. Unlike
traditional model-based algorithms, MoCoVI and MoCoDyna effectively utilize an
approximate model and still converge to the correct value function.
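The abstract does not spell out the MaxEnt density estimation formulation. As a hedged reading only (the feature map \phi and the moment-matching constraint below are illustrative assumptions, not taken from the paper), the correction step can be viewed as a minimum relative entropy projection of the approximate model's next-state distribution onto the set of distributions whose feature expectations match those estimated from real transitions:

```latex
% A minimal sketch of a MaxEnt (minimum relative entropy) correction.
% Assumed notation: \hat{P} is the approximate model, P the true dynamics,
% \phi an assumed feature map over next states.
\hat{P}^{\mathrm{MoCo}}(\cdot \mid s, a)
  \;=\; \arg\min_{q \in \Delta(\mathcal{S})}
        \mathrm{KL}\big(q \,\|\, \hat{P}(\cdot \mid s, a)\big)
  \quad \text{s.t.} \quad
  \mathbb{E}_{s' \sim q}\big[\phi(s')\big]
    \;=\; \mathbb{E}_{s' \sim P(\cdot \mid s, a)}\big[\phi(s')\big]
```

By convex duality, the minimizer of such a problem is an exponential tilting of the model, q(s') \propto \hat{P}(s' \mid s, a) \exp(\lambda^\top \phi(s')), which is the standard maximum-entropy density estimation form; the exact constraints and estimators used by MoCo, and how MoCoVI interleaves the correction with value iteration, are specified in the paper itself.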
Related papers
- Supervised Score-Based Modeling by Gradient Boosting [49.556736252628745]
We propose a Supervised Score-based Model (SSM), which can be viewed as a gradient boosting algorithm combined with score matching.
We provide a theoretical analysis of learning and sampling for SSM to balance inference time and prediction accuracy.
Our model outperforms existing models in both accuracy and inference time.
arXiv Detail & Related papers (2024-11-02T07:06:53Z) - Data-driven Nonlinear Model Reduction using Koopman Theory: Integrated
Control Form and NMPC Case Study [56.283944756315066]
We propose generic model structures combining delay-coordinate encoding of measurements and full-state decoding to integrate reduced Koopman modeling and state estimation.
A case study demonstrates that our approach provides accurate control models and enables real-time capable nonlinear model predictive control of a high-purity cryogenic distillation column.
arXiv Detail & Related papers (2024-01-09T11:54:54Z) - Is Model Ensemble Necessary? Model-based RL via a Single Model with
Lipschitz Regularized Value Function [23.255250192599327]
Probabilistic dynamics model ensemble is widely used in existing model-based reinforcement learning methods.
We find that, for a value function, the stronger its Lipschitz condition is, the smaller the gap between the Bellman operators induced by the true and the learned dynamics.
arXiv Detail & Related papers (2023-02-02T17:27:16Z) - When to Update Your Model: Constrained Model-based Reinforcement
Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z) - On the Generalization and Adaption Performance of Causal Models [99.64022680811281]
Differentiable causal discovery proposes to factorize the data-generating process into a set of modules.
We study the generalization and adaption performance of such modular neural causal models.
Our analysis shows that the modular neural causal models outperform other models on both zero-shot and few-shot adaptation in low-data regimes.
arXiv Detail & Related papers (2022-06-09T17:12:32Z) - Model-based Policy Optimization with Unsupervised Model Adaptation [37.09948645461043]
We investigate how to bridge the gap between real and simulated data due to inaccurate model estimation for better policy optimization.
We propose a novel model-based reinforcement learning framework AMPO, which introduces unsupervised model adaptation.
Our approach achieves state-of-the-art performance in terms of sample efficiency on a range of continuous control benchmark tasks.
arXiv Detail & Related papers (2020-10-19T14:19:42Z) - Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z) - Maximum Entropy Model Rollouts: Fast Model Based Policy Optimization
without Compounding Errors [10.906666680425754]
We propose a Dyna-style model-based reinforcement learning algorithm, which we call Maximum Entropy Model Rollouts (MEMR).
To eliminate compounding errors, we use the model only to generate single-step rollouts (see the sketch after this list).
arXiv Detail & Related papers (2020-06-08T21:38:15Z) - Model-Augmented Actor-Critic: Backpropagating through Paths [81.86992776864729]
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator.
We show how to make more effective use of the model by exploiting its differentiability.
arXiv Detail & Related papers (2020-05-16T19:18:10Z)
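The Maximum Entropy Model Rollouts summary above notes that compounding model error can be avoided by restricting the model to single-step rollouts. The following is a minimal, hedged sketch of that generic Dyna-style idea; the model, policy, and buffer interfaces are hypothetical placeholders, not the paper's API.

```python
import random

def single_step_model_rollouts(model, policy, real_states, model_buffer, n_rollouts):
    """Dyna-style data augmentation with one-step model rollouts only.

    Starting every simulated transition from a state that was actually visited
    keeps model error from compounding over a long imagined trajectory.
    All interfaces here (model, policy, buffers) are assumed for illustration.
    """
    for _ in range(n_rollouts):
        s = random.choice(real_states)          # branch from a real, visited state
        a = policy(s)                           # act with the current policy
        s_next, r = model(s, a)                 # a single simulated step, no chaining
        model_buffer.append((s, a, r, s_next))  # the agent later trains on this data
    return model_buffer
```

Longer imagined trajectories would reuse the model's own outputs as inputs, which is exactly where approximation errors accumulate; branching one step from real states sidesteps that.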