Value Gradient weighted Model-Based Reinforcement Learning
- URL: http://arxiv.org/abs/2204.01464v2
- Date: Tue, 20 Jun 2023 19:28:16 GMT
- Title: Value Gradient weighted Model-Based Reinforcement Learning
- Authors: Claas Voelcker and Victor Liao and Animesh Garg and Amir-massoud
Farahmand
- Abstract summary: Model-based reinforcement learning (MBRL) is a sample-efficient technique to obtain control policies.
VaGraM is a novel method for value-aware model learning.
- Score: 28.366157882991565
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model-based reinforcement learning (MBRL) is a sample-efficient technique to
obtain control policies, yet unavoidable modeling errors often lead to performance
deterioration. The model in MBRL is often fitted solely to reconstruct dynamics,
state observations in particular, while the impact of model error on the policy is
not captured by the training objective. This leads to a mismatch between the
intended goal of MBRL, enabling good policy and value learning, and the target of
the loss function employed in practice, future state prediction. Naive intuition
would suggest that value-aware model learning would fix this problem and, indeed,
several solutions to this objective mismatch problem have been proposed based on
theoretical analysis. However, they tend to be inferior in practice to commonly
used maximum likelihood estimation (MLE) based approaches. In this paper we propose
Value-gradient weighted Model Learning (VaGraM), a novel method for value-aware
model learning that improves the performance of MBRL in challenging settings, such
as small model capacity and the presence of distracting state dimensions. We
analyze both MLE-based and value-aware approaches, demonstrate how they fail to
account for exploration and the behavior of function approximation when learning
value-aware models, and highlight the additional goals that must be met to
stabilize optimization in the deep learning setting. We verify our analysis by
showing that our loss function achieves high returns on the MuJoCo benchmark suite
while being more robust than MLE-based approaches.
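A minimal sketch of the idea behind the loss described above, assuming PyTorch; the function names, the choice of the observed next state as the point where the value gradient is evaluated, and all other implementation details are illustrative assumptions rather than the authors' reference implementation. It contrasts a plain maximum-likelihood-style reconstruction loss, which weights every state dimension equally, with a value-gradient weighted loss, which scales each dimension's prediction error by the gradient of the current value estimate:

import torch

def mle_style_loss(pred_next_state, true_next_state):
    # Plain reconstruction error: every state dimension counts equally,
    # including dimensions that are irrelevant to the value function.
    return ((pred_next_state - true_next_state) ** 2).sum(dim=-1).mean()

def value_gradient_weighted_loss(pred_next_state, true_next_state, value_fn):
    # Value-gradient weighted error (sketch): evaluate dV/ds' at the observed
    # next state and treat it as a fixed per-dimension weight, so model capacity
    # is spent on the dimensions that matter for policy and value learning.
    s = true_next_state.detach().requires_grad_(True)
    grad = torch.autograd.grad(value_fn(s).sum(), s)[0].detach()
    err = pred_next_state - true_next_state.detach()
    return ((grad * err) ** 2).sum(dim=-1).mean()

Under this weighting, a distracting state dimension whose value gradient is near zero contributes almost nothing to the model loss, which is consistent with the robustness to small model capacity and distracting dimensions described in the abstract.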
Related papers
- Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning [97.2995389188179]
Recent research has begun to approach large language model (LLM) unlearning via gradient ascent (GA).
Despite their simplicity and efficiency, GA-based methods are prone to excessive unlearning.
We propose several controlling methods that can regulate the extent of excessive unlearning.
arXiv Detail & Related papers (2024-06-13T14:41:00Z) - A Unified View on Solving Objective Mismatch in Model-Based Reinforcement Learning [10.154341066746975]
Model-based Reinforcement Learning (MBRL) aims to make agents more sample-efficient, adaptive, and explainable.
How to best learn the model is still an unresolved question.
arXiv Detail & Related papers (2023-10-10T01:58:38Z) - Plan To Predict: Learning an Uncertainty-Foreseeing Model for
Model-Based Reinforcement Learning [32.24146877835396]
We propose Plan To Predict (P2P), a framework that treats the model rollout process as a sequential decision-making problem.
We show that P2P achieves state-of-the-art performance on several challenging benchmark tasks.
arXiv Detail & Related papers (2023-01-20T10:17:22Z) - When to Update Your Model: Constrained Model-based Reinforcement
Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee in model-based RL (MBRL).
The derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z) - Simplifying Model-based RL: Learning Representations, Latent-space
Models, and Policies with One Objective [142.36200080384145]
We propose a single objective that jointly optimizes a latent-space model and policy to achieve high returns while remaining self-consistent.
We demonstrate that the resulting algorithm matches or improves upon the sample efficiency of the best prior model-based and model-free RL methods.
arXiv Detail & Related papers (2022-09-18T03:51:58Z) - Mismatched No More: Joint Model-Policy Optimization for Model-Based RL [172.37829823752364]
We propose a single objective for jointly training the model and the policy, such that updates to either component increase a lower bound on expected return.
Our objective is a global lower bound on expected return, and this bound becomes tight under certain assumptions.
The resulting algorithm (MnM) is conceptually similar to a GAN.
arXiv Detail & Related papers (2021-10-06T13:43:27Z) - Physics-informed Dyna-Style Model-Based Deep Reinforcement Learning for
Dynamic Control [1.8275108630751844]
We propose to leverage prior knowledge of the underlying physics of the environment, where the governing laws are (partially) known.
Incorporating this prior information about the environment notably improves the quality of the learned model.
arXiv Detail & Related papers (2021-07-31T02:19:36Z) - Model-Advantage Optimization for Model-Based Reinforcement Learning [41.13567626667456]
Model-based Reinforcement Learning (MBRL) algorithms have been traditionally designed with the goal of learning accurate dynamics of the environment.
Value-aware model learning, an alternative model-learning paradigm to maximum likelihood, proposes to inform model-learning through the value function of the learnt policy.
We propose a novel value-aware objective that is an upper bound on the absolute performance difference of a policy across two models.
arXiv Detail & Related papers (2021-06-26T20:01:28Z) - Discriminator Augmented Model-Based Reinforcement Learning [47.094522301093775]
It is common in practice for the learned model to be inaccurate, impairing planning and leading to poor performance.
This paper aims to improve planning with an importance sampling framework that accounts for discrepancy between the true and learned dynamics.
arXiv Detail & Related papers (2021-03-24T06:01:55Z) - Model-Augmented Actor-Critic: Backpropagating through Paths [81.86992776864729]
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator.
We show how to make more effective use of the model by exploiting its differentiability.
arXiv Detail & Related papers (2020-05-16T19:18:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.