Model-Advantage Optimization for Model-Based Reinforcement Learning
- URL: http://arxiv.org/abs/2106.14080v1
- Date: Sat, 26 Jun 2021 20:01:28 GMT
- Title: Model-Advantage Optimization for Model-Based Reinforcement Learning
- Authors: Nirbhay Modhe, Harish Kamath, Dhruv Batra, Ashwin Kalyan
- Abstract summary: Model-based Reinforcement Learning (MBRL) algorithms have been traditionally designed with the goal of learning accurate dynamics of the environment.
Value-aware model learning, an alternative model-learning paradigm to maximum likelihood, proposes to inform model-learning through the value function of the learnt policy.
We propose a novel value-aware objective that is an upper bound on the absolute performance difference of a policy across two models.
- Score: 41.13567626667456
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model-based Reinforcement Learning (MBRL) algorithms have been traditionally
designed with the goal of learning accurate dynamics of the environment. This
introduces a mismatch between the objectives of model-learning and the overall
learning problem of finding an optimal policy. Value-aware model learning, an
alternative model-learning paradigm to maximum likelihood, proposes to inform
model-learning through the value function of the learnt policy. While this
paradigm is theoretically sound, it does not scale beyond toy settings. In this
work, we propose a novel value-aware objective that is an upper bound on the
absolute performance difference of a policy across two models. Further, we
propose a general-purpose algorithm that modifies the standard MBRL pipeline --
enabling learning with value-aware objectives. Our proposed objective, in
conjunction with this algorithm, is the first successful instantiation of
value-aware MBRL on challenging continuous control environments, outperforming
previous value-aware objectives and achieving competitive performance w.r.t.
MLE-based MBRL approaches.
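For intuition, the kind of quantity a value-aware objective targets can be sketched with the standard simulation-lemma decomposition (illustrative, not necessarily the paper's exact bound), where $M$ is the true dynamics, $\hat{M}$ the learned model, and $d^{\pi}_{M}$ the normalized discounted state-action occupancy of $\pi$ under $M$:

$$\big|\, J_{M}(\pi) - J_{\hat{M}}(\pi) \,\big| \;\le\; \frac{\gamma}{1-\gamma}\; \mathbb{E}_{(s,a)\sim d^{\pi}_{M}} \Big[\, \big|\, \mathbb{E}_{s'\sim M(\cdot\mid s,a)} V^{\pi}_{\hat{M}}(s') - \mathbb{E}_{s'\sim \hat{M}(\cdot\mid s,a)} V^{\pi}_{\hat{M}}(s') \,\big| \,\Big].$$

Minimizing a value-weighted model discrepancy of this form, instead of a likelihood, is what makes a model-learning objective value-aware.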
Related papers
- Data-Efficient Task Generalization via Probabilistic Model-based Meta
Reinforcement Learning [58.575939354953526]
PACOH-RL is a novel model-based Meta-Reinforcement Learning (Meta-RL) algorithm designed to efficiently adapt control policies to changing dynamics.
Existing Meta-RL methods require abundant meta-learning data, limiting their applicability in settings such as robotics.
Our experimental results demonstrate that PACOH-RL outperforms model-based RL and model-based Meta-RL baselines in adapting to new dynamic conditions.
arXiv Detail & Related papers (2023-11-13T18:51:57Z) - TOM: Learning Policy-Aware Models for Model-Based Reinforcement Learning
via Transition Occupancy Matching [28.743727234246126]
We propose a new "transition occupancy matching" (TOM) objective for model learning.
Under TOM, a learned model is good to the extent that the current policy experiences the same distribution of transitions inside the model as in the real environment.
We show that TOM successfully focuses model learning on policy-relevant experience and drives policies faster to higher task rewards.
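A minimal sketch of the occupancy-matching idea described above, assuming an estimate `ratio_net(s, a, s_next)` of the ratio between the policy's real-environment transition occupancy and the training-data distribution is available (e.g., from a learned classifier); the names and the squared-error model loss are illustrative, not the paper's implementation:

```python
import torch

def occupancy_weighted_model_loss(model, ratio_net, batch):
    """Reweight a one-step model loss by an estimated occupancy ratio so that
    transitions the current policy actually visits dominate model learning."""
    s, a, s_next = batch["s"], batch["a"], batch["s_next"]
    pred_next = model(s, a)                               # predicted next state
    per_sample = ((pred_next - s_next) ** 2).sum(dim=-1)  # per-transition error
    with torch.no_grad():
        w = ratio_net(s, a, s_next)                       # estimated occupancy ratio
    return (w * per_sample).mean()
```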
arXiv Detail & Related papers (2023-05-22T03:06:09Z) - The Virtues of Laziness in Model-based RL: A Unified Objective and
Algorithms [37.025378882978714]
We propose a novel approach to addressing two fundamental challenges in Model-based Reinforcement Learning (MBRL).
Our "lazy" method leverages a novel unified objective, Performance Difference via Advantage in Model, to capture the performance difference between the learned policy and expert policy.
We present two no-regret algorithms to optimize the proposed objective, and demonstrate their statistical and computational gains.
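For reference, the classical performance difference lemma that an "advantage in model" objective builds on (a standard identity, not the paper's exact formulation), with $\pi^{e}$ the expert policy and $d^{\pi}$ the normalized discounted state occupancy of $\pi$:

$$J(\pi) - J(\pi^{e}) \;=\; \frac{1}{1-\gamma}\; \mathbb{E}_{s\sim d^{\pi}}\, \mathbb{E}_{a\sim\pi(\cdot\mid s)} \big[ A^{\pi^{e}}(s,a) \big].$$

Per its name, the proposed objective measures such a difference with the advantage evaluated in the learned model rather than the real environment.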
arXiv Detail & Related papers (2023-03-01T17:42:26Z) - A Unified Framework for Alternating Offline Model Training and Policy
Learning [62.19209005400561]
In offline model-based reinforcement learning, we learn a dynamics model from historically collected data, and utilize the learned model and fixed datasets for policy learning.
We develop an iterative offline MBRL framework, where we maximize a lower bound of the true expected return.
With the proposed unified model-policy learning framework, we achieve competitive performance on a wide range of continuous-control offline reinforcement learning datasets.
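A generic way such lower bounds are constructed in offline MBRL is to penalize model-predicted rewards by an uncertainty estimate; the sketch below shows this common pattern (ensemble disagreement as the penalty) as an illustrative stand-in, not this paper's specific bound:

```python
import numpy as np

def pessimistic_reward(r_model, ensemble_preds, lam=1.0):
    """Penalize the model reward by ensemble disagreement so that the
    resulting model-based return acts as a conservative lower bound on
    the true expected return (generic pattern, hypothetical names)."""
    # ensemble_preds: (K, state_dim) next-state predictions from K models
    # for the same (s, a) pair.
    uncertainty = np.linalg.norm(ensemble_preds.std(axis=0))
    return r_model - lam * uncertainty
```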
arXiv Detail & Related papers (2022-10-12T04:58:51Z) - Simplifying Model-based RL: Learning Representations, Latent-space
Models, and Policies with One Objective [142.36200080384145]
We propose a single objective which jointly optimizes a latent-space model and policy to achieve high returns while remaining self-consistent.
We demonstrate that the resulting algorithm matches or improves the sample-efficiency of the best prior model-based and model-free RL methods.
arXiv Detail & Related papers (2022-09-18T03:51:58Z) - Value Gradient weighted Model-Based Reinforcement Learning [28.366157882991565]
Model-based reinforcement learning (MBRL) is a sample-efficient technique to obtain control policies.
VaGraM is a novel method for value-aware model learning.
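A minimal sketch of a value-gradient-weighted model loss in the spirit of VaGraM (the exact weighting in the paper may differ; `model` and `value_fn` are illustrative callables): weighting prediction errors by the value gradient makes the loss a first-order approximation of the squared value difference at the predicted next state.

```python
import torch

def value_gradient_weighted_loss(model, value_fn, s, a, s_next):
    """Weight one-step prediction errors by dV/ds' so that errors in state
    dimensions that matter for the value are penalized most (sketch only)."""
    s_next = s_next.detach().requires_grad_(True)
    grad_v = torch.autograd.grad(value_fn(s_next).sum(), s_next)[0]  # dV/ds' at observed s'
    pred_next = model(s, a)
    # First-order Taylor: V(pred_next) - V(s_next) ~= grad_v . (pred_next - s_next)
    value_err = (grad_v.detach() * (pred_next - s_next.detach())).sum(dim=-1)
    return (value_err ** 2).mean()
```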
arXiv Detail & Related papers (2022-04-04T13:28:31Z) - Discriminator Augmented Model-Based Reinforcement Learning [47.094522301093775]
It is common in practice for the learned model to be inaccurate, impairing planning and leading to poor performance.
This paper aims to improve planning with an importance sampling framework that accounts for discrepancy between the true and learned dynamics.
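A minimal sketch of the density-ratio trick typically used for such importance weights, assuming a discriminator trained to separate real transitions (label 1) from model-generated ones (label 0); the interface is illustrative, not the paper's code:

```python
import torch

def importance_weight(discriminator, s, a, s_next, eps=1e-6):
    """Estimate p_true(s'|s,a) / p_model(s'|s,a) from a real-vs-model
    transition classifier via the standard ratio D / (1 - D)."""
    d = torch.sigmoid(discriminator(s, a, s_next)).clamp(eps, 1.0 - eps)
    return d / (1.0 - d)
```

Such weights can then reweight model-generated rollouts during planning, discounting transitions the learned dynamics are likely to get wrong.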
arXiv Detail & Related papers (2021-03-24T06:01:55Z) - Objective Mismatch in Model-based Reinforcement Learning [14.92062504466269]
Model-based reinforcement learning (MBRL) has been shown to be a powerful framework for data-efficiently learning control of continuous tasks.
We identify a fundamental issue of the standard MBRL framework -- what we call the objective mismatch issue.
We propose an initial method to mitigate the mismatch issue by re-weighting dynamics model training.
arXiv Detail & Related papers (2020-02-11T16:26:07Z) - Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)