A Game Theoretic Framework for Model Based Reinforcement Learning
- URL: http://arxiv.org/abs/2004.07804v2
- Date: Thu, 11 Mar 2021 05:52:14 GMT
- Title: A Game Theoretic Framework for Model Based Reinforcement Learning
- Authors: Aravind Rajeswaran, Igor Mordatch, Vikash Kumar
- Abstract summary: Model-based reinforcement learning (MBRL) has recently gained immense interest due to its potential for sample efficiency and ability to incorporate off-policy data.
We develop a new framework that casts MBRL as a game between: (1) a policy player, which attempts to maximize rewards under the learned model; (2) a model player, which attempts to fit the real-world data collected by the policy player.
Our framework is consistent with and provides a clear basis for heuristics known to be important in practice from prior works.
- Score: 39.45066100705418
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model-based reinforcement learning (MBRL) has recently gained immense
interest due to its potential for sample efficiency and ability to incorporate
off-policy data. However, designing stable and efficient MBRL algorithms using
rich function approximators has remained challenging. To help expose the
practical challenges in MBRL and simplify algorithm design from the lens of
abstraction, we develop a new framework that casts MBRL as a game between: (1)
a policy player, which attempts to maximize rewards under the learned model;
(2) a model player, which attempts to fit the real-world data collected by the
policy player. For algorithm development, we construct a Stackelberg game
between the two players, and show that it can be solved with approximate
bi-level optimization. This gives rise to two natural families of algorithms
for MBRL based on which player is chosen as the leader in the Stackelberg game.
Together, they encapsulate, unify, and generalize many previous MBRL
algorithms. Furthermore, our framework is consistent with and provides a clear
basis for heuristics known to be important in practice from prior works.
Finally, through experiments we validate that our proposed algorithms are
highly sample efficient, match the asymptotic performance of model-free policy
gradient, and scale gracefully to high-dimensional tasks like dexterous hand
manipulation. Additional details and code can be obtained from the project page
at https://sites.google.com/view/mbrl-game
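For concreteness, here is a minimal, self-contained sketch of what the two Stackelberg orderings look like as training loops, on a toy 1-D linear-quadratic task. The toy dynamics, the linear model class, and the finite-difference policy update are illustrative assumptions for exposition, not the authors' implementation (the official code is linked from the project page above).

```python
# Sketch of the two Stackelberg algorithm families from the paper:
# PAL (Policy As Leader) and MAL (Model As Leader), on a toy 1-D task.
# All helper functions and constants here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
A_TRUE, B_TRUE, NOISE = 0.8, 1.0, 0.05    # real dynamics: s' = 0.8*s + a + eps

def reward(s, a):
    return -(s ** 2 + 0.1 * a ** 2)       # quadratic cost, expressed as reward

def collect(k, n=50):
    """Roll the linear policy a = -k*s in the real environment."""
    s, data = 1.0, []
    for _ in range(n):
        a = -k * s + 0.1 * rng.normal()   # small exploration noise
        s_next = A_TRUE * s + B_TRUE * a + NOISE * rng.normal()
        data.append((s, a, s_next))
        s = s_next
    return data

def fit_model(buffer):
    """Model player: least-squares fit of s' ~ A*s + B*a to all data so far."""
    X = np.array([[s, a] for s, a, _ in buffer])
    y = np.array([s_next for _, _, s_next in buffer])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef                            # (A_hat, B_hat)

def model_return(k, coef, horizon=30):
    """Policy player's objective: return of a = -k*s under the learned model."""
    (A, B), s, total = coef, 1.0, 0.0
    for _ in range(horizon):
        a = -k * s
        total += reward(s, a)
        s = A * s + B * a
    return total

def improve_policy(k, coef, steps, lr=0.02, eps=1e-3):
    """Finite-difference gradient ascent on the model-based return."""
    for _ in range(steps):
        g = (model_return(k + eps, coef) - model_return(k - eps, coef)) / (2 * eps)
        k += lr * g
    return k

# PAL-style loop: the model (follower) is re-fit to convergence on the
# latest data, while the policy (leader) takes a few conservative steps.
# Swapping the budgets -- many policy steps against a slowly-updated
# model -- gives the corresponding MAL-style loop.
k, buffer = 0.0, []
for _ in range(20):
    buffer += collect(k)
    coef = fit_model(buffer)               # follower: solved (near-)exactly
    k = improve_policy(k, coef, steps=5)   # leader: conservative update
print(f"learned gain k = {k:.3f}, model (A, B) = {np.round(fit_model(buffer), 3)}")
```

The asymmetry in the two step budgets is the point of the Stackelberg view: the follower's problem is solved approximately to completion inside each leader update, which is what "approximate bi-level optimization" refers to above.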
Related papers
- Bayes Adaptive Monte Carlo Tree Search for Offline Model-based Reinforcement Learning [5.663006149337036]
Offline model-based reinforcement learning (MBRL) is a powerful approach for data-driven decision-making and control.
Many different MDPs can behave identically on the offline dataset, so handling uncertainty about the true MDP is challenging.
We introduce a novel Bayes Adaptive Monte-Carlo planning algorithm capable of solving Bayes-Adaptive MDPs (BAMDPs) in continuous state and action spaces.
arXiv Detail & Related papers (2024-10-15T03:36:43Z)
- Direct Preference-based Policy Optimization without Reward Modeling [25.230992130108767]
Preference-based reinforcement learning (PbRL) is an approach that enables RL agents to learn from preferences.
We propose a PbRL algorithm that learns directly from preferences without requiring any reward modeling.
We show that our algorithm surpasses offline RL methods that learn with ground-truth reward information.
arXiv Detail & Related papers (2023-01-30T12:51:13Z)
- Oracle Inequalities for Model Selection in Offline Reinforcement Learning [105.74139523696284]
We study the problem of model selection in offline RL with value function approximation.
We propose the first model selection algorithm for offline RL that achieves minimax rate-optimal oracle inequalities up to logarithmic factors.
We conclude with several numerical simulations showing it is capable of reliably selecting a good model class.
arXiv Detail & Related papers (2022-11-03T17:32:34Z)
- Representation Learning for General-sum Low-rank Markov Games [63.119870889883224]
We study multi-agent general-sum Markov games with nonlinear function approximation.
We focus on low-rank Markov games whose transition matrix admits a hidden low-rank structure on top of an unknown non-linear representation.
arXiv Detail & Related papers (2022-10-30T22:58:22Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta-algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
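As a rough illustration of the two-policy scheme, here is a self-contained toy sketch (a chain task and tabular Q-learner of our own construction, not the paper's setup): a competent guide policy controls the first h steps of each episode, the learning policy takes over for the remainder, and the hand-off point h is moved earlier as the learner starts succeeding on its own.

```python
# Hedged sketch of a JSRL-style curriculum on a toy chain task.
# The environment, guide, and learner below are illustrative stand-ins.
import random

N = 20                                    # chain of states 0..N, reward at N

def env_step(s, a):                       # actions: a in {-1, +1}
    s_next = max(0, min(N, s + a))
    return s_next, float(s_next == N), s_next == N

guide = lambda s: +1                      # pre-existing guide: always move right

def jsrl_rollout(learner_policy, h, max_steps=2 * N):
    """Guide acts for the first h steps, then the learner takes over."""
    s, ret, transitions = 0, 0.0, []
    for t in range(max_steps):
        a = guide(s) if t < h else learner_policy(s)
        s_next, r, done = env_step(s, a)
        transitions.append((s, a, r, s_next))
        ret, s = ret + r, s_next
        if done:
            break
    return ret, transitions

# Learner: epsilon-greedy over a tabular Q-function (tiny random init
# so that greedy ties do not collapse to a fixed action).
Q = {(s, a): 0.01 * random.random() for s in range(N + 1) for a in (-1, +1)}

def learner_policy(s):
    if random.random() < 0.1:
        return random.choice((-1, +1))
    return max((+1, -1), key=lambda a: Q[(s, a)])

h = N - 1                                 # start with a long jump-start
for _ in range(2000):
    ret, transitions = jsrl_rollout(learner_policy, h)
    for s, a, r, s_next in transitions:   # one-step Q-learning update
        target = r + 0.9 * max(Q[(s_next, -1)], Q[(s_next, +1)])
        Q[(s, a)] += 0.5 * (target - Q[(s, a)])
    if ret > 0 and h > 0:                 # success from the hand-off point:
        h -= 1                            # move it one step earlier
print("final hand-off h =", h)            # h == 0: learner solves the task alone
```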
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Scalable Deep Reinforcement Learning Algorithms for Mean Field Games [60.550128966505625]
Mean Field Games (MFGs) have been introduced to efficiently approximate games with very large populations of strategic agents.
Recently, the question of learning equilibria in MFGs has gained momentum, particularly using model-free reinforcement learning (RL) methods.
Existing algorithms to solve MFGs require the mixing of approximated quantities such as strategies or $q$-values.
We propose two methods to address this shortcoming. The first one learns a mixed strategy from distillation of historical data into a neural network and is applied to the Fictitious Play algorithm; a sketch follows below.
The second one is an online mixing method based on ...
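A rough, self-contained sketch of the first (distillation) idea, with a tabular softmax policy standing in for the neural network and all names illustrative: instead of storing every past best response to act with their uniform mixture, a single student policy is regressed onto the average of their action distributions.

```python
# Hedged sketch of distilling a Fictitious-Play average policy into one
# model via cross-entropy regression. Tabular softmax policies stand in
# for neural networks; everything here is an illustrative assumption.
import numpy as np

class SoftmaxPolicy:
    """Toy tabular policy: softmax over per-state logits."""
    def __init__(self, n_states, n_actions, rng):
        self.logits = rng.normal(size=(n_states, n_actions))
    def probs(self, s):
        z = np.exp(self.logits[s] - self.logits[s].max())
        return z / z.sum()

def distill(past_policies, n_states, n_actions, lr=0.5, epochs=200):
    """Fit one policy whose action distribution matches the uniform
    average of the past policies, so the history need not be kept."""
    student = SoftmaxPolicy(n_states, n_actions, np.random.default_rng(0))
    target = np.stack([[p.probs(s) for s in range(n_states)]
                       for p in past_policies]).mean(axis=0)
    for _ in range(epochs):
        for s in range(n_states):
            # cross-entropy gradient w.r.t. logits is (softmax - target)
            student.logits[s] += lr * (target[s] - student.probs(s))
    return student

rng = np.random.default_rng(1)
history = [SoftmaxPolicy(4, 3, rng) for _ in range(5)]   # past best responses
avg = distill(history, 4, 3)
print(np.round(avg.probs(0), 3), "vs",
      np.round(np.mean([p.probs(0) for p in history], axis=0), 3))
```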
arXiv Detail & Related papers (2022-03-22T18:10:32Z)
- Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model, called RAMa, that provides the MBRL agent with training samples drawn from task-agnostic storage.
The model is trained to maximize the agent's expected performance by selecting, from the storage, promising trajectories that solved prior tasks.
arXiv Detail & Related papers (2021-10-25T20:02:57Z)
- PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration [15.173628100049129]
This work studies a model-based algorithm for both Kernelized Nonlinear Regulators (KNR) and linear Markov Decision Processes (MDPs).
For both models, the algorithm comes with polynomial sample-complexity guarantees and only requires access to a planning oracle.
Our method can also perform reward-free exploration efficiently.
arXiv Detail & Related papers (2021-07-15T15:49:30Z)
- Model-Advantage Optimization for Model-Based Reinforcement Learning [41.13567626667456]
Model-based Reinforcement Learning (MBRL) algorithms have been traditionally designed with the goal of learning accurate dynamics of the environment.
Value-aware model learning, an alternative to the maximum-likelihood paradigm, proposes to inform model learning through the value function of the learnt policy.
We propose a novel value-aware objective that is an upper bound on the absolute performance difference of a policy across two models.
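For intuition, value-aware objectives of this kind are typically built on simulation-lemma-style bounds. A generic form of such a bound (a standard result stated here for context; the paper's exact objective may weight the error differently) is:

```latex
% Return gap of a fixed policy \pi between the true model M and a
% learned model \hat{M}, bounded by a value-weighted one-step error;
% d^{\pi}_{M} is the (normalized) discounted visitation under \pi in M.
\left| J_{M}(\pi) - J_{\hat{M}}(\pi) \right|
\;\le\; \frac{\gamma}{1-\gamma}\,
\mathbb{E}_{(s,a)\sim d^{\pi}_{M}}\!\left[
  \left| \mathbb{E}_{s'\sim \hat{M}(\cdot\mid s,a)} V^{\pi}_{\hat{M}}(s')
       - \mathbb{E}_{s'\sim M(\cdot\mid s,a)} V^{\pi}_{\hat{M}}(s') \right|
\right]
```

Minimizing the right-hand side concentrates model capacity on transitions whose errors actually change the policy's value, in contrast to maximum likelihood, which weights all transition errors uniformly.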
arXiv Detail & Related papers (2021-06-26T20:01:28Z)
- MBRL-Lib: A Modular Library for Model-based Reinforcement Learning [13.467075854633213]
We present MBRL-Lib -- a machine learning library for model-based reinforcement learning in continuous state-action spaces, based on PyTorch.
It is designed as a platform both for researchers, to easily develop, debug, and compare new algorithms, and for non-expert users, to lower the barrier to deploying state-of-the-art algorithms.
arXiv Detail & Related papers (2021-04-20T17:58:22Z)