TOM: Learning Policy-Aware Models for Model-Based Reinforcement Learning
via Transition Occupancy Matching
- URL: http://arxiv.org/abs/2305.12663v1
- Date: Mon, 22 May 2023 03:06:09 GMT
- Title: TOM: Learning Policy-Aware Models for Model-Based Reinforcement Learning
via Transition Occupancy Matching
- Authors: Yecheng Jason Ma, Kausik Sivakumar, Jason Yan, Osbert Bastani, Dinesh
Jayaraman
- Abstract summary: We propose a new "transition occupancy matching" (TOM) objective for model learning.
Under TOM, a model is good to the extent that the current policy experiences the same distribution of transitions inside the model as in the real environment.
We show that TOM successfully focuses model learning on policy-relevant experience and drives policies faster to higher task rewards.
- Score: 28.743727234246126
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Standard model-based reinforcement learning (MBRL) approaches fit a
transition model of the environment to all past experience, but this wastes
model capacity on data that is irrelevant for policy improvement. We instead
propose a new "transition occupancy matching" (TOM) objective for MBRL model
learning: a model is good to the extent that the current policy experiences the
same distribution of transitions inside the model as in the real environment.
We derive TOM directly from a novel lower bound on the standard reinforcement
learning objective. To optimize TOM, we show how to reduce it to a form of
importance weighted maximum-likelihood estimation, where the automatically
computed importance weights identify policy-relevant past experiences from a
replay buffer, enabling stable optimization. TOM thus offers a plug-and-play
model learning sub-routine that is compatible with any backbone MBRL algorithm.
On various MuJoCo continuous robotic control tasks, we show that TOM
successfully focuses model learning on policy-relevant experience and drives
policies faster to higher task rewards than alternative model learning
approaches.
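The abstract's reduction of TOM to importance-weighted maximum-likelihood estimation can be illustrated with a minimal sketch. Everything below is a hypothetical toy, not TOM's actual implementation: the linear-Gaussian model class, the function names, and the uniform weights (which in TOM would come from an automatically learned density-ratio estimator scoring how policy-relevant each replay-buffer transition is) are all assumptions for illustration.

```python
import numpy as np

def fit_weighted_linear_model(states, actions, next_states, weights):
    """Fit next_state ~ W @ [state; action] by weighted least squares,
    which is the MLE for a linear-Gaussian dynamics model under
    per-transition importance weights."""
    X = np.hstack([states, actions])          # (N, ds + da) regressors
    w = weights / weights.sum()               # normalize the weights
    D = np.diag(w)
    # Weighted normal equations: (X^T D X) W^T = X^T D Y, with a small
    # ridge term for numerical stability.
    W = np.linalg.solve(X.T @ D @ X + 1e-6 * np.eye(X.shape[1]),
                        X.T @ D @ next_states)
    return W.T                                # (ds, ds + da)

# Toy replay buffer whose true dynamics are s' = s + [a, a] + noise.
rng = np.random.default_rng(0)
S = rng.normal(size=(256, 2))
A = rng.normal(size=(256, 1))
S_next = S + np.hstack([A, A]) + 0.01 * rng.normal(size=(256, 2))

# In TOM these weights would up-weight policy-relevant transitions;
# here they are uniform, reducing to ordinary MLE.
w = np.ones(256)
W = fit_weighted_linear_model(S, A, S_next, w)
pred = np.hstack([S, A]) @ W.T
print(np.abs(pred - S_next).mean())
```

Because the weights enter only through the per-sample loss, the same trick plugs into any likelihood-based model learner, which is what makes the sub-routine compatible with an arbitrary backbone MBRL algorithm.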
Related papers
- MOTO: Offline Pre-training to Online Fine-tuning for Model-based Robot Learning [52.101643259906915]
We study the problem of offline pre-training and online fine-tuning for reinforcement learning from high-dimensional observations.
Existing model-based offline RL methods are not suitable for offline-to-online fine-tuning in high-dimensional domains.
We propose an on-policy model-based method that can efficiently reuse prior data through model-based value expansion and policy regularization.
arXiv Detail & Related papers (2024-01-06T21:04:31Z)
- Data-Efficient Task Generalization via Probabilistic Model-based Meta Reinforcement Learning [58.575939354953526]
PACOH-RL is a novel model-based Meta-Reinforcement Learning (Meta-RL) algorithm designed to efficiently adapt control policies to changing dynamics.
Existing Meta-RL methods require abundant meta-learning data, limiting their applicability in settings such as robotics.
Our experiment results demonstrate that PACOH-RL outperforms model-based RL and model-based Meta-RL baselines in adapting to new dynamic conditions.
arXiv Detail & Related papers (2023-11-13T18:51:57Z)
- A Unified Framework for Alternating Offline Model Training and Policy Learning [62.19209005400561]
In offline model-based reinforcement learning, we learn a dynamics model from historically collected data, and utilize the learned model and fixed datasets for policy learning.
We develop an iterative offline MBRL framework, where we maximize a lower bound of the true expected return.
With the proposed unified model-policy learning framework, we achieve competitive performance on a wide range of continuous-control offline reinforcement learning datasets.
arXiv Detail & Related papers (2022-10-12T04:58:51Z)
- Fully Decentralized Model-based Policy Optimization for Networked Systems [23.46407780093797]
This work aims to improve data efficiency of multi-agent control by model-based learning.
We consider networked systems where agents are cooperative and communicate only locally with their neighbors.
In our method, each agent learns a dynamics model to predict future states and broadcasts its predictions to its neighbors; the policies are then trained on model rollouts.
arXiv Detail & Related papers (2022-07-13T23:52:14Z)
- Value Gradient weighted Model-Based Reinforcement Learning [28.366157882991565]
Model-based reinforcement learning (MBRL) is a sample efficient technique to obtain control policies.
VaGraM is a novel method for value-aware model learning.
arXiv Detail & Related papers (2022-04-04T13:28:31Z)
- Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from task-agnostic storage.
The model is trained to maximize the agent's expected performance by selecting, from the storage, promising trajectories that solved prior tasks.
arXiv Detail & Related papers (2021-10-25T20:02:57Z)
- Evaluating model-based planning and planner amortization for continuous control [79.49319308600228]
We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning.
We find that well-tuned model-free agents are strong baselines even for high DoF control problems.
We show that it is possible to distil a model-based planner into a policy that amortizes the planning without any loss of performance.
arXiv Detail & Related papers (2021-10-07T12:00:40Z)
- Model-Advantage Optimization for Model-Based Reinforcement Learning [41.13567626667456]
Model-based Reinforcement Learning (MBRL) algorithms have been traditionally designed with the goal of learning accurate dynamics of the environment.
Value-aware model learning, an alternative model-learning paradigm to maximum likelihood, proposes to inform model-learning through the value function of the learnt policy.
We propose a novel value-aware objective that is an upper bound on the absolute performance difference of a policy across two models.
arXiv Detail & Related papers (2021-06-26T20:01:28Z)
- Model-Augmented Actor-Critic: Backpropagating through Paths [81.86992776864729]
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator.
We show how to make more effective use of the model by exploiting its differentiability.
arXiv Detail & Related papers (2020-05-16T19:18:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site makes no guarantee of the quality of this list (including all information) and is not responsible for any consequences of its use.