Related papers: Predictable MDP Abstraction for Unsupervised Model-Based RL

URL: http://arxiv.org/abs/2302.03921v2
Date: Sat, 3 Jun 2023 23:38:06 GMT
Title: Predictable MDP Abstraction for Unsupervised Model-Based RL
Authors: Seohong Park, Sergey Levine
Abstract summary: We propose predictable MDP abstraction (PMA). Instead of training a predictive model on the original MDP, we train a model on a transformed MDP with a learned action space. We theoretically analyze PMA and empirically demonstrate that PMA leads to significant improvements over prior unsupervised model-based RL approaches.
Score: 93.91375268580806
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A key component of model-based reinforcement learning (RL) is a dynamics model that predicts the outcomes of actions. Errors in this predictive model can degrade the performance of model-based controllers, and complex Markov decision processes (MDPs) can present exceptionally difficult prediction problems. To mitigate this issue, we propose predictable MDP abstraction (PMA): instead of training a predictive model on the original MDP, we train a model on a transformed MDP with a learned action space that only permits predictable, easy-to-model actions, while covering the original state-action space as much as possible. As a result, model learning becomes easier and more accurate, which allows robust, stable model-based planning or model-based RL. This transformation is learned in an unsupervised manner, before any task is specified by the user. Downstream tasks can then be solved with model-based control in a zero-shot fashion, without additional environment interactions. We theoretically analyze PMA and empirically demonstrate that PMA leads to significant improvements over prior unsupervised model-based RL approaches in a range of benchmark environments. Our code and videos are available at https://seohong.me/projects/pma/

Related papers

M$^3$PC: Test-time Model Predictive Control for Pretrained Masked Trajectory Model [14.779390462893298]
We propose using Model Predictive Control (MPC) at test time to leverage the model's own predictive capability to guide its action selection. MPC significantly improves the decision-making performance of a pretrained trajectory model without any additional parameter training. Our framework can be adapted to Offline to Online (O2O) RL and Goal Reaching RL.
arXiv Detail & Related papers (2024-12-07T14:44:22Z)
Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction [88.65168366064061]
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference. Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable. We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
arXiv Detail & Related papers (2024-10-10T17:18:30Z)
Learning Residual Model of Model Predictive Control via Random Forests for Autonomous Driving [13.865293598486492]
One major issue in learning-based model predictive control (MPC) for autonomous driving is the contradiction between the system model's prediction accuracy and computation efficiency. This paper reformulates the learning-based MPC as a quadratic program (QP) problem that optimization solvers can solve effectively.
arXiv Detail & Related papers (2023-04-10T03:32:09Z)
Plan To Predict: Learning an Uncertainty-Foreseeing Model for Model-Based Reinforcement Learning [32.24146877835396]
We propose Plan To Predict (P2P), a framework that treats the model rollout process as a sequential decision making problem. We show that P2P achieves state-of-the-art performance on several challenging benchmark tasks.
arXiv Detail & Related papers (2023-01-20T10:17:22Z)
When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL). Our follow-up derived bounds reveal the relationship between model shifts and performance improvement. A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
Evaluating model-based planning and planner amortization for continuous control [79.49319308600228]
We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning. We find that well-tuned model-free agents are strong baselines even for high DoF control problems. We show that it is possible to distil a model-based planner into a policy that amortizes the planning without any loss of performance.
arXiv Detail & Related papers (2021-10-07T12:00:40Z)
Mismatched No More: Joint Model-Policy Optimization for Model-Based RL [172.37829823752364]
We propose a single objective for jointly training the model and the policy, such that updates to either component increase a lower bound on expected return. Our objective is a global lower bound on expected return, and this bound becomes tight under certain assumptions. The resulting algorithm (MnM) is conceptually similar to a GAN.
arXiv Detail & Related papers (2021-10-06T13:43:27Z)
Model-based micro-data reinforcement learning: what are the crucial model properties and which model to choose? [0.2836066255205732]
We contribute to micro-data model-based reinforcement learning (MBRL) by rigorously comparing popular generative models. We find that on an environment that requires multimodal posterior predictives, mixture density nets outperform all other models by a large margin. We also find that deterministic models are on par; in fact, they consistently (although non-significantly) outperform their probabilistic counterparts.
arXiv Detail & Related papers (2021-07-24T11:38:25Z)
Generative Temporal Difference Learning for Infinite-Horizon Prediction [101.59882753763888]
We introduce the $\gamma$-model, a predictive model of environment dynamics with an infinite probabilistic horizon. We discuss how its training reflects an inescapable tradeoff between training-time and testing-time compounding errors.
arXiv Detail & Related papers (2020-10-27T17:54:12Z)
Model-based Policy Optimization with Unsupervised Model Adaptation [37.09948645461043]
We investigate how to bridge the gap between real and simulated data due to inaccurate model estimation for better policy optimization. We propose a novel model-based reinforcement learning framework AMPO, which introduces unsupervised model adaptation. Our approach achieves state-of-the-art performance in terms of sample efficiency on a range of continuous control benchmark tasks.
arXiv Detail & Related papers (2020-10-19T14:19:42Z)
Bootstrapped model learning and error correction for planning with uncertainty in model-based RL [1.370633147306388]
A natural aim is to learn a model that accurately reflects the dynamics of the environment. This paper explores the problem of model misspecification through uncertainty-aware reinforcement learning agents. We propose a bootstrapped multi-headed neural network that learns the distribution of future states and rewards.
arXiv Detail & Related papers (2020-04-15T15:41:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.