Evaluating model-based planning and planner amortization for continuous
control
- URL: http://arxiv.org/abs/2110.03363v1
- Date: Thu, 7 Oct 2021 12:00:40 GMT
- Title: Evaluating model-based planning and planner amortization for continuous
control
- Authors: Arunkumar Byravan, Leonard Hasenclever, Piotr Trochim, Mehdi Mirza,
Alessandro Davide Ialongo, Yuval Tassa, Jost Tobias Springenberg, Abbas
Abdolmaleki, Nicolas Heess, Josh Merel, Martin Riedmiller
- Abstract summary: We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning.
We find that well-tuned model-free agents are strong baselines even for high DoF control problems.
We show that it is possible to distil a model-based planner into a policy that amortizes the planning without any loss of performance.
- Score: 79.49319308600228
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There is a widespread intuition that model-based control methods should be
able to surpass the data efficiency of model-free approaches. In this paper we
attempt to evaluate this intuition on various challenging locomotion tasks. We
take a hybrid approach, combining model predictive control (MPC) with a learned
model and model-free policy learning; the learned policy serves as a proposal
for MPC. We find that well-tuned model-free agents are strong baselines even
for high DoF control problems but MPC with learned proposals and models
(trained on the fly or transferred from related tasks) can significantly
improve performance and data efficiency in hard multi-task/multi-goal settings.
Finally, we show that it is possible to distil a model-based planner into a
policy that amortizes the planning computation without any loss of performance.
Videos of agents performing different tasks can be seen at
https://sites.google.com/view/mbrl-amortization/home.
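As a rough illustration of the hybrid scheme sketched in the abstract, the toy Python below runs a sampling-based MPC loop in which candidate action sequences are drawn from a learned proposal policy, scored by rolling them through a learned model, and the planner's chosen actions are then collected as regression targets for distilling (amortizing) the planner back into the policy. The model, policy, dimensions and planner settings are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, HORIZON, N_CANDIDATES = 4, 2, 10, 64

def learned_model(state, action):
    """Stand-in for a learned dynamics/reward model: returns (next_state, reward)."""
    next_state = state + 0.1 * np.tanh(action).sum() * np.ones_like(state)
    reward = -float(np.linalg.norm(next_state))      # toy objective: stay near the origin
    return next_state, reward

def proposal_policy(state, noise_scale=0.3):
    """Stand-in for a learned Gaussian proposal: mean action plus exploration noise."""
    mean = -0.5 * state[:ACTION_DIM]                  # illustrative linear mean
    return mean + noise_scale * rng.standard_normal(ACTION_DIM)

def mpc_plan(state):
    """Sampling-based MPC seeded by the proposal: imagine rollouts, keep the best first action."""
    best_return, best_first_action = -np.inf, None
    for _ in range(N_CANDIDATES):
        s, total, first_action = state.copy(), 0.0, None
        for t in range(HORIZON):
            a = proposal_policy(s)                    # the learned policy proposes actions
            if t == 0:
                first_action = a
            s, r = learned_model(s, a)                # consequences imagined with the learned model
            total += r
        if total > best_return:
            best_return, best_first_action = total, first_action
    return best_first_action                          # receding horizon: execute only the first action

# Amortization: collect (state, planner action) pairs to regress the policy onto the planner.
states = [rng.standard_normal(STATE_DIM) for _ in range(8)]
distillation_data = [(s, mpc_plan(s)) for s in states]
print("collected", len(distillation_data), "distillation targets")
```

The paper's planner, model and proposal are all learned networks and the planner itself is more sophisticated, but the overall structure (propose, imagine with the model, execute the best first action, then distil) is the part the abstract refers to.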
Related papers
- MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation [80.47072100963017]
We introduce a novel and low-compute algorithm, Model Merging with Amortized Pareto Front (MAP).
MAP efficiently identifies a set of scaling coefficients for merging multiple models, reflecting the trade-offs involved.
We also introduce Bayesian MAP for scenarios with a relatively low number of tasks and Nested MAP for situations with a high number of tasks, further reducing the computational cost of evaluation.
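As a loose, hypothetical illustration of the idea above (not MAP's actual algorithm): merging amounts to scaling and summing per-task parameter differences, and the "amortized" part can be pictured as fitting a cheap quadratic surrogate to a handful of real evaluations so that many coefficient settings are scored without re-evaluating the merged model. All names, shapes and the toy metric below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
base = rng.standard_normal(16)                                     # pretrained (base) parameters, toy scale
task_vectors = [rng.standard_normal(16) * 0.1 for _ in range(2)]   # fine-tuned minus base, per task

def merge(coeffs):
    """Merged parameters: base + sum_i c_i * task_vector_i."""
    return base + sum(c * tv for c, tv in zip(coeffs, task_vectors))

def true_task_metric(params, task_id):
    """Stand-in for an expensive evaluation of the merged model on one task."""
    return -float(np.linalg.norm(params - (base + task_vectors[task_id])))

# Evaluate a small grid of scaling coefficients once (the expensive part)...
samples = [(c0, c1) for c0 in np.linspace(0, 1, 5) for c1 in np.linspace(0, 1, 5)]
metrics = np.array([[true_task_metric(merge(c), t) for t in range(2)] for c in samples])

# ...then fit a quadratic surrogate per task so new coefficients can be scored cheaply.
def quad_features(c):
    c0, c1 = c
    return np.array([1.0, c0, c1, c0 * c1, c0 ** 2, c1 ** 2])

X = np.stack([quad_features(c) for c in samples])
surrogates = [np.linalg.lstsq(X, metrics[:, t], rcond=None)[0] for t in range(2)]

# Score an unseen coefficient setting with the surrogate instead of a real evaluation.
c_new = (0.3, 0.7)
predicted = [float(quad_features(c_new) @ w) for w in surrogates]
print("surrogate-predicted per-task metrics for", c_new, ":", predicted)
```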
arXiv Detail & Related papers (2024-06-11T17:55:25Z)
- Towards Efficient Task-Driven Model Reprogramming with Foundation Models [52.411508216448716]
Vision foundation models exhibit impressive power, benefiting from the extremely large model capacity and broad training data.
However, in practice, downstream scenarios may only support a small model due to the limited computational resources or efficiency considerations.
This brings a critical challenge for the real-world application of foundation models: one has to transfer the knowledge of a foundation model to the downstream task.
arXiv Detail & Related papers (2023-04-05T07:28:33Z)
- Predictable MDP Abstraction for Unsupervised Model-Based RL [93.91375268580806]
We propose predictable MDP abstraction (PMA): instead of training a predictive model on the original MDP, we train a model on a transformed MDP with a learned action space.
We theoretically analyze PMA and empirically demonstrate that PMA leads to significant improvements over prior unsupervised model-based RL approaches.
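A very small sketch of what "a transformed MDP with a learned action space" can look like in code: the agent picks latent actions, a learned decoder turns them into raw actions for the real environment, and the predictive model is trained directly on (state, latent action) pairs. The decoder, model and dimensions below are placeholders, not PMA's actual parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, RAW_ACTION_DIM, LATENT_ACTION_DIM = 4, 3, 2

# Placeholder "learned" decoder weights: latent action (and state) -> raw action.
W_z = rng.standard_normal((RAW_ACTION_DIM, LATENT_ACTION_DIM)) * 0.5
W_s = rng.standard_normal((RAW_ACTION_DIM, STATE_DIM)) * 0.1

def action_decoder(state, z):
    """Map a latent action z (the transformed MDP's action) to a raw environment action."""
    return np.tanh(W_z @ z + W_s @ state)

def predictive_model(state, z):
    """Dynamics model trained in the transformed MDP: (state, latent action) -> next state."""
    return 0.9 * state + 0.1 * np.concatenate([z, np.zeros(STATE_DIM - LATENT_ACTION_DIM)])

# In the transformed MDP the agent (or a planner) picks latent actions; the decoder
# turns them into raw actions only when acting in the real environment.
state = rng.standard_normal(STATE_DIM)
z = rng.standard_normal(LATENT_ACTION_DIM)        # action in the learned action space
raw_action = action_decoder(state, z)             # what the real environment would receive
predicted_next = predictive_model(state, z)       # what the model predicts, without decoding
print("raw action:", raw_action)
print("predicted next state:", predicted_next)
```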
arXiv Detail & Related papers (2023-02-08T07:37:51Z)
- Model-based Reinforcement Learning with Multi-step Plan Value Estimation [4.158979444110977]
We introduce multi-step plans to replace multi-step actions for model-based RL.
The new model-based reinforcement learning algorithm MPPVE shows a better utilization of the learned model and achieves a better sample efficiency than state-of-the-art model-based RL approaches.
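To make the notion of scoring a whole multi-step plan concrete, here is a generic (hypothetical, not MPPVE-specific) sketch: a k-step action plan is rolled through the learned model, and its value is the discounted model-predicted rewards plus a bootstrapped value estimate at the final imagined state.

```python
import numpy as np

GAMMA, PLAN_LENGTH, STATE_DIM, ACTION_DIM = 0.99, 3, 4, 2
rng = np.random.default_rng(0)

def learned_model(state, action):
    """Placeholder learned dynamics/reward model."""
    next_state = 0.95 * state + 0.05 * np.pad(action, (0, STATE_DIM - ACTION_DIM))
    return next_state, -float(np.linalg.norm(next_state))

def value_fn(state):
    """Placeholder learned value function used to bootstrap beyond the plan."""
    return -float(np.linalg.norm(state))

def plan_value(state, plan):
    """Value of a k-step plan: discounted model rewards plus a terminal bootstrap."""
    total, s = 0.0, state.copy()
    for t, action in enumerate(plan):
        s, r = learned_model(s, action)
        total += (GAMMA ** t) * r
    return total + (GAMMA ** len(plan)) * value_fn(s)

state = rng.standard_normal(STATE_DIM)
plan = rng.standard_normal((PLAN_LENGTH, ACTION_DIM))   # one candidate multi-step plan
print("estimated plan value:", plan_value(state, plan))
```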
arXiv Detail & Related papers (2022-09-12T18:22:11Z)
- Fully Decentralized Model-based Policy Optimization for Networked Systems [23.46407780093797]
This work aims to improve data efficiency of multi-agent control by model-based learning.
We consider networked systems where agents are cooperative and communicate only locally with their neighbors.
In our method, each agent learns a dynamics model to predict future states and broadcasts its predictions to its neighbors; the policies are then trained on rollouts from these models.
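A toy sketch of this pattern, assuming a two-agent network: each agent keeps a local dynamics model, receives only its neighbors' states, and produces a model prediction that can be used for policy-training rollouts. The topology, model form and policy are illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

class Agent:
    """One node in the network: a local state, a local policy and a local dynamics model."""

    def __init__(self, name):
        self.name = name
        self.state = rng.standard_normal(2)

    def policy(self, state):
        return -0.1 * state                        # placeholder local policy

    def local_model(self, state, action, neighbor_states):
        """Learned local model: next local state from own state/action and neighbors' states."""
        coupling = 0.05 * sum(neighbor_states)     # only locally communicated information is used
        return state + 0.1 * action + coupling

agents = [Agent("a"), Agent("b")]
neighbors = {0: [1], 1: [0]}                       # a simple two-node network

# One step of a model rollout: each agent broadcasts its prediction to its neighbors,
# then everyone advances using only locally available information.
predictions = {}
for i, ag in enumerate(agents):
    nb_states = [agents[j].state for j in neighbors[i]]
    predictions[i] = ag.local_model(ag.state, ag.policy(ag.state), nb_states)

for i, ag in enumerate(agents):
    ag.state = predictions[i]                      # rollout state used for policy training

print({ag.name: ag.state for ag in agents})
```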
arXiv Detail & Related papers (2022-07-13T23:52:14Z)
- Visual Foresight With a Local Dynamics Model [1.370633147306388]
We propose the Local Dynamics Model (LDM) which efficiently learns the state-transition function for single-step manipulation primitives.
By combining the LDM with model-free policy learning, we can learn policies which can solve complex manipulation tasks using one-step lookahead planning.
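The one-step lookahead described above is simple enough to sketch directly: imagine each candidate primitive with the learned single-step model and keep the best-scoring outcome. The candidate primitives, model and scoring function below are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
state = rng.standard_normal(3)
candidate_primitives = [np.array([dx, dy, 0.0])
                        for dx in (-0.1, 0.0, 0.1) for dy in (-0.1, 0.0, 0.1)]

def local_dynamics_model(state, primitive):
    """Learned single-step transition function for a manipulation primitive (placeholder)."""
    return state + primitive

def value_fn(state):
    """Learned value used to score the predicted outcome of each primitive (placeholder)."""
    return -float(np.linalg.norm(state))

# One-step lookahead: imagine each primitive with the model, keep the best-scoring one.
scores = [value_fn(local_dynamics_model(state, p)) for p in candidate_primitives]
best = candidate_primitives[int(np.argmax(scores))]
print("chosen primitive:", best)
```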
arXiv Detail & Related papers (2022-06-29T17:58:14Z)
- Temporal Difference Learning for Model Predictive Control [29.217382374051347]
Data-driven model predictive control has two key advantages over model-free methods: the potential for improved sample efficiency through model learning, and better performance as the computational budget for planning increases.
TD-MPC achieves superior sample efficiency and performance over prior work on both state- and image-based continuous control tasks.
arXiv Detail & Related papers (2022-03-09T18:58:28Z)
- Goal-Aware Prediction: Learning to Model What Matters [105.43098326577434]
One of the fundamental challenges in using a learned forward dynamics model is the mismatch between the objective of the learned model and that of the downstream planner or policy.
We propose to direct prediction towards task-relevant information, enabling the model to be aware of the current task and encouraging it to model only relevant quantities of the state space.
We find that our method more effectively models the relevant parts of the scene conditioned on the goal, and as a result outperforms standard task-agnostic dynamics models and model-free reinforcement learning.
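As a deliberately crude illustration of "only model relevant quantities" (GAP itself learns what is relevant via goal-conditioned prediction rather than a hand-written mask, so everything below is an assumption for exposition): a goal-aware training loss that simply ignores prediction error on state dimensions the goal does not care about.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM = 6

def goal_relevance_mask(goal):
    """Illustrative notion of 'task relevant': only the dimensions the goal cares about."""
    mask = np.zeros(STATE_DIM)
    mask[goal["relevant_dims"]] = 1.0
    return mask

def prediction_loss(predicted_next, true_next, goal):
    """Goal-aware loss: error on task-irrelevant state dimensions is not penalized."""
    mask = goal_relevance_mask(goal)
    return float(np.sum(mask * (predicted_next - true_next) ** 2))

goal = {"relevant_dims": [0, 1]}                  # e.g. only an object's x/y position matters
true_next = rng.standard_normal(STATE_DIM)
pred = true_next + np.array([0.0, 0.0, 5.0, 5.0, 5.0, 5.0])   # large error only on irrelevant dims
print("goal-aware loss:", prediction_loss(pred, true_next, goal))   # -> 0.0
```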
arXiv Detail & Related papers (2020-07-14T16:42:59Z)
- Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)
- Variational Model-based Policy Optimization [34.80171122943031]
Model-based reinforcement learning (RL) algorithms allow us to combine model-generated data with those collected from interaction with the real system in order to alleviate the data efficiency problem in RL.
We propose an objective function as a variational lower-bound of a log-likelihood to jointly learn and improve model and policy.
Our experiments on a number of continuous control tasks show that despite being more complex, our model-based (E-step) algorithm, called variational model-based policy optimization (VMBPO), is more sample-efficient and robust than its model-free counterpart.
arXiv Detail & Related papers (2020-06-09T18:30:15Z)