Temporal Difference Learning for Model Predictive Control
- URL: http://arxiv.org/abs/2203.04955v1
- Date: Wed, 9 Mar 2022 18:58:28 GMT
- Title: Temporal Difference Learning for Model Predictive Control
- Authors: Nicklas Hansen, Xiaolong Wang, Hao Su
- Abstract summary: Data-driven model predictive control has two key advantages over model-free methods: improved sample efficiency through model learning, and better performance as the planning budget grows.
TD-MPC achieves superior sample efficiency and asymptotic performance over prior work on both state- and image-based continuous control tasks.
- Score: 29.217382374051347
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data-driven model predictive control has two key advantages over model-free
methods: a potential for improved sample efficiency through model learning, and
better performance as computational budget for planning increases. However, it
is both costly to plan over long horizons and challenging to obtain an accurate
model of the environment. In this work, we combine the strengths of model-free
and model-based methods. We use a learned task-oriented latent dynamics model
for local trajectory optimization over a short horizon, and use a learned
terminal value function to estimate long-term return, both of which are learned
jointly by temporal difference learning. Our method, TD-MPC, achieves superior
sample efficiency and asymptotic performance over prior work on both state and
image-based continuous control tasks from DMControl and Meta-World. Code and
video results are available at https://nicklashansen.github.io/td-mpc.
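To make the planning loop concrete, below is a minimal sketch of the idea the abstract describes: candidate action sequences are rolled out through the learned latent dynamics model over a short horizon, scored by predicted rewards plus a discounted terminal value, and the first action of the best-scoring plan is executed. All module names, dimensions, and the random-shooting planner are illustrative assumptions, not the authors' implementation (the paper uses a more sophisticated sampling-based planner).

```python
import torch

# Illustrative stand-ins for the jointly learned components named in the
# abstract: encoder, latent dynamics model, reward head, terminal value
# function. All names and shapes are assumptions for this sketch.
obs_dim, latent_dim, action_dim, horizon = 8, 32, 4, 5
encoder  = torch.nn.Linear(obs_dim, latent_dim)                  # obs -> z
dynamics = torch.nn.Linear(latent_dim + action_dim, latent_dim)  # (z, a) -> z'
reward   = torch.nn.Linear(latent_dim + action_dim, 1)           # (z, a) -> r
value    = torch.nn.Linear(latent_dim, 1)                        # z -> V(z)

def score_plans(z, actions, gamma=0.99):
    """Short-horizon return: predicted rewards plus discounted terminal value."""
    total, discount = 0.0, 1.0
    for t in range(actions.shape[1]):
        za = torch.cat([z, actions[:, t]], dim=-1)
        total = total + discount * reward(za).squeeze(-1)
        z = dynamics(za)
        discount *= gamma
    return total + discount * value(z).squeeze(-1)

# Random-shooting stand-in for the paper's sampling-based planner:
obs = torch.randn(1, obs_dim)
plans = torch.randn(256, horizon, action_dim)   # candidate action sequences
z0 = encoder(obs).expand(256, -1)
best = plans[score_plans(z0, plans).argmax()]
action = best[0]                                # MPC: execute first action only
```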
Related papers
- Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate [105.86576388991713]
We introduce a normalized gradient difference (NGDiff) algorithm, enabling better control over the trade-off between the objectives.
We provide a theoretical analysis and empirically demonstrate the superior performance of NGDiff among state-of-the-art unlearning methods on the TOFU and MUSE datasets.
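As a rough illustration of the normalized-gradient-difference idea (a sketch of the concept only; the paper's exact update rule and adaptive learning rate are not reproduced, and `ngdiff_direction` is a hypothetical helper):

```python
import torch

def ngdiff_direction(g_forget, g_retain, eps=1e-8):
    """Normalize each objective's gradient to unit length before
    differencing, so neither raw magnitude dominates the trade-off
    between forgetting and retention. Illustrative, not the paper's
    exact update."""
    g_f = g_forget / (g_forget.norm() + eps)
    g_r = g_retain / (g_retain.norm() + eps)
    return g_r - g_f  # descend on the retain loss, ascend on the forget loss

# Toy stand-in losses over a shared parameter vector:
params = torch.randn(10, requires_grad=True)
g_f, = torch.autograd.grad((params ** 2).sum(), params)
g_r, = torch.autograd.grad(((params - 1.0) ** 2).sum(), params)
params.data -= 0.1 * ngdiff_direction(g_f, g_r)  # fixed LR stands in for the adaptive one
```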
arXiv Detail & Related papers (2024-10-29T14:41:44Z) - A Two-Phase Recall-and-Select Framework for Fast Model Selection [13.385915962994806]
We propose a two-phase (coarse-recall and fine-selection) model selection framework.
It aims to enhance the efficiency of selecting a robust model by leveraging the models' training performances on benchmark datasets.
The proposed methodology is demonstrated to select a high-performing model about 3x faster than conventional baseline methods.
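A toy sketch of the two-phase idea, assuming cheap recorded benchmark scores drive the coarse recall and an expensive target evaluation drives the fine selection; all names and numbers below are hypothetical:

```python
# Coarse recall: rank the model zoo by cheap benchmark scores; fine
# selection: run the expensive target evaluation only on the shortlist.
def select_model(zoo, benchmark_score, target_eval, k=3):
    shortlist = sorted(zoo, key=benchmark_score, reverse=True)[:k]  # coarse recall
    return max(shortlist, key=target_eval)                          # fine selection

# Toy usage: dicts stand in for real evaluations; in practice target_eval
# would fine-tune or probe each candidate on the target dataset.
zoo = ["resnet50", "vit_b16", "convnext_t", "efficientnet_b0"]
bench = {"resnet50": 76.1, "vit_b16": 81.1, "convnext_t": 82.1, "efficientnet_b0": 77.7}
target = {"resnet50": 0.71, "vit_b16": 0.83, "convnext_t": 0.79, "efficientnet_b0": 0.68}
best = select_model(zoo, bench.get, target.get, k=3)  # -> "vit_b16"
```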
arXiv Detail & Related papers (2024-03-28T14:44:44Z) - STORM: Efficient Stochastic Transformer based World Models for
Reinforcement Learning [82.03481509373037]
Recently, model-based reinforcement learning algorithms have demonstrated remarkable efficacy in visual input environments.
We introduce the Stochastic Transformer-based wORld Model (STORM), an efficient world model architecture that combines strong modeling and generation capabilities.
STORM achieves a mean human performance of 126.7% on the Atari 100k benchmark, setting a new record among state-of-the-art methods.
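For intuition, here is a minimal sketch of a stochastic transformer world model in this spirit: a causal transformer reads past (latent, action) tokens and parameterizes a distribution over the next latent. Sizes, heads, and the Gaussian parameterization are illustrative assumptions; the paper's architecture differs in detail.

```python
import torch

# Minimal stochastic transformer world model sketch (illustrative sizes).
d_model, act_dim, latent_dim, seq_len = 64, 6, 32, 16
embed = torch.nn.Linear(latent_dim + act_dim, d_model)
layer = torch.nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
backbone = torch.nn.TransformerEncoder(layer, num_layers=2)
head = torch.nn.Linear(d_model, 2 * latent_dim)   # mean and log-std ("stochastic")

def predict_next_latent(latents, actions):
    tokens = embed(torch.cat([latents, actions], dim=-1))
    T = latents.shape[1]
    causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
    h = backbone(tokens, mask=causal)              # causal self-attention
    mean, log_std = head(h).chunk(2, dim=-1)
    return mean + log_std.exp() * torch.randn_like(mean)  # reparameterized sample

z = torch.randn(1, seq_len, latent_dim)            # imagined latent history
a = torch.randn(1, seq_len, act_dim)
next_z = predict_next_latent(z, a)                 # one predicted step per position
```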
arXiv Detail & Related papers (2023-10-14T16:42:02Z) - Simplified Temporal Consistency Reinforcement Learning [19.814047499837084]
We show that a simple representation learning approach relying on a latent dynamics model trained by latent temporal consistency is sufficient for high-performance RL.
Our approach outperforms model-free methods by a large margin and matches model-based methods' sample efficiency while training 2.4 times faster.
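A minimal sketch of a latent temporal-consistency objective of this kind: an online encoder plus a latent dynamics model are trained to predict a stop-gradient target encoding of the next observation. Component names and sizes are assumptions, not the paper's code.

```python
import copy
import torch

obs_dim, act_dim, latent_dim = 8, 2, 16
encoder  = torch.nn.Linear(obs_dim, latent_dim)
dynamics = torch.nn.Linear(latent_dim + act_dim, latent_dim)
target_encoder = copy.deepcopy(encoder)  # EMA/frozen copy in practice

def consistency_loss(obs, act, next_obs):
    """Predicted next latent must match the stop-gradient target encoding."""
    z_pred = dynamics(torch.cat([encoder(obs), act], dim=-1))
    with torch.no_grad():                       # stop-gradient target
        z_target = target_encoder(next_obs)
    return torch.nn.functional.mse_loss(z_pred, z_target)

loss = consistency_loss(torch.randn(32, obs_dim),
                        torch.randn(32, act_dim),
                        torch.randn(32, obs_dim))
loss.backward()  # trains encoder and dynamics jointly
```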
arXiv Detail & Related papers (2023-06-15T19:37:43Z) - Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF).
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
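A sketch of the value identity such an approach can exploit: if future states can be sampled from the discounted state-visitation measure conditioned on the current state (DVF fits this with a diffusion model; a Gaussian stands in below), the value is the mean reward over those samples scaled by 1/(1 - gamma). All helpers here are hypothetical stand-ins.

```python
import torch

def value_estimate(state, sample_visitation, reward_fn, gamma=0.99, n=512):
    """Value as expected reward under the discounted visitation measure,
    scaled by 1/(1 - gamma). The sampler is a diffusion model in DVF."""
    future_states = sample_visitation(state, n)
    return reward_fn(future_states).mean() / (1.0 - gamma)

# Toy stand-ins (assumptions, not the paper's models):
sample_visitation = lambda s, n: s + torch.randn(n, s.shape[-1])
reward_fn = lambda states: -states.pow(2).sum(dim=-1)
v = value_estimate(torch.zeros(4), sample_visitation, reward_fn)
```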
arXiv Detail & Related papers (2023-06-09T18:40:55Z) - Model Predictive Control with Self-supervised Representation Learning [13.225264876433528]
We propose the use of a reconstruction function within the TD-MPC framework, so that the agent can reconstruct the original observation.
The resulting additional loss term leads to improved performance on both state- and image-based tasks.
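A minimal sketch of such an addition, assuming a simple decoder that reconstructs the observation from the TD-MPC latent and contributes one extra weighted loss term; names and the weight are illustrative:

```python
import torch

obs_dim, latent_dim = 8, 32
encoder = torch.nn.Linear(obs_dim, latent_dim)   # stand-in for TD-MPC's encoder
decoder = torch.nn.Linear(latent_dim, obs_dim)   # the proposed reconstruction head

def reconstruction_loss(obs):
    """Reconstruct the raw observation from the encoded latent."""
    return torch.nn.functional.mse_loss(decoder(encoder(obs)), obs)

obs = torch.randn(16, obs_dim)
existing_td_mpc_losses = torch.tensor(0.0)       # placeholder for the usual terms
total = existing_td_mpc_losses + 1.0 * reconstruction_loss(obs)  # assumed weight
```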
arXiv Detail & Related papers (2023-04-14T16:02:04Z) - Model-Agnostic Multitask Fine-tuning for Few-shot Vision-Language
Transfer Learning [59.38343286807997]
We propose Model-Agnostic Multitask Fine-tuning (MAMF) for vision-language models on unseen tasks.
Compared with model-agnostic meta-learning (MAML), MAMF discards the bi-level optimization and uses only first-order gradients.
We show that MAMF consistently outperforms the classical fine-tuning method for few-shot transfer learning on five benchmark datasets.
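A sketch of the contrast described above: rather than MAML's bi-level optimization, a first-order multitask update simply accumulates plain gradients from several task losses and applies one ordinary step. This illustrates the general idea, not the paper's exact procedure.

```python
import torch

model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

tasks = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(3)]  # toy tasks
opt.zero_grad()
for x, y in tasks:
    torch.nn.functional.mse_loss(model(x), y).backward()  # first-order grads accumulate
opt.step()  # one plain update across all tasks; no inner/outer loop, no second-order terms
```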
arXiv Detail & Related papers (2022-03-09T17:26:53Z) - Evaluating model-based planning and planner amortization for continuous
control [79.49319308600228]
We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning.
We find that well-tuned model-free agents are strong baselines even for high DoF control problems.
We show that it is possible to distil a model-based planner into a policy that amortizes the planning without any loss of performance.
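A sketch of planner amortization as distillation: a feed-forward policy is regressed onto the actions an MPC planner produced, so the policy can replace the slower planner at deployment. The planner below is a stand-in callable, and the details are assumptions.

```python
import torch

obs_dim, act_dim = 8, 2
policy = torch.nn.Linear(obs_dim, act_dim)
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

mpc_planner = lambda obs: torch.tanh(obs[:, :act_dim])  # stand-in for a real planner

for _ in range(100):
    obs = torch.randn(64, obs_dim)
    with torch.no_grad():
        expert_actions = mpc_planner(obs)       # expensive planning, done offline
    loss = torch.nn.functional.mse_loss(policy(obs), expert_actions)
    opt.zero_grad(); loss.backward(); opt.step()
```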
arXiv Detail & Related papers (2021-10-07T12:00:40Z) - Double Meta-Learning for Data Efficient Policy Optimization in
Non-Stationary Environments [12.45281856559346]
We are interested in learning models of non-stationary environments, which can be framed as a multi-task learning problem.
Model-free reinforcement learning algorithms can achieve good performance in multi-task learning at the cost of extensive sampling.
While model-based approaches are among the most data efficient learning algorithms, they still struggle with complex tasks and model uncertainties.
arXiv Detail & Related papers (2020-11-21T03:19:35Z) - Model-Augmented Actor-Critic: Backpropagating through Paths [81.86992776864729]
Current model-based reinforcement learning approaches use the model simply as a learned black-box simulator.
We show how to make more effective use of the model by exploiting its differentiability.
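A minimal sketch of backpropagating through paths: because the learned dynamics and reward are differentiable, a short imagined rollout can be unrolled and the policy improved by gradients that flow through the model itself, instead of treating it as a black box. All modules and the toy reward are assumptions.

```python
import torch

obs_dim, act_dim = 6, 2
dynamics = torch.nn.Linear(obs_dim + act_dim, obs_dim)  # learned, differentiable
policy   = torch.nn.Linear(obs_dim, act_dim)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

s = torch.randn(32, obs_dim)
ret = 0.0
for _ in range(5):                                # short imagined rollout
    a = torch.tanh(policy(s))
    sa = torch.cat([s, a], dim=-1)
    ret = ret + (-sa.pow(2).sum(dim=-1))          # toy differentiable reward
    s = dynamics(sa)                              # gradient flows through the model
(-ret.mean()).backward()
opt.step()                                        # update uses path derivatives
```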
arXiv Detail & Related papers (2020-05-16T19:18:10Z)