The Value of Planning for Infinite-Horizon Model Predictive Control
- URL: http://arxiv.org/abs/2104.02863v1
- Date: Wed, 7 Apr 2021 02:21:55 GMT
- Title: The Value of Planning for Infinite-Horizon Model Predictive Control
- Authors: Nathan Hatch (1) and Byron Boots (1) ((1) University of Washington)
- Abstract summary: We show how the intermediate data structures used by modern planners can be interpreted as an approximate value function.
We show that this value function can be used by MPC directly, resulting in more efficient and resilient behavior at runtime.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model Predictive Control (MPC) is a classic tool for optimal control of
complex, real-world systems. Although it has been successfully applied to a
wide range of challenging tasks in robotics, it is fundamentally limited by the
prediction horizon, which, if too short, will result in myopic decisions.
Recently, several papers have suggested using a learned value function as the
terminal cost for MPC. If the value function is accurate, it effectively allows
MPC to reason over an infinite horizon. Unfortunately, Reinforcement Learning
(RL) solutions to value function approximation can be difficult to realize for
robotics tasks. In this paper, we suggest a more efficient method for value
function approximation that applies to goal-directed problems, like reaching
and navigation. In these problems, MPC is often formulated to track a path or
trajectory returned by a planner. However, this strategy is brittle in that
unexpected perturbations to the robot will require replanning, which can be
costly at runtime. Instead, we show how the intermediate data structures used
by modern planners can be interpreted as an approximate value function. We show
that this value function can be used by MPC directly, resulting in more
efficient and resilient behavior at runtime.
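
As a concrete illustration of the core idea, here is a minimal sketch of how a planner's intermediate cost-to-go table can serve as the terminal (value-function) cost of a short-horizon, sampling-based MPC. The grid world, the point-mass dynamics, and the function names (`dijkstra_cost_to_go`, `mpc_action`) are illustrative assumptions, not the authors' implementation.

```python
import heapq
import numpy as np

def dijkstra_cost_to_go(occupancy, goal, step_cost=1.0):
    """Backward Dijkstra over a 2D grid. The resulting cost-to-go table is
    the kind of intermediate planner data structure that can be reused as
    an approximate value function V(s)."""
    H, W = occupancy.shape
    V = np.full((H, W), np.inf)
    V[goal] = 0.0
    pq = [(0.0, goal)]
    while pq:
        c, (i, j) = heapq.heappop(pq)
        if c > V[i, j]:
            continue
        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < H and 0 <= nj < W and not occupancy[ni, nj]:
                if c + step_cost < V[ni, nj]:
                    V[ni, nj] = c + step_cost
                    heapq.heappush(pq, (c + step_cost, (ni, nj)))
    return V

def mpc_action(state, V, horizon=5, n_samples=256, seed=0):
    """Short-horizon sampling-based MPC: stage costs plus the planner-derived
    terminal value V, which stands in for the infinite-horizon tail."""
    rng = np.random.default_rng(seed)
    best_cost, best_u = np.inf, None
    for _ in range(n_samples):
        u_seq = rng.uniform(-1.0, 1.0, size=(horizon, 2))   # candidate controls
        s, cost = np.asarray(state, dtype=float), 0.0
        for u in u_seq:
            s = s + u                          # toy point-mass dynamics
            cost += 0.1 * np.dot(u, u)         # control-effort stage cost
        ij = tuple(np.clip(np.round(s).astype(int), 0, np.array(V.shape) - 1))
        cost += V[ij]                          # terminal cost = planner cost-to-go
        if cost < best_cost:
            best_cost, best_u = cost, u_seq[0]
    return best_u

# Usage: compute the cost-to-go once, then keep replanning locally with MPC,
# even after unexpected perturbations push the robot off the nominal path.
grid = np.zeros((20, 20), dtype=bool)
grid[10, 3:17] = True                          # a wall the robot must go around
V = dijkstra_cost_to_go(grid, goal=(19, 19))
u0 = mpc_action(state=(0.0, 0.0), V=V)
```

Because the cost-to-go covers the whole workspace rather than a single path, a disturbance does not force a fresh global replan: the MPC simply descends the same value function from wherever the robot finds itself.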
Related papers
- Goal-Conditioned Terminal Value Estimation for Real-time and Multi-task Model Predictive Control [1.2687745030755995]
We develop an MPC framework with goal-conditioned terminal value learning to achieve multitask policy optimization.
We evaluate the proposed method on a bipedal inverted pendulum robot model and confirm that combining goal-conditioned terminal value learning with an upper-level trajectory planner enables real-time control.
arXiv Detail & Related papers (2024-10-07T11:19:23Z) - On Building Myopic MPC Policies using Supervised Learning [0.0]
This paper considers an alternative strategy, where supervised learning is used to learn the optimal value function offline instead of learning the optimal policy.
This can then be used as the cost-to-go function in a myopic MPC with a very short prediction horizon (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2024-01-23T08:08:09Z) - A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning
with General Function Approximation [66.26739783789387]
We propose a new algorithm, Monotonic Q-Learning with Upper Confidence Bound (MQL-UCB) for reinforcement learning.
MQL-UCB achieves minimax optimal regret of $\tilde{O}(d\sqrt{HK})$ when $K$ is sufficiently large, together with near-optimal policy switching cost.
Our work sheds light on designing provably sample-efficient and deployment-efficient Q-learning with nonlinear function approximation.
arXiv Detail & Related papers (2023-11-26T08:31:57Z) - Deep Model Predictive Optimization [21.22047409735362]
A major challenge in robotics is to design robust policies which enable complex and agile behaviors in the real world.
We propose Deep Model Predictive Optimization (DMPO), which learns the inner-loop of an MPC optimization algorithm directly via experience.
DMPO can outperform the best MPC algorithm by up to 27% with fewer samples, and an end-to-end policy trained with model-free RL (MFRL) by 19%.
arXiv Detail & Related papers (2023-10-06T21:11:52Z) - Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF).
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
arXiv Detail & Related papers (2023-06-09T18:40:55Z) - Maximize to Explore: One Objective Function Fusing Estimation, Planning,
and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called MEX.
MEX integrates estimation and planning components while automatically balancing exploration and exploitation.
It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
arXiv Detail & Related papers (2023-05-29T17:25:26Z) - Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision
Processes [80.89852729380425]
We propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde{O}(d\sqrt{H^3K})$.
Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
arXiv Detail & Related papers (2022-12-12T18:58:59Z) - Evaluating model-based planning and planner amortization for continuous
control [79.49319308600228]
We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning.
We find that well-tuned model-free agents are strong baselines even for high DoF control problems.
We show that it is possible to distil a model-based planner into a policy that amortizes the planning without any loss of performance.
arXiv Detail & Related papers (2021-10-07T12:00:40Z) - Optimal Cost Design for Model Predictive Control [30.86835688868485]
Many robotics domains use model predictive control (MPC) for planning, which sets a reduced time horizon, performs optimization, and replans at every step.
In this work, we challenge the common assumption that the cost we optimize using MPC should be the same as the ground-truth cost for the task (plus a terminal cost).
We propose a zeroth-order trajectory-based approach that enables us to design optimal costs for an MPC planning robot in continuous MDPs.
arXiv Detail & Related papers (2021-04-23T00:00:58Z) - Blending MPC & Value Function Approximation for Efficient Reinforcement
Learning [42.429730406277315]
Model-Predictive Control (MPC) is a powerful tool for controlling complex, real-world systems.
We present a framework for improving on MPC with model-free reinforcement learning (RL).
We show that our approach can obtain performance comparable with MPC with access to true dynamics.
arXiv Detail & Related papers (2020-12-10T11:32:01Z) - Exploiting Submodular Value Functions For Scaling Up Active Perception [60.81276437097671]
In active perception tasks, an agent aims to select sensory actions that reduce uncertainty about one or more hidden variables.
Partially observable Markov decision processes (POMDPs) provide a natural model for such problems.
As the number of sensors available to the agent grows, the computational cost of POMDP planning grows exponentially.
arXiv Detail & Related papers (2020-09-21T09:11:36Z)
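
Several entries above, notably the myopic MPC and goal-conditioned terminal value papers, revolve around the same recipe: regress a value function offline and plug it in as the terminal cost of a very short-horizon MPC. The sketch below illustrates that recipe under stated assumptions; the quadratic feature map, the toy dynamics, and the squared-distance labels are placeholders for illustration, not details taken from the cited papers.

```python
import numpy as np

def features(states):
    """Quadratic feature map for value regression (illustrative choice)."""
    s = np.atleast_2d(states)
    return np.hstack([np.ones((len(s), 1)), s, s ** 2, s[:, :1] * s[:, 1:2]])

def fit_value_function(states, costs_to_go, ridge=1e-3):
    """Offline supervised regression of V(s) from (state, cost-to-go) pairs,
    e.g. labels produced by a long-horizon planner or expert rollouts."""
    X = features(states)
    A = X.T @ X + ridge * np.eye(X.shape[1])
    w = np.linalg.solve(A, X.T @ np.asarray(costs_to_go, dtype=float))
    return lambda s: (features(s) @ w).squeeze()

def myopic_mpc(state, V, dynamics, horizon=2, n_samples=128, seed=0):
    """Very short-horizon sampling MPC whose terminal cost is the learned V."""
    rng = np.random.default_rng(seed)
    best_cost, best_u = np.inf, None
    for _ in range(n_samples):
        u_seq = rng.uniform(-1.0, 1.0, size=(horizon, 2))
        s, cost = np.asarray(state, dtype=float), 0.0
        for u in u_seq:
            s = dynamics(s, u)
            cost += 0.1 * np.dot(u, u)       # control-effort stage cost
        cost = cost + V(s)                   # learned cost-to-go closes the horizon
        if cost < best_cost:
            best_cost, best_u = cost, u_seq[0]
    return best_u

# Usage on a toy goal-reaching task: squared distance to the goal stands in
# for the true cost-to-go labels that a planner or demonstrations would supply.
goal = np.array([5.0, 5.0])
train_states = np.random.default_rng(1).uniform(-10.0, 10.0, size=(500, 2))
labels = np.sum((train_states - goal) ** 2, axis=1)
V = fit_value_function(train_states, labels)
u0 = myopic_mpc(state=(0.0, 0.0), V=V, dynamics=lambda s, u: s + u)
```

With an accurate V, a horizon of one or two steps is enough for the controller to behave as if it were optimizing over an infinite horizon, which is the shared motivation across these papers.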