Learning Sampling Distributions for Model Predictive Control
- URL: http://arxiv.org/abs/2212.02587v1
- Date: Mon, 5 Dec 2022 20:35:36 GMT
- Title: Learning Sampling Distributions for Model Predictive Control
- Authors: Jacob Sacks and Byron Boots
- Abstract summary: Sampling-based methods have become a cornerstone of contemporary approaches to Model Predictive Control (MPC).
We propose to carry out all operations in the latent space, allowing us to take full advantage of the learned distribution.
Specifically, we frame the learning problem as bi-level optimization and show how to train the controller with backpropagation-through-time.
- Score: 36.82905770866734
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Sampling-based methods have become a cornerstone of contemporary approaches
to Model Predictive Control (MPC), as they make no restrictions on the
differentiability of the dynamics or cost function and are straightforward to
parallelize. However, their efficacy is highly dependent on the quality of the
sampling distribution itself, which is often assumed to be simple, like a
Gaussian. This restriction can result in samples which are far from optimal,
leading to poor performance. Recent work has explored improving the performance
of MPC by sampling in a learned latent space of controls. However, these
methods ultimately perform all MPC parameter updates and warm-starting between
time steps in the control space. This requires us to rely on a number of
heuristics for generating samples and updating the distribution and may lead to
sub-optimal performance. Instead, we propose to carry out all operations in the
latent space, allowing us to take full advantage of the learned distribution.
Specifically, we frame the learning problem as bi-level optimization and show
how to train the controller with backpropagation-through-time. By using a
normalizing flow parameterization of the distribution, we can leverage its
tractable density to avoid requiring differentiability of the dynamics and cost
function. Finally, we evaluate the proposed approach on simulated robotics
tasks and demonstrate its ability to surpass the performance of prior methods
and scale better with a reduced number of samples.
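As illustrative context for the abstract's setting, a minimal MPPI-style sampling-based MPC update with a simple Gaussian sampling distribution can be sketched as follows. The toy 1-D dynamics, cost, and all parameter names are hypothetical; this is the baseline scheme the paper improves on, not the paper's latent-space method.

```python
import numpy as np

def mppi_step(state, mean_seq, dynamics, cost, n_samples=64, horizon=10,
              sigma=0.5, temperature=1.0, rng=None):
    """One MPPI update: sample control sequences from a Gaussian around the
    current mean, roll them out through the dynamics, and re-estimate the
    mean with exponentiated-cost importance weights. Note that neither the
    dynamics nor the cost needs to be differentiable."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, sigma, size=(n_samples, horizon))
    samples = mean_seq + noise                      # (n_samples, horizon)
    costs = np.zeros(n_samples)
    for i in range(n_samples):
        s = state
        for t in range(horizon):
            s = dynamics(s, samples[i, t])
            costs[i] += cost(s, samples[i, t])
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    return weights @ samples                        # importance-weighted mean

# Hypothetical toy problem: drive a scalar state toward the origin.
dynamics = lambda s, u: s + 0.1 * u
cost = lambda s, u: s**2 + 0.01 * u**2
mean, state = np.zeros(10), 5.0
for _ in range(5):
    mean = mppi_step(state, mean, dynamics, cost,
                     rng=np.random.default_rng(0))
    state = dynamics(state, mean[0])   # execute first control
    mean = np.roll(mean, -1)           # warm-start: shift the plan one step
    mean[-1] = 0.0
```

The fixed Gaussian around the shifted previous mean is exactly the restriction the abstract criticizes: all warm-starting and updates happen in control space, which is what the paper replaces with operations in a learned latent space.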
Related papers
- Diffusion Generative Flow Samplers: Improving learning signals through
partial trajectory optimization [87.21285093582446]
Diffusion Generative Flow Samplers (DGFS) is a sampling-based framework where the learning process can be tractably broken down into short partial trajectory segments.
Our method takes inspiration from the theory developed for generative flow networks (GFlowNets)
arXiv Detail & Related papers (2023-10-04T09:39:05Z)
- Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z)
- Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called MEX.
MEX integrates estimation and planning components while automatically balancing exploration and exploitation.
It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
arXiv Detail & Related papers (2023-05-29T17:25:26Z)
- Learning to Optimize in Model Predictive Control [36.82905770866734]
Sampling-based Model Predictive Control (MPC) is a flexible control framework that can reason about non-smooth dynamics and cost functions.
We show that this can be particularly useful in sampling-based MPC, where we often wish to minimize the number of samples.
We show that we can contend with this noise by learning how to update the control distribution more effectively and make better use of the few samples that we have.
arXiv Detail & Related papers (2022-12-05T21:20:10Z)
- Optimization of Annealed Importance Sampling Hyperparameters [77.34726150561087]
Annealed Importance Sampling (AIS) is a popular algorithm used to estimate the intractable marginal likelihood of deep generative models.
We present a parametric AIS process with flexible intermediate distributions and optimize the bridging distributions to use fewer sampling steps.
We assess the performance of our optimized AIS for marginal likelihood estimation of deep generative models and compare it to other estimators.
arXiv Detail & Related papers (2022-09-27T07:58:25Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Adaptive Client Sampling in Federated Learning via Online Learning with Bandit Feedback [36.05851452151107]
Federated learning (FL) systems need to sample a subset of clients to participate in each round of training.
Despite its importance, there is limited work on how to sample clients effectively.
We show how our sampling method can improve the convergence speed of optimization algorithms.
arXiv Detail & Related papers (2021-12-28T23:50:52Z)
- Demonstration-Efficient Guided Policy Search via Imitation of Robust Tube MPC [36.3065978427856]
We propose a strategy to compress a computationally expensive Model Predictive Controller (MPC) into a more efficient representation based on a deep neural network and Imitation Learning (IL).
By generating a Robust Tube variant (RTMPC) of the MPC and leveraging properties from the tube, we introduce a data augmentation method that enables high demonstration-efficiency.
Our method outperforms strategies commonly employed in IL, such as DAgger and Domain Randomization, in terms of demonstration-efficiency and robustness to perturbations unseen during training.
arXiv Detail & Related papers (2021-09-21T01:50:19Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Learning to Plan Optimally with Flow-based Motion Planner [29.124322674133]
We introduce a conditional normalising-flow-based distribution, learned from previous experiences, to improve the sampling of sampling-based motion planners.
Our distribution can be conditioned on the current problem instance to provide an informative prior for sampling configurations within promising regions.
By using our normalising flow based distribution, a solution can be found faster, with fewer samples and better overall runtime performance.
arXiv Detail & Related papers (2020-10-21T21:46:08Z)
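Several of the papers above rely on normalising flows for exactly the property the main abstract highlights: an expressive sampling distribution whose density remains exactly computable. A rough, hypothetical illustration of that machinery is a single RealNVP-style affine coupling layer (untrained, with a toy conditioner; not the architecture of any specific paper listed here):

```python
import numpy as np

def coupling_forward(z, w, b):
    """One affine coupling layer: the first half of z conditions a scale and
    shift applied to the second half. Because the Jacobian is triangular,
    log|det J| is just the sum of the log-scales, keeping the density exact."""
    d = z.shape[-1] // 2
    z1, z2 = z[..., :d], z[..., d:]
    h = np.tanh(z1 @ w + b)                 # tiny stand-in conditioner network
    log_scale, shift = h[..., :d], h[..., d:]
    x2 = z2 * np.exp(log_scale) + shift
    x = np.concatenate([z1, x2], axis=-1)   # z1 passes through unchanged
    return x, log_scale.sum(axis=-1)        # transformed sample, log|det J|

rng = np.random.default_rng(0)
dim = 4
w = rng.normal(0.0, 0.1, size=(dim // 2, dim))  # hypothetical untrained weights
b = np.zeros(dim)
z = rng.normal(size=(8, dim))                   # samples from the Gaussian base
x, log_det = coupling_forward(z, w, b)
# Change of variables: log p(x) = log N(z; 0, I) - log|det J|, exactly.
log_prob = -0.5 * (z**2).sum(-1) - 0.5 * dim * np.log(2 * np.pi) - log_det
```

Sampling is a cheap forward pass from Gaussian noise, while the tractable `log_prob` is what lets a flow be trained and used inside a sampler without differentiating through the dynamics or cost.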
This list is automatically generated from the titles and abstracts of the papers in this site.