CoVO-MPC: Theoretical Analysis of Sampling-based MPC and Optimal
Covariance Design
- URL: http://arxiv.org/abs/2401.07369v1
- Date: Sun, 14 Jan 2024 21:10:59 GMT
- Title: CoVO-MPC: Theoretical Analysis of Sampling-based MPC and Optimal
Covariance Design
- Authors: Zeji Yi, Chaoyi Pan, Guanqi He, Guannan Qu, Guanya Shi
- Abstract summary: We characterize the convergence property of a widely used sampling-based MPC method, Model Predictive Path Integral Control (MPPI).
We show that MPPI enjoys at least linear convergence rates when the optimization is quadratic, which covers time-varying LQR systems.
Our theoretical analysis directly leads to a novel sampling-based MPC algorithm, CoVO-MPC.
Empirically, CoVO-MPC significantly outperforms standard MPPI by 43-54% in both simulations and real-world quadrotor agile control tasks.
- Score: 8.943418808959494
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Sampling-based Model Predictive Control (MPC) has been a practical and
effective approach in many domains, notably model-based reinforcement learning,
thanks to its flexibility and parallelizability. Despite its appealing
empirical performance, the theoretical understanding, particularly in terms of
convergence analysis and hyperparameter tuning, remains absent. In this paper,
we characterize the convergence property of a widely used sampling-based MPC
method, Model Predictive Path Integral Control (MPPI). We show that MPPI enjoys
at least linear convergence rates when the optimization is quadratic, which
covers time-varying LQR systems. We then extend the analysis to more general
nonlinear systems. Our theoretical analysis directly leads to a novel
sampling-based MPC algorithm, CoVariance-Optimal MPC (CoVO-MPC), which
optimally schedules the sampling covariance to optimize the convergence rate.
Empirically, CoVO-MPC significantly outperforms standard MPPI by 43-54% in
both simulations and real-world quadrotor agile control tasks. Videos and
appendices are available at https://lecar-lab.github.io/CoVO-MPC/.
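To make the object of the analysis concrete, the following is a minimal numpy sketch of one MPPI iteration with the sampling covariance Sigma exposed as the design knob that CoVO-MPC schedules (the paper derives the optimal schedule from the cost Hessian). The toy dynamics, cost, and all names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def mppi_step(x0, U_mean, Sigma, dynamics, cost, n_samples=256, lam=1.0):
    """One MPPI iteration: sample control sequences around U_mean with
    per-step covariance Sigma, weight them by exponentiated negative cost,
    and return the weighted-average update of the nominal sequence."""
    H, n_u = U_mean.shape
    L = np.linalg.cholesky(Sigma)
    eps = np.random.randn(n_samples, H, n_u) @ L.T   # eps ~ N(0, Sigma)
    U = U_mean[None] + eps                           # (n_samples, H, n_u)

    costs = np.zeros(n_samples)
    for k in range(n_samples):                       # roll out each sample
        x = x0
        for t in range(H):
            costs[k] += cost(x, U[k, t])
            x = dynamics(x, U[k, t])

    w = np.exp(-(costs - costs.min()) / lam)         # softmin weights
    w /= w.sum()
    return U_mean + np.einsum('k,khu->hu', w, eps)

# Toy double integrator (illustrative only).
dyn  = lambda x, u: np.array([x[0] + 0.1 * x[1], x[1] + 0.1 * u[0]])
cost = lambda x, u: float(x @ x + 0.01 * u @ u)
U = np.zeros((20, 1))
for _ in range(10):
    # The choice of Sigma governs the convergence rate; CoVO-MPC schedules
    # it from the cost Hessian instead of fixing it as done here.
    U = mppi_step(np.array([1.0, 0.0]), U, 0.2 * np.eye(1), dyn, cost)
```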
Related papers
- Transformer-based Model Predictive Control: Trajectory Optimization via Sequence Modeling [16.112708478263745]
We present a unified framework combining the main strengths of optimization-based and learning-based methods.
Our approach entails embedding high-capacity, transformer-based neural network models within the optimization process.
Compared to purely optimization-based approaches, results show that our approach can improve performance by up to 75%.
arXiv Detail & Related papers (2024-10-31T13:23:10Z)
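A hedged sketch of the idea summarized above: a sequence model proposes an initial control sequence, which a downstream optimizer refines. Here `transformer_warm_start` is a hypothetical stub standing in for the trained transformer, and finite-difference gradient descent stands in for the paper's optimizer; everything else is assumed for illustration.

```python
import numpy as np

def transformer_warm_start(x0, horizon):
    """Hypothetical stub for a trained sequence model mapping the current
    state to an initial control sequence; a zero guess keeps this runnable."""
    return np.zeros((horizon, 1))

def refine(x0, U, dyn, cost, steps=50, lr=0.05):
    """Finite-difference gradient descent on the trajectory cost, standing
    in for the optimizer that refines the model's proposal."""
    def total_cost(V):
        x, c = x0, 0.0
        for u in V:
            c += cost(x, u)
            x = dyn(x, u)
        return c
    for _ in range(steps):
        g = np.zeros_like(U)
        for i in range(U.shape[0]):
            dU = U.copy()
            dU[i, 0] += 1e-4
            g[i, 0] = (total_cost(dU) - total_cost(U)) / 1e-4
        U = U - lr * g
    return U

dyn  = lambda x, u: np.array([x[0] + 0.1 * x[1], x[1] + 0.1 * u[0]])
cost = lambda x, u: float(x @ x + 0.01 * u @ u)
x0 = np.array([1.0, 0.0])
U = refine(x0, transformer_warm_start(x0, horizon=20), dyn, cost)
```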
- Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning [50.92957910121088]
This work designs and analyzes a novel set of algorithms for multi-agent reinforcement learning (MARL) based on the principle of information-directed sampling (IDS).
For episodic two-player zero-sum MGs, we present three sample-efficient algorithms for learning the Nash equilibrium.
We extend one of them, Reg-MAIDS, to multi-player general-sum MGs and prove that it can learn either the Nash equilibrium or the coarse correlated equilibrium in a sample-efficient manner.
arXiv Detail & Related papers (2024-04-30T06:48:56Z)
- Parameter-Adaptive Approximate MPC: Tuning Neural-Network Controllers without Retraining [50.00291020618743]
This work introduces a novel, parameter-adaptive approximate MPC (AMPC) architecture capable of online tuning without recomputing large datasets and retraining.
We showcase the effectiveness of parameter-adaptive AMPC by controlling the swing-ups of two different real cartpole systems with a severely resource-constrained microcontroller (MCU).
Taken together, these contributions represent a marked step toward the practical application of AMPC in real-world systems.
arXiv Detail & Related papers (2024-04-08T20:02:19Z)
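A minimal sketch of the parameter-adaptive mechanism described above, under the assumption that it amounts to conditioning the approximate-MPC network on the physical parameters, so that changed parameters at inference time require no retraining. The network below uses untrained placeholder weights; shapes and parameter choices are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

class ParamAdaptiveAMPC:
    """Sketch: the approximate-MPC network takes the physical parameters p
    as extra inputs, so a new cartpole (different mass or pole length) only
    changes p at inference time. Weights are random placeholders here."""
    def __init__(self, n_x, n_p, n_h=32):
        self.W1 = rng.normal(0.0, 0.1, (n_h, n_x + n_p))
        self.b1 = np.zeros(n_h)
        self.W2 = rng.normal(0.0, 0.1, (1, n_h))

    def __call__(self, x, p):
        h = np.tanh(self.W1 @ np.concatenate([x, p]) + self.b1)
        return self.W2 @ h                       # control command

ctrl = ParamAdaptiveAMPC(n_x=4, n_p=2)
x = np.array([0.0, 0.0, 0.1, 0.0])               # cart pos/vel, pole angle/rate
u_rig_a = ctrl(x, np.array([0.5, 0.3]))          # rig A: (mass, length)
u_rig_b = ctrl(x, np.array([0.7, 0.4]))          # rig B: new params, same weights
```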
- Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC performs both parameter estimation and particle proposal adaptation efficiently and entirely on the fly.
arXiv Detail & Related papers (2023-12-19T21:45:38Z)
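For context, here is a minimal bootstrap particle filter on a linear-Gaussian model, the SMC recursion that VSMC builds on; VSMC replaces the prior proposal used below with a learned variational proposal and, in the online variant, adapts parameters and proposal on the fly. The model and constants are assumed for illustration.

```python
import numpy as np

def bootstrap_pf(ys, n=500, phi=0.9, sig_x=1.0, sig_y=0.5, seed=0):
    """Bootstrap particle filter for x_t = phi*x_{t-1} + N(0, sig_x^2),
    y_t = x_t + N(0, sig_y^2), returning the log-likelihood estimate."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sig_x, n)
    log_Z = 0.0
    for y in ys:
        x = phi * x + rng.normal(0.0, sig_x, n)         # propose from the prior
        logw = -0.5 * ((y - x) / sig_y) ** 2 - 0.5 * np.log(2 * np.pi * sig_y**2)
        m = logw.max()
        log_Z += np.log(np.mean(np.exp(logw - m))) + m  # log of mean weight
        w = np.exp(logw - m)
        w /= w.sum()
        x = rng.choice(x, size=n, p=w)                  # multinomial resampling
    return log_Z

ys = np.cumsum(np.random.default_rng(1).normal(size=50))  # synthetic observations
print(bootstrap_pf(ys))
```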
- Deep Model Predictive Optimization [21.22047409735362]
A major challenge in robotics is to design robust policies that enable complex and agile behaviors in the real world.
We propose Deep Model Predictive Optimization (DMPO), which learns the inner loop of an MPC optimization algorithm directly via experience.
DMPO can outperform the best prior MPC algorithm by up to 27% with fewer samples, and an end-to-end policy trained with model-free RL (MFRL) by 19%.
arXiv Detail & Related papers (2023-10-06T21:11:52Z)
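A speculative sketch of the structure this suggests: the inner-loop update of a sampling-based MPC solver becomes a parameterized, trainable function instead of a fixed rule. The weighting below uses untrained placeholder parameters, so it illustrates only the shape of the interface, not DMPO's learned behavior.

```python
import numpy as np

rng = np.random.default_rng(0)

class LearnedInnerLoop:
    """MPPI hard-codes softmax(-cost/lambda); here the weighting over the
    sampled perturbations has free parameters theta that DMPO would train
    from experience (random placeholders below)."""
    def __init__(self, n_samples):
        self.theta = rng.normal(0.0, 0.1, n_samples)   # trainable parameters

    def update(self, mean, eps, costs):
        logits = self.theta - costs / (costs.std() + 1e-8)
        w = np.exp(logits - logits.max())
        w /= w.sum()
        return mean + np.einsum('k,khu->hu', w, eps)

# Usage with the shapes a sampling-based MPC loop would produce.
K, H, n_u = 64, 10, 1
inner = LearnedInnerLoop(K)
eps = rng.normal(size=(K, H, n_u))                     # sampled perturbations
costs = rng.normal(size=K)                             # stand-in rollout costs
mean = inner.update(np.zeros((H, n_u)), eps, costs)
```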
- Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called MEX.
MEX integrates estimation and planning components while automatically balancing exploration and exploitation.
It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
arXiv Detail & Related papers (2023-05-29T17:25:26Z)
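A toy rendering of the single-objective idea, assuming it takes the form of an optimal value under a hypothesis minus a scaled fit loss; a two-armed Gaussian bandit stands in for the MDP, and all names and details are illustrative rather than the paper's construction.

```python
import numpy as np

def mex_select(hypotheses, data, eta=1.0):
    """Score each model hypothesis with one objective, the optimal value
    under that model minus eta times its fit loss on the data, and return
    the maximizer. Optimistic but poorly fit models win only until data
    rules them out, giving the implicit exploration described above."""
    def score(h):
        value = max(h["means"])                       # planner's value if h holds
        nll = sum(0.5 * (r - h["means"][a]) ** 2      # Gaussian fit loss
                  for a, r in data)
        return value - eta * nll
    return max(hypotheses, key=score)

hyps = [{"means": [0.2, 0.9]}, {"means": [0.2, 0.1]}]  # two candidate bandits
data = [(1, 0.15), (1, 0.05)]                          # arm 1 has looked bad so far
best = mex_select(hyps, data)
action = int(np.argmax(best["means"]))                 # act greedily in that model
```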
- Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice [79.48432795639403]
Mirror descent value iteration (MDVI) is an abstraction of Kullback-Leibler (KL)- and entropy-regularized reinforcement learning (RL).
We study MDVI with linear function approximation through the sample complexity required to identify an $\varepsilon$-optimal policy.
We present Variance-Weighted Least-Squares MDVI, the first theoretical algorithm that achieves nearly minimax optimal sample complexity for infinite-horizon linear MDPs.
arXiv Detail & Related papers (2023-05-22T16:13:05Z)
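A tabular sketch of the kind of KL- and entropy-regularized recursion that MDVI abstracts, using the standard mirror-descent policy update; the coefficients and toy MDP are illustrative assumptions, not the paper's linear-function-approximation algorithm.

```python
import numpy as np

def mdvi(P, R, gamma=0.9, tau=0.1, lam=0.1, iters=200):
    """Mirror-descent value iteration sketch: each step solves an
    entropy-regularized (tau) problem with a KL penalty (lam) to the
    previous policy, giving the closed-form update
        pi_{k+1}(a|s) propto pi_k(a|s)^(lam/(lam+tau)) * exp(Q_k(s,a)/(lam+tau)).
    P: transitions (S, A, S'); R: rewards (S, A)."""
    S, A = R.shape
    beta = lam / (lam + tau)
    pi = np.full((S, A), 1.0 / A)
    Q = np.zeros((S, A))
    for _ in range(iters):
        logits = beta * np.log(pi) + Q / (lam + tau)
        logits -= logits.max(axis=1, keepdims=True)     # numerical stability
        pi = np.exp(logits)
        pi /= pi.sum(axis=1, keepdims=True)
        V = (pi * (Q - tau * np.log(pi + 1e-12))).sum(axis=1)  # regularized value
        Q = R + gamma * P @ V                           # Bellman backup
    return pi, Q

# Tiny two-state, two-action MDP.
P = np.array([[[0.9, 0.1], [0.2, 0.8]], [[0.7, 0.3], [0.1, 0.9]]])
R = np.array([[1.0, 0.0], [0.0, 1.0]])
pi, Q = mdvi(P, R)
```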
- Learning Sampling Distributions for Model Predictive Control [36.82905770866734]
Sampling-based methods have become a cornerstone of contemporary approaches to Model Predictive Control (MPC).
We propose to carry out all operations in the latent space, allowing us to take full advantage of the learned distribution.
Specifically, we frame the learning problem as bi-level optimization and show how to train the controller with backpropagation-through-time.
arXiv Detail & Related papers (2022-12-05T20:35:36Z)
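A sketch of operating entirely in the latent space, as described above: sample latents, decode them to control sequences, evaluate costs, and update the latent distribution. A frozen random linear map stands in for the learned decoder, and a simple elite average stands in for the paper's bi-level/BPTT training; only the structure is the point here.

```python
import numpy as np

rng = np.random.default_rng(0)
H, N_U, N_Z = 20, 1, 4
W_DEC = rng.normal(0.0, 0.3, (H * N_U, N_Z))  # frozen stand-in for a trained decoder

def decode(z):
    """Hypothetical decoder: latent z -> control sequence. In the paper this
    is a learned generative model; a fixed linear map keeps the sketch runnable."""
    return (W_DEC @ z).reshape(H, N_U)

def latent_mpc_step(x0, mu, dyn, cost, n_samples=128, elite=16):
    """All sampling and updating act on the low-dimensional latent mean mu,
    never on raw control sequences, which is the point of the latent-space
    approach; the elite average replaces the learned update rule."""
    zs = mu + rng.normal(size=(n_samples, N_Z))
    costs = []
    for z in zs:
        x, c = x0, 0.0
        for u in decode(z):
            c += cost(x, u)
            x = dyn(x, u)
        costs.append(c)
    best = zs[np.argsort(costs)[:elite]]
    return best.mean(axis=0)                  # updated latent mean

dyn  = lambda x, u: np.array([x[0] + 0.1 * x[1], x[1] + 0.1 * u[0]])
cost = lambda x, u: float(x @ x + 0.01 * u @ u)
mu = np.zeros(N_Z)
for _ in range(5):
    mu = latent_mpc_step(np.array([1.0, 0.0]), mu, dyn, cost)
```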
- Optimization of Annealed Importance Sampling Hyperparameters [77.34726150561087]
Annealed Importance Sampling (AIS) is a popular algorithm used to estimate the intractable marginal likelihood of deep generative models.
We present a parametric AIS process with flexible intermediary distributions and optimize the bridging distributions to use fewer sampling steps.
We assess the performance of our optimized AIS for marginal likelihood estimation of deep generative models and compare it to other estimators.
arXiv Detail & Related papers (2022-09-27T07:58:25Z)
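For reference, a minimal AIS estimator with fixed geometric bridging distributions and one Metropolis move per step; the paper's contribution is to parameterize and optimize these bridges, which this sketch leaves hand-picked. The example target and schedule are illustrative.

```python
import numpy as np

def ais_log_Z(log_p0, sample_p0, log_p1, betas, n_chains=200, mh_step=0.5):
    """Annealed importance sampling along geometric bridges
    log p_b(x) = (1 - b) log p0(x) + b log p1(x), accumulating the usual
    AIS log-weights and applying one random-walk Metropolis move per bridge."""
    rng = np.random.default_rng(0)
    x = sample_p0(n_chains, rng)
    logw = np.zeros(n_chains)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Weight increment from annealing p_{b_prev} -> p_b at the current x.
        logw += (b - b_prev) * (log_p1(x) - log_p0(x))
        # MH transition targeting the current bridge distribution.
        prop = x + mh_step * rng.normal(size=x.shape)
        log_t = lambda y: (1 - b) * log_p0(y) + b * log_p1(y)
        accept = np.log(rng.uniform(size=n_chains)) < log_t(prop) - log_t(x)
        x = np.where(accept, prop, x)
    m = logw.max()
    return np.log(np.mean(np.exp(logw - m))) + m   # log-estimate of Z1/Z0

# Example: p0 = N(0,1) (normalized), unnormalized target p1 = exp(-(x-2)^2/2),
# whose true log normalizer is 0.5*log(2*pi).
log_p0 = lambda x: -0.5 * x**2 - 0.5 * np.log(2 * np.pi)
log_p1 = lambda x: -0.5 * (x - 2.0) ** 2
est = ais_log_Z(log_p0, lambda n, r: r.normal(size=n), log_p1,
                betas=np.linspace(0.0, 1.0, 50))
print(est, 0.5 * np.log(2 * np.pi))                # estimate vs. truth
```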
- Variational Inference MPC using Normalizing Flows and Out-of-Distribution Projection [7.195824023358536]
We propose a Model Predictive Control (MPC) method for collision-free navigation.
We learn a distribution that accounts for both the dynamics of the robot and complex obstacle geometries.
We show that FlowMPPI with projection outperforms state-of-the-art MPC baselines in both in-distribution and out-of-distribution (OOD) environments.
arXiv Detail & Related papers (2022-05-10T04:43:15Z)
- ABC-LMPC: Safe Sample-Based Learning MPC for Stochastic Nonlinear Dynamical Systems with Adjustable Boundary Conditions [34.44010424789202]
We present a novel LMPC algorithm, Adjustable Boundary Condition LMPC (ABC-LMPC), which enables rapid adaptation to novel start and goal configurations.
We experimentally demonstrate that the resulting controller adapts to a variety of initial and terminal conditions on 3 continuous control tasks.
arXiv Detail & Related papers (2020-03-03T09:48:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.