MPCritic: A plug-and-play MPC architecture for reinforcement learning
- URL: http://arxiv.org/abs/2504.01086v1
- Date: Tue, 01 Apr 2025 18:07:07 GMT
- Title: MPCritic: A plug-and-play MPC architecture for reinforcement learning
- Authors: Nathan P. Lawrence, Thomas Banker, Ali Mesbah
- Abstract summary: This paper presents MPCritic, a machine learning-friendly architecture that interfaces seamlessly with MPC tools. MPCritic utilizes the loss landscape defined by a parameterized MPC problem, focusing on "soft" optimization over batched training steps.
- Score: 6.656737591902601
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The reinforcement learning (RL) and model predictive control (MPC) communities have developed vast ecosystems of theoretical approaches and computational tools for solving optimal control problems. Given their conceptual similarities but differing strengths, there has been increasing interest in synergizing RL and MPC. However, existing approaches tend to be limited for various reasons, including the computational cost of running MPC inside an RL algorithm and software hurdles to seamless integration of MPC and RL tools. These challenges often result in the use of "simple" MPC schemes or RL algorithms, neglecting the state-of-the-art in both areas. This paper presents MPCritic, a machine learning-friendly architecture that interfaces seamlessly with MPC tools. MPCritic utilizes the loss landscape defined by a parameterized MPC problem, focusing on "soft" optimization over batched training steps, thereby updating the MPC parameters while avoiding costly minimization and parametric sensitivities. Since the MPC structure is preserved during training, an MPC agent can be readily used for online deployment, where robust constraint satisfaction is paramount. We demonstrate the versatility of MPCritic, in terms of the MPC architectures and RL algorithms it can accommodate, on classic control benchmarks.
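As a concrete (unofficial) illustration of that loss-landscape idea, the sketch below builds a critic whose value is the negated objective of a parameterized one-step MPC problem, evaluated (not minimized) at batched state-action pairs; ordinary TD-style gradient steps then update the MPC parameters without an inner solve. The linear model, one-step horizon, and all names are simplifying assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class MPCriticSketch(nn.Module):
    """Critic = negated objective of a parameterized one-step MPC problem,
    evaluated at a batch of (state, action) pairs. Gradient steps on the
    critic loss therefore update the MPC parameters directly, with no inner
    minimization and no differentiation through a solver."""

    def __init__(self, A, B):
        super().__init__()
        self.register_buffer("A", A)  # assumed-known linear model: x+ = A x + B u
        self.register_buffer("B", B)
        n, m = A.shape[0], B.shape[1]
        # Cholesky factors keep the learned stage-cost weights positive semidefinite.
        self.Lq = nn.Parameter(torch.eye(n))
        self.Lr = nn.Parameter(0.1 * torch.eye(m))
        # Learnable terminal value, standing in for the MPC cost-to-go.
        self.vf = nn.Sequential(nn.Linear(n, 64), nn.Tanh(), nn.Linear(64, 1))

    def forward(self, x, u):
        Q, R = self.Lq @ self.Lq.T, self.Lr @ self.Lr.T
        stage = (x @ Q * x).sum(-1) + (u @ R * u).sum(-1)  # x'Qx + u'Ru, batched
        x_next = x @ self.A.T + u @ self.B.T
        return -(stage + self.vf(x_next).squeeze(-1))      # higher value = lower MPC cost

# "Soft" optimization over a batch: a plain TD-style critic update on dummy data.
A = torch.tensor([[1.0, 0.1], [0.0, 1.0]])
B = torch.tensor([[0.0], [0.1]])
critic = MPCriticSketch(A, B)
opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
x, u, r = torch.randn(32, 2), torch.randn(32, 1), torch.randn(32)   # dummy replay batch
x2, u2 = torch.randn(32, 2), torch.randn(32, 1)
td_target = r + 0.99 * critic(x2, u2).detach()
loss = ((critic(x, u) - td_target) ** 2).mean()
opt.zero_grad(); loss.backward(); opt.step()
```

Because the quadratic weights and terminal value keep their MPC roles, the trained parameters can be handed to a full MPC solver for deployment, which is where the preserved structure and constraint handling pay off.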
Related papers
- Parameter-Adaptive Approximate MPC: Tuning Neural-Network Controllers without Retraining [50.00291020618743]
This work introduces a novel, parameter-adaptive AMPC architecture capable of online tuning without recomputing large datasets and retraining.
We showcase the effectiveness of parameter-adaptive AMPC by controlling the swing-ups of two different real cartpole systems with a severely resource-constrained microcontroller (MCU).
Taken together, these contributions represent a marked step toward the practical application of AMPC in real-world systems.
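A minimal sketch of the parameter-adaptive ingredient, under our own simplifying assumptions (this is not the paper's code): the imitation network takes the tunable MPC parameters as extra inputs, so retuning online means changing an input vector rather than regenerating data and retraining.

```python
import torch
import torch.nn as nn

class ParamAdaptiveAMPC(nn.Module):
    """Hypothetical approximate-MPC policy conditioned on its tuning parameters."""
    def __init__(self, n_state, n_param, n_action):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_state + n_param, 32), nn.Tanh(),  # kept small for an MCU
            nn.Linear(32, 32), nn.Tanh(),
            nn.Linear(32, n_action),
        )

    def forward(self, x, theta):
        # theta: MPC tuning parameters (e.g., cost weights) the network was
        # trained to generalize over during the offline imitation phase.
        return self.net(torch.cat([x, theta], dim=-1))

policy = ParamAdaptiveAMPC(n_state=4, n_param=2, n_action=1)   # cartpole-sized
u = policy(torch.randn(1, 4), torch.tensor([[1.0, 0.05]]))     # retune by editing theta
```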
arXiv Detail & Related papers (2024-04-08T20:02:19Z)
- On Building Myopic MPC Policies using Supervised Learning [0.0]
This paper considers an alternative strategy, where supervised learning is used to learn the optimal value function offline instead of learning the optimal policy.
This can then be used as the cost-to-go function in a myopic MPC with a very short prediction horizon.
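A toy sketch of that recipe, where the integrator model, quadratic costs, and brute-force input grid are all our illustrative assumptions: a value model fitted offline serves as the cost-to-go of a horizon-1 MPC.

```python
import numpy as np

def myopic_mpc(x, f, stage_cost, v_hat, u_grid):
    """Horizon-1 MPC: min_u  l(x, u) + V_hat(f(x, u)), solved by grid search."""
    costs = [stage_cost(x, u) + v_hat(f(x, u)) for u in u_grid]
    return u_grid[int(np.argmin(costs))]

f = lambda x, u: x + 0.1 * u            # scalar integrator dynamics
stage = lambda x, u: x**2 + 0.01 * u**2
v_hat = lambda x: 5.0 * x**2            # stands in for the supervised value fit
u = myopic_mpc(1.0, f, stage, v_hat, np.linspace(-2.0, 2.0, 41))
```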
arXiv Detail & Related papers (2024-01-23T08:08:09Z)
- CoVO-MPC: Theoretical Analysis of Sampling-based MPC and Optimal Covariance Design [8.943418808959494]
We characterize the convergence property of a widely used sampling-based Model Predictive Path Integral Control (MPPI) method.
We show that MPPI enjoys at least linear convergence rates when the optimization is quadratic, which covers time-varying LQR systems.
Our theoretical analysis directly leads to a novel sampling-based MPC algorithm, CoVO-MPC.
Empirically, CoVO-MPC outperforms standard MPPI by 43-54% in both simulations and real-world quadrotor agile-control tasks.
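For reference, the sketch below is the standard MPPI update that the analysis concerns (a textbook form, not the CoVO-MPC code): sample perturbed control sequences, roll them out, and take a softmin-weighted average. CoVO-MPC's contribution is choosing the sampling covariance optimally instead of the fixed isotropic sigma used here.

```python
import numpy as np

def mppi_step(u_nom, rollout_cost, n_samples=256, sigma=0.3, lam=1.0, rng=None):
    """One MPPI update with a fixed isotropic sampling covariance."""
    if rng is None:
        rng = np.random.default_rng(0)
    eps = rng.normal(0.0, sigma, size=(n_samples,) + u_nom.shape)
    costs = np.array([rollout_cost(u_nom + e) for e in eps])
    w = np.exp(-(costs - costs.min()) / lam)     # softmin (information-theoretic) weights
    w /= w.sum()
    return u_nom + np.tensordot(w, eps, axes=1)  # weighted average of perturbations

# Toy quadratic rollout cost: drive an integrator's state to 1 over 10 steps.
cost = lambda u: float(np.sum(u**2) + np.sum((np.cumsum(0.1 * u) - 1.0) ** 2))
u = np.zeros(10)
for _ in range(20):
    u = mppi_step(u, cost)
```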
arXiv Detail & Related papers (2024-01-14T21:10:59Z)
- Machine Learning Insides OptVerse AI Solver: Design Principles and Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver.
We showcase our methods for generating complex SAT and MILP instances utilizing generative models that mirror the multifaceted structures of real-world problems.
We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z)
- Deep Model Predictive Optimization [21.22047409735362]
A major challenge in robotics is to design robust policies which enable complex and agile behaviors in the real world.
We propose Deep Model Predictive Optimization (DMPO), which learns the inner loop of an MPC optimization algorithm directly via experience.
DMPO can outperform the best MPC algorithm by up to 27% with fewer samples, and an end-to-end policy trained with model-free RL (MFRL) by 19%.
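A rough sketch of what "learning the inner loop" could look like, purely our construction under stated assumptions (DMPO's actual architecture and training differ): a small network replaces the hand-designed MPPI-style averaging, mapping sampled perturbations and their rollout costs to an update of the nominal control sequence, and is trained from experience.

```python
import torch
import torch.nn as nn

class LearnedInnerLoop(nn.Module):
    """Hypothetical learned update rule for a sampling-based MPC inner loop."""
    def __init__(self, horizon, n_samples):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_samples * (horizon + 1), 128), nn.ReLU(),
            nn.Linear(128, horizon),
        )

    def forward(self, u_nom, eps, costs):
        # eps: (n_samples, horizon) perturbations; costs: (n_samples,) rollout costs.
        feats = torch.cat([eps.flatten(), costs], dim=0)
        return u_nom + self.net(feats)   # learned replacement for softmin averaging

loop = LearnedInnerLoop(horizon=10, n_samples=64)
u_new = loop(torch.zeros(10), torch.randn(64, 10), torch.randn(64))
```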
arXiv Detail & Related papers (2023-10-06T21:11:52Z)
- Learning-based MPC from Big Data Using Reinforcement Learning [1.3124513975412255]
This paper presents an approach for learning Model Predictive Control (MPC) schemes directly from data using Reinforcement Learning (RL) methods.
We use tools from RL to learn the parameterized MPC scheme in an offline fashion.
Our approach derives an MPC scheme without having to solve it over the collected dataset, thereby eliminating the computational complexity of existing techniques for big data.
arXiv Detail & Related papers (2023-01-04T15:39:34Z)
- Optimization of the Model Predictive Control Meta-Parameters Through Reinforcement Learning [1.4069478981641936]
We propose a novel framework in which any parameter of the control algorithm can be jointly tuned using reinforcement learning (RL).
We demonstrate our framework on the inverted pendulum control task, reducing the total time of the control system by 36% while also improving the control performance by 18.4% over the best-performing MPC baseline.
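One way to picture the framework (an illustrative toy, not the paper's setup): treat MPC meta-parameters, here the prediction horizon and the re-solve interval, as actions of a simple epsilon-greedy bandit whose reward trades off control performance against computation time.

```python
import numpy as np

horizons = [5, 10, 20]
intervals = [1, 2, 5]                    # re-solve the MPC every k steps
q = np.zeros((len(horizons), len(intervals)))
counts = np.zeros_like(q)
rng = np.random.default_rng(0)

def run_episode(h, k):
    # Placeholder for running the closed loop with horizon h, re-solving every
    # k steps, and returning -(control cost + weighted computation time).
    return -(1.0 / h + 0.05 * h / k) + 0.01 * rng.standard_normal()

for t in range(500):
    if rng.random() < 0.2:                             # explore
        i, j = rng.integers(len(horizons)), rng.integers(len(intervals))
    else:                                              # exploit current estimate
        i, j = np.unravel_index(np.argmax(q), q.shape)
    reward = run_episode(horizons[i], intervals[j])
    counts[i, j] += 1
    q[i, j] += (reward - q[i, j]) / counts[i, j]       # incremental mean update
```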
arXiv Detail & Related papers (2021-11-07T18:33:22Z)
- Imitation Learning from MPC for Quadrupedal Multi-Gait Control [63.617157490920505]
We present a learning algorithm for training a single policy that imitates multiple gaits of a walking robot.
We use and extend MPC-Net, which is an Imitation Learning approach guided by Model Predictive Control.
We validate our approach on hardware and show that a single learned policy can replace its teacher to control multiple gaits.
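A minimal gait-conditioned imitation sketch (note: MPC-Net minimizes a control-Hamiltonian loss, while this stand-in uses plain behavior-cloning MSE, and all dimensions and names are our assumptions): one policy imitates the MPC teacher across gaits by conditioning on a gait code.

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(12 + 4, 128), nn.ReLU(), nn.Linear(128, 12))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

state = torch.randn(64, 12)                       # robot state batch (dummy data)
gait = torch.eye(4)[torch.randint(0, 4, (64,))]   # one-hot gait code (trot, pace, ...)
u_teacher = torch.randn(64, 12)                   # would come from solving the MPC

u = policy(torch.cat([state, gait], dim=-1))
loss = ((u - u_teacher) ** 2).mean()              # behavior-cloning stand-in loss
opt.zero_grad(); loss.backward(); opt.step()
```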
arXiv Detail & Related papers (2021-03-26T08:48:53Z)
- Reinforcement Learning for Adaptive Mesh Refinement [63.7867809197671]
We propose a novel formulation of adaptive mesh refinement (AMR) as a Markov decision process and apply deep reinforcement learning to train refinement policies directly from simulation.
The model sizes of these policy architectures are independent of the mesh size and hence scale to arbitrarily large and complex simulations.
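The mesh-independence claim can be illustrated with a per-element policy (our sketch, not the paper's architecture): the refine/keep decision is computed from local features by one shared network, so the same weights apply to meshes of any size.

```python
import torch
import torch.nn as nn

# Shared per-element policy: input is a local feature vector (e.g., error
# indicator, element size, local gradient), output scores {refine, keep}.
element_policy = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, 2))

features = torch.randn(10_000, 3)                     # one row per mesh element
refine = element_policy(features).argmax(-1).bool()   # mesh size never enters the model
```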
arXiv Detail & Related papers (2021-03-01T22:55:48Z)
- Covert Model Poisoning Against Federated Learning: Algorithm Design and Optimization [76.51980153902774]
Federated learning (FL) is vulnerable to external attacks on FL models during parameter transmission.
In this paper, we propose effective covert model poisoning (CMP) algorithms to combat state-of-the-art defensive aggregation mechanisms.
Our experimental results demonstrate that the proposed CMP algorithms are effective and substantially outperform existing attack mechanisms.
arXiv Detail & Related papers (2021-01-28T03:28:18Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
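The connection rests on the soft (entropy-regularized) backup, shown below in its standard log-sum-exp form (our illustration, not the paper's code): information theoretic MPC weights sampled actions by exp(Q/lambda), which is exactly the soft-greedy operator of entropy-regularized RL.

```python
import numpy as np

def soft_value(q_samples, lam=1.0):
    """V(s) = lam * log E_{a~prior}[exp(Q(s, a) / lam)], computed stably."""
    m = q_samples.max()
    return m + lam * np.log(np.mean(np.exp((q_samples - m) / lam)))

q = np.random.default_rng(0).normal(size=256)   # Q(s, a_i) at sampled actions
v = soft_value(q, lam=0.5)                      # lam -> 0 recovers max_a Q(s, a)
```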
arXiv Detail & Related papers (2019-12-31T00:29:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.