Related papers: On Building Myopic MPC Policies using Supervised Learning

On Building Myopic MPC Policies using Supervised Learning

URL: http://arxiv.org/abs/2401.12546v2
Date: Fri, 9 Aug 2024 08:26:28 GMT
Title: On Building Myopic MPC Policies using Supervised Learning
Authors: Christopher A. Orrico, Bokan Yang, Dinesh Krishnamoorthy,
Abstract summary: This paper considers an alternative strategy, where supervised learning is used to learn the optimal value function offline instead of learning the optimal policy. This can then be used as the cost-to-go function in a myopic MPC with a very short prediction horizon.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The application of supervised learning techniques in combination with model predictive control (MPC) has recently generated significant interest, particularly in the area of approximate explicit MPC, where function approximators like deep neural networks are used to learn the MPC policy via optimal state-action pairs generated offline. While the aim of approximate explicit MPC is to closely replicate the MPC policy, substituting online optimization with a trained neural network, the performance guarantees that come with solving the online optimization problem are typically lost. This paper considers an alternative strategy, where supervised learning is used to learn the optimal value function offline instead of learning the optimal policy. This can then be used as the cost-to-go function in a myopic MPC with a very short prediction horizon, such that the online computation burden reduces significantly without affecting the controller performance. This approach differs from existing work on value function approximations in the sense that it learns the cost-to-go function by using offline-collected state-value pairs, rather than closed-loop performance data. The cost of generating the state-value pairs used for training is addressed using a sensitivity-based data augmentation scheme.

Related papers

Intersection of Reinforcement Learning and Bayesian Optimization for Intelligent Control of Industrial Processes: A Safe MPC-based DPG using Multi-Objective BO [0.0]
Model Predictive Control (MPC)-based Reinforcement Learning (RL) offers a structured and interpretable alternative to Deep Neural Network (DNN)-based RL methods.<n>Standard MPC-RL approaches often suffer from slow convergence, suboptimal policy learning due to limited parameterization, and safety issues during online adaptation.<n>We propose a novel framework that integrates MPC-RL with Multi-Objective Bayesian Optimization (MOBO)
arXiv Detail & Related papers (2025-07-14T02:31:52Z)
Accelerating RL for LLM Reasoning with Optimal Advantage Regression [52.0792918455501]
We propose a novel two-stage policy optimization framework that directly approximates the optimal advantage function.<n>$A$*-PO achieves competitive performance across a wide range of mathematical reasoning benchmarks.<n>It reduces training time by up to 2$times$ and peak memory usage by over 30% compared to PPO, GRPO, and REBEL.
arXiv Detail & Related papers (2025-05-27T03:58:50Z)
Bootstrapped Model Predictive Control [19.652808098339644]
We introduce Bootstrapped Model Predictive Control (BMPC), a novel algorithm that performs policy learning in a bootstrapped manner. BMPC learns a network policy by imitating an MPC expert, and in turn, uses this policy to guide the MPC process. Our method achieves superior performance over prior works on diverse continuous control tasks.
arXiv Detail & Related papers (2025-03-24T16:46:36Z)
Parameter-Adaptive Approximate MPC: Tuning Neural-Network Controllers without Retraining [50.00291020618743]
This work introduces a novel, parameter-adaptive AMPC architecture capable of online tuning without recomputing large datasets and retraining. We showcase the effectiveness of parameter-adaptive AMPC by controlling the swing-ups of two different real cartpole systems with a severely resource-constrained microcontroller (MCU) Taken together, these contributions represent a marked step toward the practical application of AMPC in real-world systems.
arXiv Detail & Related papers (2024-04-08T20:02:19Z)
An Automatic Tuning MPC with Application to Ecological Cruise Control [0.0]
We show an approach for online automatic tuning of an MPC controller with an example application to an ecological cruise control system. We solve the global fuel consumption minimization problem offline using dynamic programming and find the corresponding MPC cost function. A neural network fitted to these offline results is used to generate the desired MPC cost function weight during online operation.
arXiv Detail & Related papers (2023-09-17T19:49:47Z)
A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs) MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
Model Predictive Control via On-Policy Imitation Learning [28.96122879515294]
We develop new sample complexity results and performance guarantees for data-driven Model Predictive Control. Our algorithm uses the structure of constrained linear MPC, and our analysis uses the properties of the explicit MPC solution to theoretically bound the number of online MPC trajectories needed to achieve optimal performance.
arXiv Detail & Related papers (2022-10-17T16:06:06Z)
Proximal Point Imitation Learning [48.50107891696562]
We develop new algorithms with rigorous efficiency guarantees for infinite horizon imitation learning. We leverage classical tools from optimization, in particular, the proximal-point method (PPM) and dual smoothing. We achieve convincing empirical performance for both linear and neural network function approximation.
arXiv Detail & Related papers (2022-09-22T12:40:21Z)
Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning [156.5667417159582]
This paper proposes a novel model of sequential information design, namely the Markov persuasion processes (MPPs) Planning in MPPs faces the unique challenge in finding a signaling policy that is simultaneously persuasive to the myopic receivers and inducing the optimal long-term cumulative utilities of the sender. We design a provably efficient no-regret learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of both optimism and pessimism principles.
arXiv Detail & Related papers (2022-02-22T05:41:43Z)
Tailored neural networks for learning optimal value functions in MPC [0.0]
Learning-based predictive control is a promising alternative to optimization-based MPC. In this paper, we provide a similar result for representing the optimal value function and the Q-function that are both known to be piecewise quadratic for linear MPC.
arXiv Detail & Related papers (2021-12-07T20:34:38Z)
Curriculum Offline Imitation Learning [72.1015201041391]
offline reinforcement learning tasks require the agent to learn from a pre-collected dataset with no further interactions with the environment. We propose textitCurriculum Offline Learning (COIL), which utilizes an experience picking strategy for imitating from adaptive neighboring policies with a higher return. On continuous control benchmarks, we compare COIL against both imitation-based and RL-based methods, showing that it not only avoids just learning a mediocre behavior on mixed datasets but is also even competitive with state-of-the-art offline RL methods.
arXiv Detail & Related papers (2021-11-03T08:02:48Z)
Provably Correct Optimization and Exploration with Non-linear Policies [65.60853260886516]
ENIAC is an actor-critic method that allows non-linear function approximation in the critic. We show that under certain assumptions, the learner finds a near-optimal policy in $O(poly(d))$ exploration rounds. We empirically evaluate this adaptation and show that it outperforms priors inspired by linear methods.
arXiv Detail & Related papers (2021-03-22T03:16:33Z)
Blending MPC & Value Function Approximation for Efficient Reinforcement Learning [42.429730406277315]
Model-Predictive Control (MPC) is a powerful tool for controlling complex, real-world systems. We present a framework for improving on MPC with model-free reinforcement learning (RL) We show that our approach can obtain performance comparable with MPC with access to true dynamics.
arXiv Detail & Related papers (2020-12-10T11:32:01Z)
Adaptive Approximate Policy Iteration [22.915651391812187]
We present a learning scheme which enjoys a $tildeO(T2/3)$ regret bound for undiscounted, continuing learning in uniformly ergodic MDPs. This is an improvement over the best existing bound of $tildeO(T3/4)$ for the average-reward case with function approximation.
arXiv Detail & Related papers (2020-02-08T02:27:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.