A Moreau Envelope Approach for LQR Meta-Policy Estimation
- URL: http://arxiv.org/abs/2403.17364v2
- Date: Mon, 07 Oct 2024 05:04:42 GMT
- Title: A Moreau Envelope Approach for LQR Meta-Policy Estimation
- Authors: Ashwin Aravind, Mohammad Taha Toghani, César A. Uribe,
- Abstract summary: We study the problem of policy estimation for the Linear Quadratic Regulator (LQR) in discrete-time linear time-invariant uncertain dynamical systems.
We propose a surrogate LQR cost, built from a finite set of realizations of the uncertain system, to define a meta-policy efficiently adjustable to new realizations.
- Score: 0.7311194870168775
- License:
- Abstract: We study the problem of policy estimation for the Linear Quadratic Regulator (LQR) in discrete-time linear time-invariant uncertain dynamical systems. We propose a Moreau Envelope-based surrogate LQR cost, built from a finite set of realizations of the uncertain system, to define a meta-policy efficiently adjustable to new realizations. Moreover, we design an algorithm to find an approximate first-order stationary point of the meta-LQR cost function. Numerical results show that the proposed approach outperforms naive averaging of controllers on new realizations of the linear system. We also provide empirical evidence that our method has better sample complexity than Model-Agnostic Meta-Learning (MAML) approaches.
Related papers
- Meta-Learning Linear Quadratic Regulators: A Policy Gradient MAML Approach for Model-free LQR [4.787550557970832]
We characterize the stability and personalization guarantees of a policy gradient-based (PG) model-a meta-learning (MAML) approach for the LQR problem.
Our theoretical guarantees demonstrate that the learned controller can efficiently adapt to unseen LQR tasks.
arXiv Detail & Related papers (2024-01-25T21:59:52Z) - Let's reward step by step: Step-Level reward model as the Navigators for
Reasoning [64.27898739929734]
Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from PRM to optimize the reasoning pathways explored by LLMs.
To explore the versatility of our approach, we develop a novel method to automatically generate step-level reward dataset for coding tasks and observed similar improved performance in the code generation tasks.
arXiv Detail & Related papers (2023-10-16T05:21:50Z) - First-order Policy Optimization for Robust Markov Decision Process [40.2022466644885]
We consider the problem of solving robust Markov decision process (MDP)
MDP involves a set of discounted, finite state, finite action space MDPs with uncertain transition kernels.
For $(mathbfs,mathbfa)$-rectangular uncertainty sets, we establish several structural observations on the robust objective.
arXiv Detail & Related papers (2022-09-21T18:10:28Z) - Solving Multistage Stochastic Linear Programming via Regularized Linear
Decision Rules: An Application to Hydrothermal Dispatch Planning [77.34726150561087]
We propose a novel regularization scheme for linear decision rules (LDR) based on the AdaSO (adaptive least absolute shrinkage and selection operator)
Experiments show that the overfit threat is non-negligible when using the classical non-regularized LDR to solve MSLP.
For the LHDP problem, our analysis highlights the following benefits of the proposed framework in comparison to the non-regularized benchmark.
arXiv Detail & Related papers (2021-10-07T02:36:14Z) - Model-Augmented Q-learning [112.86795579978802]
We propose a MFRL framework that is augmented with the components of model-based RL.
Specifically, we propose to estimate not only the $Q$-values but also both the transition and the reward with a shared network.
We show that the proposed scheme, called Model-augmented $Q$-learning (MQL), obtains a policy-invariant solution which is identical to the solution obtained by learning with true reward.
arXiv Detail & Related papers (2021-02-07T17:56:50Z) - Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a
Finite Horizon [3.867363075280544]
We explore reinforcement learning methods for finding the optimal policy in the linear quadratic regulator (LQR) problem.
We produce a global linear convergence guarantee for the setting of finite time horizon and state dynamics under weak assumptions.
We show results for the case where we assume a model for the underlying dynamics and where we apply the method to the data directly.
arXiv Detail & Related papers (2020-11-20T09:51:49Z) - Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, non optimal control problems.
arXiv Detail & Related papers (2020-11-15T22:36:59Z) - Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z) - Learning with Safety Constraints: Sample Complexity of Reinforcement
Learning for Constrained MDPs [13.922754427601491]
We characterize the relationship between safety constraints and the number of samples needed to ensure a desired level of accuracy.
Our main finding is that compared to the best known bounds of the unconstrained regime, the sample of constrained RL algorithms are increased by a factor that is logarithmic in the number of constraints.
arXiv Detail & Related papers (2020-08-01T18:17:08Z) - Structured Policy Iteration for Linear Quadratic Regulator [40.52288246664592]
We introduce the textitStructured Policy Iteration (S-PI) for LQR, a method capable of deriving a structured linear policy.
Such a structured policy with (block) sparsity or low-rank can have significant advantages over the standard LQR policy.
In both the known-model and model-free setting, we prove convergence analysis under the proper choice of parameters.
arXiv Detail & Related papers (2020-07-13T06:03:15Z) - Adaptive Control and Regret Minimization in Linear Quadratic Gaussian
(LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.