Exact Asymptotics for Linear Quadratic Adaptive Control
- URL: http://arxiv.org/abs/2011.01364v1
- Date: Mon, 2 Nov 2020 22:43:30 GMT
- Title: Exact Asymptotics for Linear Quadratic Adaptive Control
- Authors: Feicheng Wang and Lucas Janson
- Abstract summary: We study the simplest non-bandit reinforcement learning problem: linear quadratic adaptive control (LQAC)
We derive expressions for the regret, estimation error, and prediction error of a stepwise-updating LQAC algorithm.
In simulations on both stable and unstable systems, we find that our theory also describes the algorithm's finite-sample behavior remarkably well.
- Score: 6.287145010885044
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent progress in reinforcement learning has led to remarkable performance
in a range of applications, but its deployment in high-stakes settings remains
quite rare. One reason is a limited understanding of the behavior of
reinforcement learning algorithms, both in terms of their regret and their
ability to learn the underlying system dynamics; existing work is focused
almost exclusively on characterizing rates, with little attention paid to the
constants multiplying those rates, which can be critically important in practice.
To start to address this challenge, we study perhaps the simplest non-bandit
reinforcement learning problem: linear quadratic adaptive control (LQAC). By
carefully combining recent finite-sample performance bounds for the LQAC
problem with a particular (less-recent) martingale central limit theorem, we
are able to derive asymptotically-exact expressions for the regret, estimation
error, and prediction error of a rate-optimal stepwise-updating LQAC algorithm.
In simulations on both stable and unstable systems, we find that our asymptotic
theory also describes the algorithm's finite-sample behavior remarkably well.
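The stepwise-updating algorithm analyzed here follows the certainty-equivalence template: re-estimate the dynamics from all data collected so far, recompute the LQR gain for the estimated system, and inject exploration noise with decaying variance. The sketch below illustrates that template; the specific plant, noise schedule, and estimator details are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(0)
A_true = np.array([[1.01, 0.10], [0.00, 0.99]])  # mildly unstable plant (assumed)
B_true = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)                      # quadratic cost weights
n, d = A_true.shape[0], B_true.shape[1]

x = np.zeros(n)
K = np.zeros((d, n))                             # initial (uninformed) gain
Z_hist, Xnext_hist = [], []

for t in range(1, 501):
    # Act with the current certainty-equivalence gain plus exploration
    # noise whose variance decays like t^{-1/2}.
    u = -K @ x + rng.normal(scale=t ** -0.25, size=d)
    x_next = A_true @ x + B_true @ u + rng.normal(scale=0.1, size=n)
    Z_hist.append(np.concatenate([x, u]))
    Xnext_hist.append(x_next)
    x = x_next

    # Stepwise update: least-squares estimate of (A, B), then the LQR
    # gain for the estimated system via the discrete Riccati equation.
    Theta, *_ = np.linalg.lstsq(np.array(Z_hist), np.array(Xnext_hist),
                                rcond=None)
    A_hat, B_hat = Theta.T[:, :n], Theta.T[:, n:]
    try:
        P = solve_discrete_are(A_hat, B_hat, Q, R)
        K = np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)
    except (ValueError, np.linalg.LinAlgError):
        pass  # keep the previous gain while estimates are degenerate
```

For a scheme of this kind, the regret, estimation error, and prediction error are the quantities whose exact asymptotic constants the paper derives via the martingale central limit theorem.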
Related papers
- Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm is only accessible to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
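For context, a standard building block in this setting is the two-point gradient estimator, which forms an approximate gradient from two noisy function evaluations along a random direction. The sketch below is a generic construction and may differ from the algorithm that attains the minimax rate in the paper.

```python
import numpy as np

def two_point_grad(f, x, delta=0.05, rng=None):
    # Random unit direction u; d * (f(x+δu) - f(x-δu)) / (2δ) * u is an
    # (approximately) unbiased estimate of the gradient of a smooth f.
    rng = np.random.default_rng() if rng is None else rng
    u = rng.normal(size=x.shape)
    u /= np.linalg.norm(u)
    d = x.size
    return d * (f(x + delta * u) - f(x - delta * u)) / (2 * delta) * u

# Usage: zeroth-order descent on a strongly convex objective observed
# only through noisy evaluations (illustrative placeholder objective).
rng = np.random.default_rng(0)
f_noisy = lambda x: float(np.sum(x ** 2) + rng.normal(scale=0.01))
x = np.ones(10)
for _ in range(2000):
    x -= 1e-3 * two_point_grad(f_noisy, x, rng=rng)
```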
arXiv Detail & Related papers (2024-06-28T02:56:22Z)
- Rethinking SIGN Training: Provable Nonconvex Acceleration without First- and Second-Order Gradient Lipschitz [66.22095739795068]
Sign-based methods have gained attention due to their ability to achieve robust performance while using only the sign information for parameter updates.
Existing convergence analyses of sign-based methods rely on the strong assumptions of first- and second-order gradient Lipschitz continuity.
This paper analyzes their convergence without these Lipschitz assumptions, under more realistic smoothness conditions.
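In concrete terms, the sign-based update in question is just the following; a minimal sketch, where the quadratic objective and noise model are illustrative placeholders:

```python
import numpy as np

# signSGD sketch: parameters move by a fixed step in the direction of
# the sign of a stochastic gradient, discarding magnitude information.
def sign_sgd(grad_fn, theta, lr=1e-2, steps=1000):
    for _ in range(steps):
        theta = theta - lr * np.sign(grad_fn(theta))
    return theta

# Usage: minimize a simple quadratic with noisy gradient evaluations.
rng = np.random.default_rng(0)
noisy_grad = lambda th: 2.0 * th + rng.normal(scale=0.1, size=th.shape)
theta_hat = sign_sgd(noisy_grad, np.ones(5))
```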
arXiv Detail & Related papers (2023-10-23T06:48:43Z)
- Sub-linear Regret in Adaptive Model Predictive Control [56.705978425244496]
We present STT-MPC (Self-Tuning Tube-based Model Predictive Control), an online MPC algorithm that combines the certainty-equivalence principle with polytopic tubes.
We analyze the regret of the algorithm relative to an algorithm that knows the system dynamics from the start.
arXiv Detail & Related papers (2023-10-07T15:07:10Z)
- A Single-Loop Deep Actor-Critic Algorithm for Constrained Reinforcement Learning with Provable Convergence [7.586600116278698]
Deep Actor-Critic algorithms combine the actor-critic framework with deep neural networks (DNNs).
This paper proposes a single-loop deep actor-critic algorithm for constrained reinforcement learning and establishes provable convergence.
arXiv Detail & Related papers (2023-06-10T10:04:54Z)
- Instance-Optimality in Interactive Decision Making: Toward a Non-Asymptotic Theory [30.061707627742766]
We aim for instance-optimality, a strong notion of adaptivity which asserts that, on any particular problem instance, the algorithm under consideration outperforms all consistent algorithms.
In this paper, we take the first step toward developing a non-asymptotic theory of instance-optimal decision making with general function approximation.
arXiv Detail & Related papers (2023-04-24T21:51:58Z)
- Self-adaptive algorithms for quasiconvex programming and applications to machine learning [0.0]
We provide a self-adaptive step-size strategy that does not include convex line-search techniques and a generic approach under mild assumptions.
The proposed method is verified by preliminary results from some computational examples.
To demonstrate the effectiveness of the proposed technique for large-scale problems, we apply it to some experiments on machine learning.
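As one generic illustration of a self-adaptive step-size rule that avoids a full line search, the sketch below grows the step after a successful descent step and shrinks it after a failed one, using normalized gradients as is common for quasiconvex problems. This is an assumed, simplified rule, not necessarily the paper's exact strategy.

```python
import numpy as np

def self_adaptive_descent(f, grad, x, step=1.0, steps=200, up=1.2, down=0.5):
    fx = f(x)
    for _ in range(steps):
        g = grad(x)
        g_norm = np.linalg.norm(g)
        if g_norm == 0:
            break
        x_new = x - step * g / g_norm  # normalized (quasiconvex-friendly) step
        f_new = f(x_new)
        if f_new < fx:                 # success: accept and grow the step
            x, fx, step = x_new, f_new, step * up
        else:                          # failure: reject and shrink the step
            step *= down
    return x

# Usage on a quasiconvex example, f(x) = sqrt(||x||_1).
f = lambda x: np.sqrt(np.abs(x).sum())
grad = lambda x: np.sign(x) / (2 * np.sqrt(np.abs(x).sum()) + 1e-12)
x_hat = self_adaptive_descent(f, grad, np.array([3.0, -2.0]))
```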
arXiv Detail & Related papers (2022-12-13T05:30:29Z)
- Learning to Optimize with Stochastic Dominance Constraints [103.26714928625582]
In this paper, we develop a simple yet efficient approach for the problem of comparing uncertain quantities.
We recast inner optimization in the Lagrangian as a learning problem for surrogate approximation, which bypasses apparent intractability.
The proposed light-SD demonstrates superior performance on several representative problems ranging from finance to supply chain management.
arXiv Detail & Related papers (2022-11-14T21:54:31Z)
- Domain-Adjusted Regression or: ERM May Already Learn Features Sufficient for Out-of-Distribution Generalization [52.7137956951533]
We argue that devising simpler methods for learning predictors on existing features is a promising direction for future research.
We introduce Domain-Adjusted Regression (DARE), a convex objective for learning a linear predictor that is provably robust under a new model of distribution shift.
Under a natural model, we prove that the DARE solution is the minimax-optimal predictor for a constrained set of test distributions.
arXiv Detail & Related papers (2022-02-14T16:42:16Z)
- Instance-optimality in optimal value estimation: Adaptivity via variance-reduced Q-learning [99.34907092347733]
We analyze the problem of estimating optimal $Q$-value functions for a discounted Markov decision process with discrete states and actions.
Using a local minimax framework, we show that an instance-dependent functional arises in lower bounds on the accuracy of any estimation procedure.
In the other direction, we establish the sharpness of our lower bounds, up to factors logarithmic in the state and action spaces, by analyzing a variance-reduced version of $Q$-learning.
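Variance-reduced Q-learning recenters the usual one-sample Bellman update around a reference point whose Bellman image has been estimated accurately in batch. The sketch below shows that recentering idea; the sampler, toy MDP, and batch estimate are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

# `sample(s, a)` is an assumed generative-model oracle returning
# (reward, next_state); TQ_ref stands in for a batch Monte Carlo
# estimate of the Bellman operator applied to the reference Q_ref.
def vr_q_sweep(Q, Q_ref, TQ_ref, sample, gamma=0.9, lr=0.1):
    n_states, n_actions = Q.shape
    for s in range(n_states):
        for a in range(n_actions):
            r, s_next = sample(s, a)
            t_q = r + gamma * Q[s_next].max()        # one-sample T(Q)
            t_ref = r + gamma * Q_ref[s_next].max()  # one-sample T(Q_ref)
            # t_q - t_ref has low variance when Q is close to Q_ref,
            # and TQ_ref supplies an accurate baseline.
            Q[s, a] += lr * (t_q - t_ref + TQ_ref[s, a] - Q[s, a])
    return Q

# Toy usage on a 2-state, 2-action MDP (assumed dynamics).
rng = np.random.default_rng(0)
def sample(s, a):
    return float(s == a), int(rng.integers(2))  # (reward, next state)
Q = np.zeros((2, 2))
Q_ref = Q.copy()
TQ_ref = np.full((2, 2), 0.5)  # placeholder batch estimate of T(Q_ref)
Q = vr_q_sweep(Q, Q_ref, TQ_ref, sample)
```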
arXiv Detail & Related papers (2021-06-28T00:38:54Z)
- A Dynamic Penalty Function Approach for Constraints-Handling in Reinforcement Learning [0.0]
This study focuses on using reinforcement learning (RL) to solve constrained optimal control problems.
While training neural networks to learn the value (or Q) function, one can run into computational issues caused by the sharp change in the function value at the constraint boundary.
This difficulty during training can lead to convergence problems and ultimately lead to poor closed-loop performance.
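A dynamic penalty in this context means a constraint-violation penalty whose weight is ramped up over training, so the shaped reward changes smoothly near the boundary early on and only gradually approximates a hard constraint. A minimal sketch, with an assumed exponential schedule and penalty form:

```python
# Illustrative dynamic penalty; the schedule and functional form are
# assumptions, not the paper's exact penalty function.
def shaped_reward(reward, violation, iteration, mu0=0.1, growth=1.01):
    mu = mu0 * growth ** iteration       # penalty weight grows with training
    return reward - mu * max(0.0, violation)

# Example: violation = 0.3 means the constraint is exceeded by 0.3.
r_early = shaped_reward(1.0, 0.3, iteration=0)    # mild penalty early
r_late = shaped_reward(1.0, 0.3, iteration=500)   # near-hard penalty later
```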
arXiv Detail & Related papers (2020-12-22T02:13:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.