Oracle Complexity Reduction for Model-free LQR: A Stochastic
Variance-Reduced Policy Gradient Approach
- URL: http://arxiv.org/abs/2309.10679v1
- Date: Tue, 19 Sep 2023 15:03:18 GMT
- Title: Oracle Complexity Reduction for Model-free LQR: A Stochastic
Variance-Reduced Policy Gradient Approach
- Authors: Leonardo F. Toso, Han Wang, James Anderson
- Abstract summary: We investigate the problem of learning an $\epsilon$-approximate solution for the discrete-time Linear Quadratic Regulator (LQR) problem.
Our method combines both one-point and two-point estimations in a dual-loop variance-reduced algorithm.
- Score: 4.422315636150272
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the problem of learning an $\epsilon$-approximate solution for
the discrete-time Linear Quadratic Regulator (LQR) problem via a Stochastic
Variance-Reduced Policy Gradient (SVRPG) approach. Whilst policy gradient
methods have proven to converge linearly to the optimal solution of the
model-free LQR problem, the substantial requirement for two-point cost queries
in gradient estimations may be intractable, particularly in applications where
obtaining cost function evaluations at two distinct control input
configurations is exceptionally costly. To this end, we propose an
oracle-efficient approach. Our method combines both one-point and two-point
estimations in a dual-loop variance-reduced algorithm. It achieves an
approximate optimal solution with only
$O\left(\log\left(1/\epsilon\right)^{\beta}\right)$ two-point cost information
for $\beta \in (0,1)$.
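To make the oracle model concrete, below is a minimal Python sketch of a dual-loop, variance-reduced policy gradient for LQR that mixes one-point and two-point zeroth-order cost queries. It illustrates the general technique only and is not the authors' algorithm: the estimator forms, smoothing radius, step size, loop lengths, and the toy double-integrator system are all assumptions made for this example.
```python
# Hedged sketch (not the paper's exact algorithm): dual-loop variance-reduced
# policy gradient for LQR mixing one-point and two-point cost queries.
import numpy as np


def lqr_cost(K, A, B, Q, R, x0, horizon=50):
    """Finite-horizon surrogate for the LQR cost J(K) under the policy u = -K x."""
    x, cost = x0.copy(), 0.0
    for _ in range(horizon):
        u = -K @ x
        cost += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return cost


def random_direction(shape, rng):
    """Uniform random direction on the unit sphere, matching the gain's shape."""
    U = rng.standard_normal(shape)
    return U / np.linalg.norm(U)


def one_point_grad(J, K, r, U):
    """One-point zeroth-order estimate: a single cost query at a perturbed gain."""
    return (K.size / r) * J(K + r * U) * U


def two_point_grad(J, K, r, U):
    """Two-point zeroth-order estimate: two cost queries at symmetric perturbations."""
    return (K.size / (2.0 * r)) * (J(K + r * U) - J(K - r * U)) * U


def svrpg_lqr(J, K0, outer=10, inner=10, step=5e-4, r=0.05, seed=0):
    """Dual loop: one expensive two-point 'anchor' gradient per outer iteration,
    corrected by cheap one-point estimates (SVRG-style) in the inner loop."""
    rng = np.random.default_rng(seed)
    K = K0.copy()
    for _ in range(outer):
        K_anchor = K.copy()
        g_anchor = two_point_grad(J, K_anchor, r, random_direction(K.shape, rng))
        for _ in range(inner):
            U = random_direction(K.shape, rng)
            # Same direction U for both one-point queries, so their shared
            # noise largely cancels and the anchor correction dominates.
            g = one_point_grad(J, K, r, U) - one_point_grad(J, K_anchor, r, U) + g_anchor
            K = K - step * g
    return K


if __name__ == "__main__":
    # Toy double-integrator system, for illustration only.
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([[0.0], [0.1]])
    Q, R = np.eye(2), np.eye(1)
    x0 = np.array([1.0, 0.0])
    J = lambda K: lqr_cost(K, A, B, Q, R, x0)
    K0 = np.array([[0.5, 0.5]])  # a stabilizing initial gain
    K_hat = svrpg_lqr(J, K0)
    print("initial cost:", J(K0), "final cost:", J(K_hat))
```
In this scheme each outer iteration spends two cost queries on an accurate anchor gradient, while inner iterations rely on single cost queries whose noise is partially cancelled by the control variate; this is the mechanism by which two-point oracle calls can be kept to a small fraction of the total.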
Related papers
- The Plug-in Approach for Average-Reward and Discounted MDPs: Optimal Sample Complexity Analysis [6.996002801232415]
We study the sample complexity of the plug-in approach for learning $\varepsilon$-optimal policies in average-reward Markov decision processes.
Despite representing arguably the simplest algorithm for this problem, the plug-in approach has never been theoretically analyzed.
arXiv Detail & Related papers (2024-10-10T05:08:14Z) - Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models [57.52124921268249]
We propose a Trust-Region Sequential Quadratic Programming method to find both first- and second-order stationary points.
To converge to first-order stationary points, our method computes a gradient step in each iteration, defined by minimizing an approximation of the objective subject to a trust-region constraint.
To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature of the reduced Hessian matrix (a minimal sketch of such an eigen step appears after this list).
arXiv Detail & Related papers (2024-09-24T04:39:47Z) - Stochastic First-Order Methods with Non-smooth and Non-Euclidean Proximal Terms for Nonconvex High-Dimensional Stochastic Optimization [2.0657831823662574]
When the nonconvex problem is compounded by stochasticity, the sample complexity of stochastic first-order methods may depend linearly on the problem dimension, which is undesirable for large-scale problems.
Our algorithms allow for non-smooth and non-Euclidean distance functions as the proximal terms, with a sample complexity of $\mathcal{O}(\log d)/\epsilon^4$.
We prove that DISFOM can sharpen the variance of the gradient estimates while retaining the $\mathcal{O}(\log d)/\epsilon^4$ sample complexity.
arXiv Detail & Related papers (2024-06-27T18:38:42Z) - Sample Complexity of the Linear Quadratic Regulator: A Reinforcement Learning Lens [11.98212766542468]
We provide the first known algorithm that achieves $\varepsilon$-optimality within $\widetilde{\mathcal{O}}(1/\varepsilon)$ function evaluations.
Our results substantially improve upon the existing literature outside the realm of two-point gradient estimates.
arXiv Detail & Related papers (2024-04-16T18:54:57Z) - High-probability sample complexities for policy evaluation with linear function approximation [88.87036653258977]
We investigate the sample complexities required to guarantee a predefined estimation error of the best linear coefficients for two widely-used policy evaluation algorithms.
We establish the first sample complexity bound with high-probability convergence guarantee that attains the optimal dependence on the tolerance level.
arXiv Detail & Related papers (2023-05-30T12:58:39Z) - Fully Stochastic Trust-Region Sequential Quadratic Programming for
Equality-Constrained Optimization Problems [62.83783246648714]
We propose a trust-region stochastic sequential quadratic programming algorithm (TR-StoSQP) to solve nonlinear optimization problems with stochastic objectives and deterministic equality constraints.
The algorithm adaptively selects the trust-region radius and, compared to the existing line-search StoSQP schemes, allows us to utilize indefinite Hessian matrices.
arXiv Detail & Related papers (2022-11-29T05:52:17Z) - Gradient-Free Methods for Deterministic and Stochastic Nonsmooth
Nonconvex Optimization [94.19177623349947]
Non-smooth non-convex optimization problems emerge in machine learning and business decision making.
Two core challenges impede the development of efficient methods with finite-time convergence guarantees.
Two-phase versions of GFM and SGFM are also proposed and proven to achieve improved large-deviation results.
arXiv Detail & Related papers (2022-09-12T06:53:24Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes in the fixed confidence setting under a generative model.
The lower bound, obtained as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Nearly Optimal Linear Convergence of Stochastic Primal-Dual Methods for
Linear Programming [5.126924253766052]
We show that the proposed method exhibits a linear convergence rate for solving sharp instances with a high probability.
We also propose an efficient coordinate-based oracle for unconstrained bilinear problems.
arXiv Detail & Related papers (2021-11-10T04:56:38Z) - A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning [13.908826484332282]
We study a new two-time-scale gradient method for solving optimization problems.
Our first contribution is to characterize the finite-time complexity of the proposed two-time-scale gradient algorithm.
We apply our framework to gradient-based policy evaluation algorithms in reinforcement learning.
arXiv Detail & Related papers (2021-09-29T23:15:23Z) - A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis
and Application to Actor-Critic [142.1492359556374]
Bilevel optimization is a class of problems which exhibit a two-level structure.
We propose a two-timescale approximation (TTSA) algorithm for tackling such a bilevel problem.
We show that a two-timescale natural actor-critic policy optimization algorithm can be viewed as a special case of our TTSA framework.
arXiv Detail & Related papers (2020-07-10T05:20:02Z)
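As referenced in the trust-region SQP entry above, here is a minimal sketch of how a direction of negative curvature of a (reduced) Hessian can be turned into an eigen step. The function name, the `radius` parameter, and the simple sign rule are assumptions for this illustration, not the method from that paper.
```python
# Illustrative "eigen step": follow the most negative eigenvalue direction
# of a (reduced) Hessian when negative curvature is present.
import numpy as np


def eigen_step(g, H, radius):
    """Return a step of length `radius` along the eigenvector of the smallest
    eigenvalue of H when that eigenvalue is negative, signed so the linear
    term g @ d does not increase the model; return None otherwise."""
    eigvals, eigvecs = np.linalg.eigh(H)   # H assumed symmetric
    lam_min, v = eigvals[0], eigvecs[:, 0]
    if lam_min >= 0.0:
        return None                        # no negative curvature to exploit
    d = radius * v
    if g @ d > 0.0:
        d = -d                             # flip sign so that g @ d <= 0
    return d                               # d @ H @ d = radius**2 * lam_min < 0


if __name__ == "__main__":
    H = np.array([[2.0, 0.0], [0.0, -1.0]])   # indefinite reduced Hessian
    g = np.array([1.0, 0.5])
    print(eigen_step(g, H, radius=0.5))        # step along [0, +/-1] scaled by 0.5
```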