Sample Complexity of the Linear Quadratic Regulator: A Reinforcement Learning Lens
- URL: http://arxiv.org/abs/2404.10851v2
- Date: Thu, 18 Apr 2024 23:38:49 GMT
- Title: Sample Complexity of the Linear Quadratic Regulator: A Reinforcement Learning Lens
- Authors: Amirreza Neshaei Moghaddam, Alex Olshevsky, Bahman Gharesifard,
- Abstract summary: We provide the first known algorithm that achieves $varepsilon$-optimality within $widetildemathcalO (1/varepsilon)$ function evaluations.
Our results substantially improve upon the existing literature outside the realm of two-point gradient estimates.
- Score: 11.98212766542468
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We provide the first known algorithm that provably achieves $\varepsilon$-optimality within $\widetilde{\mathcal{O}}(1/\varepsilon)$ function evaluations for the discounted discrete-time LQR problem with unknown parameters, without relying on two-point gradient estimates. These estimates are known to be unrealistic in many settings, as they depend on using the exact same initialization, which is to be selected randomly, for two different policies. Our results substantially improve upon the existing literature outside the realm of two-point gradient estimates, which either leads to $\widetilde{\mathcal{O}}(1/\varepsilon^2)$ rates or heavily relies on stability assumptions.
Related papers
- Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity [50.25258834153574]
We focus on the class of (strongly) convex $(L0)$-smooth functions and derive new convergence guarantees for several existing methods.
In particular, we derive improved convergence rates for Gradient Descent with smoothnessed Gradient Clipping and for Gradient Descent with Polyak Stepsizes.
arXiv Detail & Related papers (2024-09-23T13:11:37Z) - Oracle Complexity Reduction for Model-free LQR: A Stochastic
Variance-Reduced Policy Gradient Approach [4.422315636150272]
We investigate the problem of learning an $epsilon$-approximate solution for the discrete-time Linear Quadratic Regulator (LQR) problem.
Our method combines both one-point and two-point estimations in a dual-loop variance-reduced algorithm.
arXiv Detail & Related papers (2023-09-19T15:03:18Z) - A Fully First-Order Method for Stochastic Bilevel Optimization [8.663726907303303]
We consider unconstrained bilevel optimization problems when only the first-order gradient oracles are available.
We propose a Fully First-order Approximation (F2SA) method, and study its non-asymptotic convergence properties.
We demonstrate even superior practical performance of the proposed method over existing second-order based approaches on MNIST data-hypercleaning experiments.
arXiv Detail & Related papers (2023-01-26T05:34:21Z) - Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $min_mathbfxmax_mathbfyF(mathbfx) + H(mathbfx,mathbfy)$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present a emphaccelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragrad
arXiv Detail & Related papers (2022-06-17T06:10:20Z) - A gradient estimator via L1-randomization for online zero-order
optimization with two point feedback [93.57603470949266]
We present a novel gradient estimator based on two function evaluation and randomization.
We consider two types of assumptions on the noise of the zero-order oracle: canceling noise and adversarial noise.
We provide an anytime and completely data-driven algorithm, which is adaptive to all parameters of the problem.
arXiv Detail & Related papers (2022-05-27T11:23:57Z) - A Projection-free Algorithm for Constrained Stochastic Multi-level
Composition Optimization [12.096252285460814]
We propose a projection-free conditional gradient-type algorithm for composition optimization.
We show that the number of oracles and the linear-minimization oracle required by the proposed algorithm, are of order $mathcalO_T(epsilon-2)$ and $mathcalO_T(epsilon-3)$ respectively.
arXiv Detail & Related papers (2022-02-09T06:05:38Z) - High-probability Bounds for Non-Convex Stochastic Optimization with
Heavy Tails [55.561406656549686]
We consider non- Hilbert optimization using first-order algorithms for which the gradient estimates may have tails.
We show that a combination of gradient, momentum, and normalized gradient descent convergence to critical points in high-probability with best-known iteration for smooth losses.
arXiv Detail & Related papers (2021-06-28T00:17:01Z) - Randomized Exploration for Reinforcement Learning with General Value
Function Approximation [122.70803181751135]
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm.
Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises.
We complement the theory with an empirical evaluation across known difficult exploration tasks.
arXiv Detail & Related papers (2021-06-15T02:23:07Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = langle X,w* rangle + epsilon$
We propose estimators for this problem under two settings: (i) $X$ is L4-L2 hypercontractive, $mathbbE [XXtop]$ has bounded condition number and $epsilon$ has bounded variance and (ii) $X$ is sub-Gaussian with identity second moment and $epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.