Policy Mirror Descent for Reinforcement Learning: Linear Convergence,
New Sampling Complexity, and Generalized Problem Classes
- URL: http://arxiv.org/abs/2102.00135v2
- Date: Tue, 2 Feb 2021 01:59:41 GMT
- Title: Policy Mirror Descent for Reinforcement Learning: Linear Convergence,
New Sampling Complexity, and Generalized Problem Classes
- Authors: Guanghui Lan
- Abstract summary: We present new policy mirror descent (PMD) methods for solving reinforcement learning problems with strongly convex or general convex regularizers.
To the best of our knowledge, these developments appear to be new in both the optimization and RL literature.
- Score: 6.240369435223
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present new policy mirror descent (PMD) methods for solving reinforcement
learning (RL) problems with either strongly convex or general convex
regularizers. By exploring the structural properties of these overall seemingly
highly nonconvex problems, we show that the PMD methods exhibit a fast linear
rate of convergence to global optimality. We develop stochastic counterparts of
these methods, and establish an ${\cal O}(1/\epsilon)$ (resp., ${\cal
O}(1/\epsilon^2)$) sampling complexity for solving these RL problems with
strongly (resp., general) convex regularizers using different sampling schemes,
where $\epsilon$ denotes the target accuracy. We further show that the
complexity for computing the gradients of these regularizers, if necessary, can
be bounded by ${\cal O}\{(\log_\gamma \epsilon) [(1-\gamma)L/\mu]^{1/2}\log
(1/\epsilon)\}$ (resp., ${\cal O} \{(\log_\gamma \epsilon )
[(1-\gamma)L/\epsilon]^{1/2}\}$) for problems with strongly (resp., general)
convex regularizers. Here $\gamma$ denotes the discount factor. To the best
of our knowledge, these complexity bounds, along with our algorithmic
developments, appear to be new in both optimization and RL literature. The
introduction of these convex regularizers also greatly expands the flexibility
and applicability of RL models.
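When the Bregman divergence is the KL divergence and the regularizer vanishes, the PMD prox-mapping has a closed-form multiplicative-weights solution: for cost minimization, $\pi_{k+1}(a|s) \propto \pi_k(a|s) \exp(-\eta_k Q^{\pi_k}(s,a))$. The following is a minimal sketch of this special case on a toy problem; the random MDP, exact policy evaluation, constant stepsize, and absence of a regularizer are assumptions of the sketch, not the paper's general setting.

```python
import numpy as np

# Minimal tabular Policy Mirror Descent sketch (assumed toy setup: random
# cost-minimizing MDP, exact policy evaluation, KL Bregman divergence,
# no regularizer, constant stepsize).
rng = np.random.default_rng(0)
n_states, n_actions, gamma, eta = 5, 3, 0.9, 1.0

P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)        # transition kernel P(s'|s,a)
c = rng.random((n_states, n_actions))    # costs in [0, 1]
pi = np.full((n_states, n_actions), 1.0 / n_actions)  # uniform start

def q_values(pi):
    """Exact policy evaluation: solve (I - gamma P_pi) V = c_pi, then form Q."""
    P_pi = np.einsum("sap,sa->sp", P, pi)   # state-to-state kernel under pi
    c_pi = (c * pi).sum(axis=1)             # expected one-step cost
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, c_pi)
    return c + gamma * P @ V                # Q(s,a) = c(s,a) + gamma E[V(s')]

for k in range(100):
    Q = q_values(pi)
    # KL prox-mapping in closed form: a softmax / multiplicative-weights step.
    logits = np.log(pi) - eta * Q
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    pi = np.exp(logits)
    pi /= pi.sum(axis=1, keepdims=True)

print("greedy actions:", pi.argmax(axis=1))
```

The stochastic counterparts developed in the paper replace the exact evaluation step above with sampled estimates of $Q^{\pi_k}$, which is where the ${\cal O}(1/\epsilon)$ and ${\cal O}(1/\epsilon^2)$ sampling complexities arise.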
Related papers
- Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z)
- Polynomial-Time Solutions for ReLU Network Training: A Complexity Classification via Max-Cut and Zonotopes [70.52097560486683]
We prove that the hardness of approximation of ReLU networks not only mirrors the complexity of the Max-Cut problem but also, in certain special cases, exactly corresponds to it.
In particular, when $\epsilon \leq \sqrt{84/83}-1 \approx 0.006$, we show that it is NP-hard to find an approximate global optimum of the ReLU network objective with relative error $\epsilon$ with respect to the objective value.
arXiv Detail & Related papers (2023-11-18T04:41:07Z)
- The Sample Complexity Of ERMs In Stochastic Convex Optimization [13.896417716930687]
We show that in fact $\tilde{O}(\frac{d}{\epsilon}+\frac{1}{\epsilon^2})$ data points are also sufficient.
We further generalize the result and show that a similar upper bound holds for all convex bodies.
arXiv Detail & Related papers (2023-11-09T14:29:25Z)
- Reinforcement Learning with General Utilities: Simpler Variance Reduction and Large State-Action Space [17.366915676628867]
We consider the reinforcement learning problem with general utilities.
Our algorithm achieves $\tilde{\mathcal{O}}(\epsilon^{-3})$ and $\tilde{\mathcal{O}}(\epsilon^{-2})$ sample complexities.
arXiv Detail & Related papers (2023-06-02T18:16:35Z)
- Optimal Sample Complexity of Reinforcement Learning for Mixing Discounted Markov Decision Processes [1.0445957451908694]
We consider the optimal sample complexity theory for maximizing the infinite-horizon discounted reward in a Markov decision process (MDP).
In many applications of interest, the optimal policy (or all policies) induces mixing.
Our analysis is grounded in regeneration-type ideas, which we believe are of independent interest, as they can be used to study RL problems for general state space MDPs.
arXiv Detail & Related papers (2023-02-15T05:43:17Z)
- A Newton-CG based barrier-augmented Lagrangian method for general nonconvex conic optimization [53.044526424637866]
In this paper we consider finding an approximate second-order stationary point (SOSP) of general nonconvex conic optimization, that is, minimizing a twice differentiable function subject to general conic constraints.
In particular, we propose a Newton-CG based barrier-augmented Lagrangian method for finding an approximate SOSP.
arXiv Detail & Related papers (2023-01-10T20:43:29Z)
- Optimal Gradient Sliding and its Application to Distributed Optimization Under Similarity [121.83085611327654]
We study structured convex optimization problems with additive objective $r := p + q$, where $r$ is $\mu$-strongly convex.
We propose a method to solve such problems with optimal complexity in terms of both master-to-agents communication and local calls.
The proposed rate is much sharper than the $\mathcal{O}(\sqrt{L_q/\mu})$ bound.
arXiv Detail & Related papers (2022-05-30T14:28:02Z)
- Thinking Outside the Ball: Optimal Learning with Gradient Descent for Generalized Linear Stochastic Convex Optimization [37.177329562964765]
We consider linear prediction with a convex Lipschitz loss, or more generally, convex optimization problems of generalized linear form.
We show that in this setting, early-stopped Gradient Descent (GD), without any explicit regularization or projection, ensures excess error at most $\epsilon$.
But instead of uniform convergence in a norm ball, which we show can guarantee suboptimal learning using $\Theta(1/\epsilon^4)$ samples, we rely on uniform convergence in a distribution-dependent ball; a toy sketch of early-stopped GD follows.
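As a toy illustration of the early-stopping claim above (the Gaussian data, absolute loss, stepsize, and iteration budget are assumptions of this sketch, not the paper's construction):

```python
import numpy as np

# Unregularized subgradient descent on a convex Lipschitz loss (absolute
# loss of a linear predictor), stopped after a fixed number of iterations;
# no projection or explicit regularization is used.
rng = np.random.default_rng(1)
d, n = 50, 200
w_star = rng.normal(size=d) / np.sqrt(d)

def sample(m):
    X = rng.normal(size=(m, d))
    y = X @ w_star + 0.1 * rng.normal(size=m)
    return X, y

X_train, y_train = sample(n)
X_test, y_test = sample(5000)

w, step = np.zeros(d), 0.05
for t in range(100):                      # early stopping: fixed budget
    grad = X_train.T @ np.sign(X_train @ w - y_train) / n   # subgradient
    w -= step * grad

print("train loss:", np.mean(np.abs(X_train @ w - y_train)))
print("test  loss:", np.mean(np.abs(X_test @ w - y_test)))
```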
arXiv Detail & Related papers (2022-02-27T09:41:43Z)
- Bilinear Classes: A Structural Framework for Provable Generalization in RL [119.42509700822484]
Bilinear Classes is a new structural framework which permits generalization in reinforcement learning.
The framework incorporates nearly all existing models in which a polynomial sample complexity is achievable.
Our main result provides an RL algorithm which has polynomial sample complexity for Bilinear Classes.
arXiv Detail & Related papers (2021-03-19T16:34:20Z)
- Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$ (sketched below).
We propose estimators for this problem under two settings: (i) $X$ is L4-L2 hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is sub-Gaussian.
arXiv Detail & Related papers (2020-07-16T06:44:44Z)
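To make the generative model of the last entry concrete, here is a small sketch that draws data from $Y = \langle X, w^* \rangle + \epsilon$ with heavy-tailed noise and compares ordinary least squares against a simple median-of-means baseline. The baseline is a swapped-in illustration of robustness, not the paper's nearly-linear-time estimator, and all dimensions and distributions are assumptions.

```python
import numpy as np

# Data from Y = <X, w*> + eps with heavy-tailed (but finite-variance) noise.
rng = np.random.default_rng(2)
d, n, k = 10, 2000, 20                   # k = number of buckets
w_star = rng.normal(size=d)

X = rng.normal(size=(n, d))              # identity second moment
eps = rng.standard_t(df=2.5, size=n)     # heavy tails, finite variance
y = X @ w_star + eps

w_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Median-of-means baseline: least squares per bucket, coordinate-wise median.
buckets = np.array_split(rng.permutation(n), k)
w_mom = np.median(
    [np.linalg.lstsq(X[b], y[b], rcond=None)[0] for b in buckets], axis=0)

print("OLS error:", np.linalg.norm(w_ols - w_star))
print("MoM error:", np.linalg.norm(w_mom - w_star))
```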