A Two-Time-Scale Stochastic Optimization Framework with Applications in
Control and Reinforcement Learning
- URL: http://arxiv.org/abs/2109.14756v1
- Date: Wed, 29 Sep 2021 23:15:23 GMT
- Title: A Two-Time-Scale Stochastic Optimization Framework with Applications in
Control and Reinforcement Learning
- Authors: Sihan Zeng, Thinh T. Doan, Justin Romberg
- Abstract summary: We study a novel two-time-scale gradient method for solving problems where the gradient samples are generated from a time-varying Markov random process.
We show that a convergence rate of $\mathcal{O}(k^{-2/3})$ is achieved. This is the first time such a result is known in the literature.
- Score: 22.07834608976826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a novel two-time-scale stochastic gradient method for solving
optimization problems where the gradient samples are generated from a
time-varying Markov random process parameterized by the underlying optimization
variable. These time-varying samples make the stochastic gradient biased and
dependent, which can potentially lead to the divergence of the iterates. To
address this issue, we consider a two-time-scale update scheme, where one scale
is used to estimate the true gradient from the Markovian samples and the other
scale is used to update the decision variable with the estimated gradient.
While these two iterates are implemented simultaneously, the former is updated
"faster" (using bigger step sizes) than the latter (using smaller step sizes).
Our first contribution is to characterize the finite-time complexity of the
proposed two-time-scale stochastic gradient method. In particular, we provide
explicit formulas for the convergence rates of this method under different
objective functions, namely, strong convexity, convexity, non-convexity under
the PL condition, and general non-convexity.
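To make the update scheme concrete, here is a minimal Python sketch of such a two-time-scale loop. The quadratic toy objective, the i.i.d. noise standing in for the Markovian samples, and the step-size exponents are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_grad(x):
    # Stand-in for a gradient sample of f(x) = ||x||^2; the paper draws
    # these from a time-varying Markov process parameterized by x.
    return 2.0 * x + rng.normal(scale=0.5, size=x.shape)

x = np.ones(5)   # slow iterate: the decision variable
g = np.zeros(5)  # fast iterate: running estimate of the true gradient

for k in range(1, 10_001):
    alpha = 1.0 / k**0.6  # fast scale: bigger step size (assumed schedule)
    beta = 1.0 / k**0.9   # slow scale: smaller step size (assumed schedule)
    g += alpha * (noisy_grad(x) - g)  # estimate the true gradient from samples
    x -= beta * g                     # update the decision variable

print(x)  # approaches the minimizer 0 of f(x) = ||x||^2
```

The key point is that `alpha` decays more slowly than `beta`, so the gradient estimate `g` equilibrates faster than `x` moves, exactly the "faster"/"slower" relationship described above.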
Our second contribution is to apply our framework to study the performance of
the popular actor-critic methods in solving stochastic control and
reinforcement learning problems. First, we study an online natural actor-critic
algorithm for the linear-quadratic regulator and show that a convergence rate
of $\mathcal{O}(k^{-2/3})$ is achieved. This is the first time such a result is
known in the literature. Second, we look at the standard online actor-critic
algorithm over finite state and action spaces and derive a convergence rate of
$\mathcal{O}(k^{-2/5})$, which recovers the best known rate derived
specifically for this problem. Finally, we support our theoretical analysis
with numerical simulations where the convergence rate is visualized.
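In the actor-critic application, the critic plays the role of the fast iterate and the actor the slow one. Below is a hedged sketch of an online actor-critic over finite state and action spaces in this spirit; the randomly generated MDP, the discount factor, and the step-size schedules are our own illustrative choices, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
nS, nA, gamma = 4, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # P[s, a]: distribution over next states
R = rng.uniform(size=(nS, nA))                 # rewards

theta = np.zeros((nS, nA))  # actor: softmax policy logits (slow iterate)
V = np.zeros(nS)            # critic: tabular value estimates (fast iterate)
s = 0

for k in range(1, 100_001):
    alpha = 1.0 / k**0.6  # fast (critic) step size
    beta = 1.0 / k**0.9   # slow (actor) step size
    pi = np.exp(theta[s] - theta[s].max())
    pi /= pi.sum()
    a = rng.choice(nA, p=pi)
    s_next = rng.choice(nS, p=P[s, a])
    delta = R[s, a] + gamma * V[s_next] - V[s]  # TD(0) error
    V[s] += alpha * delta                       # fast critic update
    grad_log = -pi
    grad_log[a] += 1.0                          # gradient of log pi(a|s) w.r.t. theta[s]
    theta[s] += beta * delta * grad_log         # slow actor update
    s = s_next
```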
Related papers
- Fast Two-Time-Scale Stochastic Gradient Method with Applications in Reinforcement Learning [5.325297567945828]
We propose a new method for two-time-scale optimization that achieves significantly faster convergence than prior art.
We characterize the proposed algorithm under various conditions and show how it specializes to online sample-based settings.
arXiv Detail & Related papers (2024-05-15T19:03:08Z) - Stochastic Dimension-reduced Second-order Methods for Policy
Optimization [11.19708535159457]
We propose several new second-order algorithms for policy optimization that only require gradient and Hessian-vector product in each iteration.
Specifically, we propose a dimension-reduced second-order method (DR-SOPO) which repeatedly solves a projected two-dimensional trust region subproblem.
We show that DR-SOPO obtains an $\mathcal{O}(\epsilon^{-3.5})$ complexity for reaching an approximate first-order stationary condition.
In addition, we present an enhanced algorithm (DVR-SOPO) which further improves the complexity to $\mathcal{O}(\ldots)$
arXiv Detail & Related papers (2023-01-28T12:09:58Z) - Formal guarantees for heuristic optimization algorithms used in machine
learning [6.978625807687497]
Stochastic Gradient Descent (SGD) and its variants have become the dominant methods for large-scale machine learning (ML) optimization problems.
We provide formal guarantees for a few convex optimization methods and propose improved algorithms.
arXiv Detail & Related papers (2022-07-31T19:41:22Z) - Finite-Time Complexity of Online Primal-Dual Natural Actor-Critic Algorithm for Constrained Markov Decision Processes [13.908826484332282]
We study an online primal-dual actor-critic method to solve a discounted-cost constrained Markov decision process (CMDP) problem.
This paper is the first to study the finite-time complexity of an online primal-dual actor-critic method for solving a CMDP problem.
arXiv Detail & Related papers (2021-10-21T18:05:40Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study stochastic optimization of the area under the precision-recall curve (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work is on the iteration complexity of zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that, with a graceful design of coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of iteration complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - A Unified Analysis of First-Order Methods for Smooth Games via Integral
Quadratic Constraints [10.578409461429626]
In this work, we adapt integral quadratic constraints (IQC) theory to first-order methods for smooth and strongly-monotone games.
We provide, for the first time, a global convergence rate for the negative momentum method (NM) with a complexity of $\mathcal{O}(\kappa^{1.5})$, which matches its known lower bound.
We show that it is impossible for an algorithm with one step of memory to achieve acceleration if it only queries the gradient once per batch.
arXiv Detail & Related papers (2020-09-23T20:02:00Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in the form of simultaneous primal- and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis
and Application to Actor-Critic [142.1492359556374]
Bilevel optimization is a class of problems which exhibit a two-level structure.
We propose a two-timescale approximation (TTSA) algorithm for tackling such a bilevel problem.
We show that a two-timescale natural actor-critic policy optimization algorithm can be viewed as a special case of our TTSA framework.
arXiv Detail & Related papers (2020-07-10T05:20:02Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove an $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with a mini-batch size of $1$, constant first- and second-order moment parameters, and possibly unbounded optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconvex min-max problems.
Our experiments show that the advantage of adaptive over non-adaptive gradient algorithms in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)