A Two-Time-Scale Stochastic Optimization Framework with Applications in
Control and Reinforcement Learning
- URL: http://arxiv.org/abs/2109.14756v1
- Date: Wed, 29 Sep 2021 23:15:23 GMT
- Title: A Two-Time-Scale Stochastic Optimization Framework with Applications in
Control and Reinforcement Learning
- Authors: Sihan Zeng, Thinh T. Doan, Justin Romberg
- Abstract summary: We study a novel two-time-scale gradient method for solving optimization problems where the gradient samples are generated from a time-varying Markov random process.
We show that a convergence rate of $\mathcal{O}(k^{-2/3})$ is achieved. This is the first time such a result is known in the literature.
- Score: 22.07834608976826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a novel two-time-scale stochastic gradient method for solving
optimization problems where the gradient samples are generated from a
time-varying Markov random process parameterized by the underlying optimization
variable. These time-varying samples make the stochastic gradient biased and
dependent, which can potentially lead to the divergence of the iterates. To
address this issue, we consider a two-time-scale update scheme, where one scale
is used to estimate the true gradient from the Markovian samples and the other
scale is used to update the decision variable with the estimated gradient.
While these two iterates are implemented simultaneously, the former is updated
"faster" (using bigger step sizes) than the latter (using smaller step sizes).
Our first contribution is to characterize the finite-time complexity of the
proposed two-time-scale stochastic gradient method. In particular, we provide
explicit formulas for the convergence rates of this method under different
objective functions, namely, strong convexity, convexity, non-convexity under
the PL condition, and general non-convexity.
Our second contribution is to apply our framework to study the performance of
the popular actor-critic methods in solving stochastic control and
reinforcement learning problems. First, we study an online natural actor-critic
algorithm for the linear-quadratic regulator and show that a convergence rate
of $\mathcal{O}(k^{-2/3})$ is achieved. This is the first time such a result is
known in the literature. Second, we look at the standard online actor-critic
algorithm over finite state and action spaces and derive a convergence rate of
$\mathcal{O}(k^{-2/5})$, which recovers the best known rate derived
specifically for this problem. Finally, we support our theoretical analysis
with numerical simulations where the convergence rate is visualized.
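The two-time-scale scheme described in the abstract can be made concrete with a short sketch. The code below is a minimal illustration, not the authors' implementation: the sampler interface (`markov_step`), the gradient oracle (`grad_sample`), and the step-size decay exponents are assumptions, and the paper tunes the step sizes separately for each objective class.

```python
import numpy as np

def two_time_scale_sgd(grad_sample, markov_step, x0, s0,
                       num_iters=10_000, a0=0.1, b0=1.0):
    """Hedged sketch of a two-time-scale stochastic gradient iteration.

    grad_sample(x, s): noisy gradient at decision variable x computed from
        the current Markovian sample s.
    markov_step(x, s): next sample of the Markov process whose transition
        kernel is parameterized by x.
    """
    x = np.asarray(x0, dtype=float).copy()
    y = np.zeros_like(x)          # fast iterate: running estimate of the true gradient
    s = s0
    for k in range(1, num_iters + 1):
        alpha = a0 / k            # slow step size (decision variable); illustrative decay
        beta = b0 / k ** (2 / 3)  # fast step size (gradient estimate); larger than alpha
        s = markov_step(x, s)                 # time-varying Markovian sample
        y += beta * (grad_sample(x, s) - y)   # fast scale: track the true gradient
        x -= alpha * y                        # slow scale: descend along the estimate
    return x
```

Because `beta` decays more slowly than `alpha`, the gradient estimate `y` sees a nearly frozen `x` and can average out the bias and dependence of the Markovian samples before `x` moves.

The finite state-action actor-critic application fits the same template, with the critic as the fast iterate and the actor as the slow one. The sketch below is a standard online (single-sample) actor-critic with a tabular critic and a softmax policy; the environment interface and the step-size decays are illustrative assumptions rather than the exact algorithm analyzed in the paper.

```python
import numpy as np

def online_actor_critic(env, num_states, num_actions, gamma=0.95,
                        num_iters=100_000, a0=0.05, b0=0.5, seed=0):
    """Hedged sketch: online tabular actor-critic as a two-time-scale iteration."""
    rng = np.random.default_rng(seed)
    theta = np.zeros((num_states, num_actions))  # actor: softmax policy parameters
    V = np.zeros(num_states)                     # critic: tabular value estimate
    s = env.reset()                              # assumed interface: reset() -> state index
    for k in range(1, num_iters + 1):
        alpha = a0 / k ** 0.6   # slow (actor) step size; illustrative decay
        beta = b0 / k ** 0.4    # fast (critic) step size; illustrative decay
        logits = theta[s] - theta[s].max()
        probs = np.exp(logits) / np.exp(logits).sum()
        a = rng.choice(num_actions, p=probs)
        s_next, r, done = env.step(a)            # assumed: step(a) -> (next state, reward, done)
        delta = r + gamma * (0.0 if done else V[s_next]) - V[s]  # TD error
        V[s] += beta * delta                     # fast scale: critic tracks the value function
        grad_log = -probs                        # gradient of log softmax policy at (s, a)
        grad_log[a] += 1.0
        theta[s] += alpha * delta * grad_log     # slow scale: actor follows the policy gradient
        s = env.reset() if done else s_next
    return theta, V
```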
Related papers
- Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from the optimisation and kernel communities -- gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z) - Formal guarantees for heuristic optimization algorithms used in machine
learning [6.978625807687497]
Stochastic Gradient Descent (SGD) and its variants have become the dominant methods for solving large-scale optimization problems in machine learning (ML).
We provide formal guarantees for a few convex optimization methods and propose improved algorithms.
arXiv Detail & Related papers (2022-07-31T19:41:22Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with
Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic Gradient Langevin Dynamics is one of the most fundamental algorithms for solving non-convex optimization problems.
In this paper, we study two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - Finite-Time Complexity of Online Primal-Dual Natural Actor-Critic
Algorithm for Constrained Markov Decision Processes [22.07834608976826]
We study an online primal-dual actor-critic method to solve a discounted cost constrained Markov decision process problem.
This paper is the first to study the finite-time complexity of an online primal-dual actor-critic method for solving a CMDP problem.
arXiv Detail & Related papers (2021-10-21T18:05:40Z) - COCO Denoiser: Using Co-Coercivity for Variance Reduction in Stochastic
Convex Optimization [4.970364068620608]
We exploit convexity and L-smoothness to improve the noisy estimates outputted by the gradient oracle.
We show that increasing the number and proximity of the queried points leads to better gradient estimates.
We also apply COCO in vanilla settings by plugging it into existing algorithms, such as SGD, Adam or STRSAGA.
arXiv Detail & Related papers (2021-09-07T17:21:09Z) - High-probability Bounds for Non-Convex Stochastic Optimization with
Heavy Tails [55.561406656549686]
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails.
We show that a combination of gradient clipping, momentum, and normalized gradient descent converges to critical points with high probability and with the best-known iteration complexity for smooth losses.
arXiv Detail & Related papers (2021-06-28T00:17:01Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work focuses on the iteration complexity of zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design of coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - A Unified Analysis of First-Order Methods for Smooth Games via Integral
Quadratic Constraints [10.578409461429626]
In this work, we adapt the integral quadratic constraints theory to first-order methods for smooth and strongly-monotone games and study their iteration complexity.
We provide, for the first time, a global convergence rate for the negative momentum method (NM) with an iteration complexity of $\mathcal{O}(\kappa^{1.5})$, which matches its known lower bound.
We show that it is impossible for an algorithm with one step of memory to achieve acceleration if it only queries the gradient once per batch.
arXiv Detail & Related papers (2020-09-23T20:02:00Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point per step.
Our results are expressed in the form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad algorithm for non-convex non-concave min-max problems.
Our experiments show that the superiority of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z) - Convergence and sample complexity of gradient methods for the model-free
linear quadratic regulator problem [27.09339991866556]
Model-free reinforcement learning attempts to find an optimal control action for an unknown dynamical system by directly searching over the corresponding space of controllers.
We take a step towards demystifying the performance and efficiency of such methods by focusing on the gradient-flow dynamics over the set of stabilizing feedback gains; a similar result holds for the forward Euler discretization of the ODE.
arXiv Detail & Related papers (2019-12-26T16:56:59Z)
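The last related paper above concerns direct policy search for the model-free linear quadratic regulator. As a point of comparison with the natural actor-critic studied in the main paper, here is a hedged sketch of a basic zeroth-order policy-gradient step over feedback gains; the rollout-based cost estimate, smoothing radius, and step size are illustrative assumptions and not the algorithm analyzed in either paper.

```python
import numpy as np

def lqr_cost_estimate(K, A, B, Q, R, x0_cov, horizon=100, rollouts=10, rng=None):
    """Monte-Carlo estimate of the finite-horizon LQR cost under the policy u = -K x."""
    rng = rng or np.random.default_rng()
    n = A.shape[0]
    total = 0.0
    for _ in range(rollouts):
        x = rng.multivariate_normal(np.zeros(n), x0_cov)   # random initial state
        for _ in range(horizon):
            u = -K @ x
            total += x @ Q @ x + u @ R @ u                 # accumulate quadratic stage cost
            x = A @ x + B @ u                              # simulate the (unknown) dynamics
    return total / rollouts

def zeroth_order_lqr_step(K, cost_fn, step=1e-3, radius=1e-2, num_dirs=20, rng=None):
    """One zeroth-order gradient descent step over the space of feedback gains."""
    rng = rng or np.random.default_rng()
    grad = np.zeros_like(K)
    for _ in range(num_dirs):
        U = rng.standard_normal(K.shape)                   # random perturbation direction
        U /= np.linalg.norm(U)
        delta = cost_fn(K + radius * U) - cost_fn(K - radius * U)
        grad += (delta / (2 * radius)) * U   # directional-derivative estimate along U
    grad /= num_dirs                         # dimension-dependent scale absorbed into the step size
    return K - step * grad                   # gradient descent on the (estimated) cost
```

A typical loop would repeat `zeroth_order_lqr_step` with fresh cost estimates until the gain stabilizes; the cited paper studies the corresponding gradient-flow dynamics over the set of stabilizing feedback gains and its forward Euler discretization.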
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.