Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning
- URL: http://arxiv.org/abs/2008.10103v1
- Date: Sun, 23 Aug 2020 20:36:49 GMT
- Title: Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning
- Authors: Shuang Qiu, Zhuoran Yang, Xiaohan Wei, Jieping Ye, Zhaoran Wang
- Abstract summary: We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in the form of simultaneous primal-side and dual-side convergence.
- Score: 145.54544979467872
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporal-Difference (TD) learning with nonlinear smooth function
approximation for policy evaluation has achieved great success in modern
reinforcement learning. It is shown that such a problem can be reformulated as
a stochastic nonconvex-strongly-concave optimization problem, which is
challenging as the naive stochastic gradient descent-ascent algorithm suffers from
slow convergence. Existing approaches for this problem are based on
two-timescale or double-loop stochastic gradient algorithms, which may also
require sampling large-batch data. However, in practice, a single-timescale
single-loop stochastic algorithm is preferred due to its simplicity and also
because its step-size is easier to tune. In this paper, we propose two
single-timescale single-loop algorithms which require only one data point each
step. Our first algorithm implements momentum updates on both primal and dual
variables, achieving an $O(\varepsilon^{-4})$ sample complexity, which shows the
important role of momentum in obtaining a single-timescale algorithm. Our
second algorithm improves upon the first one by applying variance reduction on
top of momentum, which matches the best known $O(\varepsilon^{-3})$ sample
complexity in existing works. Furthermore, our variance-reduction algorithm
does not require a large-batch checkpoint. Moreover, our theoretical results
for both algorithms are expressed in a tighter form of simultaneous primal-side
and dual-side convergence.
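The abstract's core recipe (a single loop, one sample per step, and momentum estimates of both the primal and dual stochastic gradients) can be illustrated generically. The sketch below is a hypothetical, minimal momentum-assisted stochastic gradient descent-ascent loop for a nonconvex-strongly-concave problem $\min_\theta \max_w f(\theta, w)$; the oracle names, step sizes, and momentum parameter are placeholders, and it is not the authors' exact algorithm.

```python
import numpy as np

def momentum_sgda(grad_theta, grad_w, theta0, w0, steps=10_000,
                  eta=1e-3, zeta=1e-3, beta=0.9, rng=None):
    """Minimal sketch: single-timescale stochastic gradient descent-ascent
    with momentum on both the primal (theta) and dual (w) variables.

    grad_theta(theta, w, xi) and grad_w(theta, w, xi) return stochastic
    gradients built from a single sample xi, mirroring the one-sample-per-step
    setting in the abstract.  All hyperparameters are illustrative only.
    """
    rng = rng or np.random.default_rng(0)
    theta, w = np.array(theta0, float), np.array(w0, float)
    m_theta, m_w = np.zeros_like(theta), np.zeros_like(w)
    for _ in range(steps):
        xi = rng.standard_normal()               # one data point per step
        # momentum (moving-average) estimates of both stochastic gradients
        m_theta = beta * m_theta + (1 - beta) * grad_theta(theta, w, xi)
        m_w     = beta * m_w     + (1 - beta) * grad_w(theta, w, xi)
        theta -= eta * m_theta                   # primal descent step
        w     += zeta * m_w                      # dual ascent step, same timescale
    return theta, w
```

Because both variables are updated every iteration with step sizes of the same order, there is no inner loop and no separate fast/slow timescale to tune, which is the practical appeal highlighted in the abstract.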
Related papers
- Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics (ARC) methods have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously allow inexact computations of the Hessian, gradient, and function values while retaining their convergence guarantees.
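For intuition only, here is a minimal single trust-region step built around the Cauchy point; it consumes whatever (possibly inexact) gradient and Hessian estimates are supplied, loosely in the spirit of the inexact-oracle setting summarized above. This is an assumed, generic sketch, not the paper's method, and the acceptance thresholds are placeholders.

```python
import numpy as np

def trust_region_step(f, g, H, x, delta, eta=0.1):
    """One trust-region iteration using the Cauchy point as the model minimizer.

    f: objective-value oracle; g, H: (possibly inexact) gradient and Hessian
    estimates at x, standing in for the inexact oracles discussed above.
    Returns the new iterate and the updated trust-region radius.
    """
    gn = np.linalg.norm(g)
    if gn == 0.0:
        return x, delta                        # model is already stationary
    gHg = g @ H @ g
    tau = 1.0 if gHg <= 0 else min(gn**3 / (delta * gHg), 1.0)
    p = -tau * (delta / gn) * g                # Cauchy step inside the region
    pred = -(g @ p + 0.5 * p @ H @ p)          # predicted model decrease
    rho = (f(x) - f(x + p)) / pred             # actual vs. predicted reduction
    if rho > eta:                              # accept; grow radius if very good
        x, delta = x + p, (2 * delta if rho > 0.75 else delta)
    else:                                      # reject; shrink the radius
        delta = 0.5 * delta
    return x, delta
```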
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - Ordering for Non-Replacement SGD [7.11967773739707]
We seek to find an ordering that can improve the convergence rates for the non-replacement form of the algorithm.
We develop optimal orderings for constant and decreasing step sizes for strongly convex and convex functions.
In addition, we are able to combine the ordering with mini-batching and further apply it to more complex neural networks.
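As a reference point for the setting above, the sketch below is without-replacement (non-replacement) SGD, where each epoch visits every sample exactly once; the `order` argument is the knob this line of work studies, here filled with a plain random shuffle rather than the paper's designed orderings. Names and defaults are hypothetical.

```python
import numpy as np

def sgd_without_replacement(grad_i, x0, n, epochs=10, lr=0.01,
                            order=None, rng=None):
    """Without-replacement SGD: each epoch processes every sample index
    exactly once, in the order given by `order`.

    grad_i(x, i) returns the gradient of the i-th component function.
    If no ordering is supplied, a fresh random permutation is drawn per
    epoch; the work above instead designs the permutation to speed up
    convergence.
    """
    rng = rng or np.random.default_rng(0)
    x = np.array(x0, float)
    for _ in range(epochs):
        perm = order if order is not None else rng.permutation(n)
        for i in perm:
            x -= lr * grad_i(x, i)             # one pass, no re-sampling
    return x
```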
arXiv Detail & Related papers (2023-06-28T00:46:58Z) - Computationally Efficient and Statistically Optimal Robust
High-Dimensional Linear Regression [15.389011827844572]
High-dimensional linear regression under heavy-tailed noise or outlier corruption is challenging, both computationally and statistically.
In this paper, we introduce an algorithm that covers both the Gaussian-noise setting and the heavy-tailed setting with only a bounded $(1+\epsilon)$-th moment.
arXiv Detail & Related papers (2023-05-10T14:31:03Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit analogies between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions, which leads to sparse, local, and convex approximations of the feasible set.
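A rough sketch of the "constraints on velocities" idea, under the simplifying assumption of a single affine constraint $a^\top x \le b$: the unconstrained velocity $-\nabla f(x)$ is projected onto a half-space of admissible velocities that keeps the constraint value from growing. This is an illustrative toy, not the accelerated algorithm of the paper.

```python
import numpy as np

def velocity_constrained_step(grad_f, x, a, b, lr=0.01, alpha=1.0):
    """One explicit-Euler step of a gradient flow where the affine constraint
    a @ x <= b is imposed on the velocity rather than on the position:
    the velocity v must satisfy a @ v <= -alpha * (a @ x - b), which steers
    the iterate back toward feasibility.  Single-constraint sketch only.
    """
    v = -grad_f(x)                             # unconstrained descent velocity
    bound = -alpha * (a @ x - b)               # admissible velocity bound
    if a @ v > bound:                          # project v onto {v : a @ v <= bound}
        v = v - a * (a @ v - bound) / (a @ a)
    return x + lr * v                          # move along the feasible velocity
```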
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - Finite-Time Error Bounds for Greedy-GQ [20.51105692499517]
We establish finite-time error bounds showing that the Greedy-GQ algorithm converges quickly.
Our analysis provides guidance on choosing step-sizes for faster convergence.
arXiv Detail & Related papers (2022-09-06T15:04:57Z) - Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex
Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We establish complexity lower bounds for this setting and design two optimal algorithms that attain them.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
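For orientation, a minimal decentralized (consensus) gradient descent loop: each node gossip-averages its iterate with its neighbors through a doubly stochastic mixing matrix and then takes a local gradient step. This is the classical baseline for the setting above, not the optimal algorithms the paper designs; `W` and the oracles are placeholders.

```python
import numpy as np

def decentralized_gradient_descent(local_grads, W, X0, steps=1000, lr=0.01):
    """Decentralized (consensus) gradient descent over a communication graph.

    local_grads: list of per-node gradient oracles grad_i(x_i).
    W: doubly stochastic mixing matrix respecting the graph topology.
    X0: array of shape (n_nodes, dim); row i is node i's initial iterate.
    Each step: gossip-average with neighbors, then a local gradient step.
    """
    X = np.array(X0, float)
    for _ in range(steps):
        X = W @ X                              # consensus / gossip averaging
        G = np.stack([g(x) for g, x in zip(local_grads, X)])
        X = X - lr * G                         # local gradient step per node
    return X.mean(axis=0)                      # network-averaged solution
```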
arXiv Detail & Related papers (2021-06-08T15:54:44Z) - A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm
for Bilevel Optimization [112.59170319105971]
We propose a new algorithm -- the Momentum-assisted Single-timescale Stochastic Approximation (MSTSA) algorithm -- for tackling stochastic bilevel optimization problems.
MSTSA allows us to control the error in the stochastic gradient updates caused by inexact solutions of the lower-level subproblem.
arXiv Detail & Related papers (2021-02-15T07:10:33Z) - Distributed Stochastic Consensus Optimization with Momentum for
Nonconvex Nonsmooth Problems [45.88640334610308]
This paper presents a new distributed stochastic optimization algorithm for nonconvex, non-smooth problems.
We establish a bound on the communication complexity of the proposed algorithm.
Experiments are presented to illustrate the effectiveness of the proposed algorithms.
arXiv Detail & Related papers (2020-11-10T13:12:21Z) - Variance-Reduced Off-Policy TDC Learning: Non-Asymptotic Convergence
Analysis [27.679514676804057]
We develop a variance reduction scheme for the two time-scale TDC algorithm in the off-policy setting.
Experiments demonstrate that the proposed variance-reduced TDC achieves a smaller convergence error than both the conventional TDC and the variance-reduced TD.
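For background, the sketch below is one update of the classical two-time-scale TDC (TD with gradient correction) algorithm with linear function approximation, the base method the variance-reduction scheme above builds on; importance-sampling weights are omitted and the step sizes are illustrative.

```python
import numpy as np

def tdc_update(theta, w, phi, phi_next, reward, gamma=0.99,
               alpha=0.01, beta=0.1):
    """One TDC update with linear function approximation.

    theta: value-function weights (slow timescale, step size alpha).
    w: correction weights (fast timescale, step size beta).
    phi, phi_next: feature vectors of the current and next state.
    Off-policy importance weights are omitted for brevity.
    """
    delta = reward + gamma * phi_next @ theta - phi @ theta     # TD error
    theta = theta + alpha * (delta * phi - gamma * (phi @ w) * phi_next)
    w = w + beta * (delta - phi @ w) * phi                      # fast correction
    return theta, w
```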
arXiv Detail & Related papers (2020-10-26T01:33:05Z) - Towards Better Understanding of Adaptive Gradient Algorithms in
Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper, we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for non-convex non-concave min-max problems.
Our experiments on GAN training show that the advantage of adaptive over non-adaptive gradient algorithms can be observed empirically.
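For context, a minimal optimistic stochastic gradient descent-ascent loop for $\min_x \max_y f(x, y)$: each player moves along $2 g_t - g_{t-1}$, i.e. the current stochastic gradient plus a look-ahead correction from the previous one. This generic sketch omits the adaptive (Adagrad-style) step sizes the paper analyzes; all names and parameters are placeholders.

```python
import numpy as np

def optimistic_sgda(grad_x, grad_y, x0, y0, steps=10_000, lr=1e-3, rng=None):
    """Optimistic stochastic gradient descent-ascent (no adaptive step sizes).

    grad_x(x, y, xi) and grad_y(x, y, xi) return stochastic gradients from a
    single sample xi.  The 2*g_t - g_{t-1} extrapolation is the 'optimism'.
    """
    rng = rng or np.random.default_rng(0)
    x, y = np.array(x0, float), np.array(y0, float)
    gx_prev, gy_prev = np.zeros_like(x), np.zeros_like(y)
    for _ in range(steps):
        xi = rng.standard_normal()             # one stochastic sample
        gx, gy = grad_x(x, y, xi), grad_y(x, y, xi)
        x -= lr * (2 * gx - gx_prev)           # optimistic descent step
        y += lr * (2 * gy - gy_prev)           # optimistic ascent step
        gx_prev, gy_prev = gx, gy
    return x, y
```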
arXiv Detail & Related papers (2019-12-26T22:10:10Z)