Learning Zero-Sum Linear Quadratic Games with Improved Sample Complexity and Last-Iterate Convergence
- URL: http://arxiv.org/abs/2309.04272v3
- Date: Tue, 31 Oct 2023 16:34:09 GMT
- Title: Learning Zero-Sum Linear Quadratic Games with Improved Sample Complexity and Last-Iterate Convergence
- Authors: Jiduan Wu and Anas Barakat and Ilyas Fatkhullin and Niao He
- Abstract summary: Zero-sum Linear Quadratic (LQ) games are fundamental in optimal control.
In this work, we propose a simpler nested Zeroth-Order (ZO) algorithm.
- Score: 19.779044926914704
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Zero-sum Linear Quadratic (LQ) games are fundamental in optimal control and
can be used (i)~as a dynamic game formulation for risk-sensitive or robust
control and (ii)~as a benchmark setting for multi-agent reinforcement learning
with two competing agents in continuous state-control spaces. In contrast to
the well-studied single-agent linear quadratic regulator problem, zero-sum LQ
games entail solving a challenging nonconvex-nonconcave min-max problem with an
objective function that lacks coercivity. Recently, Zhang et al. showed that
an~$\epsilon$-Nash equilibrium (NE) of finite horizon zero-sum LQ games can be
learned via nested model-free Natural Policy Gradient (NPG) algorithms with
poly$(1/\epsilon)$ sample complexity. In this work, we propose a simpler nested
Zeroth-Order (ZO) algorithm improving sample complexity by several orders of
magnitude and guaranteeing convergence of the last iterate. Our main results
are two-fold: (i) in the deterministic setting, we establish the first global
last-iterate linear convergence result for the nested algorithm that seeks NE
of zero-sum LQ games; (ii) in the model-free setting, we establish
a~$\widetilde{\mathcal{O}}(\epsilon^{-2})$ sample complexity using a
single-point ZO estimator. For our last-iterate convergence results, our
analysis leverages the Implicit Regularization (IR) property and a new gradient
domination condition for the primal function. Our key improvements in the
sample complexity rely on a more sample-efficient nested algorithm design and a
finer control of the ZO natural gradient estimation error utilizing the
structure endowed by the finite-horizon setting.
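To illustrate the single-point estimation idea mentioned in the abstract, the sketch below implements a generic single-point zeroth-order gradient estimator. The function name, smoothing radius, and sample counts are illustrative assumptions, not taken from the paper, which applies such estimation to natural policy gradients.

```python
import numpy as np

def single_point_zo_gradient(f, x, r, n_samples=1, rng=None):
    """Single-point zeroth-order gradient estimator (sketch).

    Uses only ONE function evaluation per sample: for u drawn uniformly
    from the unit sphere and d = dim(x),
        E[(d / r) * f(x + r*u) * u]
    approximates the gradient of a smoothed version of f at x.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.size
    g = np.zeros(d)
    for _ in range(n_samples):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)           # uniform direction on the unit sphere
        g += (d / r) * f(x + r * u) * u  # a single query of f per sample
    return g / n_samples
```

Because only one query of the objective is used per direction, the estimator has high variance; averaging over many samples (or the finer error control developed in the paper) is what makes the overall sample complexity manageable.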
Related papers
- Obtaining Lower Query Complexities through Lightweight Zeroth-Order Proximal Gradient Algorithms [65.42376001308064]
We propose two lightweight variance-reduced ZO estimators for proximal gradient algorithms.
We improve the state-of-the-art function query complexity from $\mathcal{O}\left(\min\left(\frac{d n^{1/2}}{\epsilon^2}, \frac{d}{\epsilon^3}\right)\right)$ to $\tilde{\mathcal{O}}\left(\frac{d}{\epsilon^2}\right)$.
arXiv Detail & Related papers (2024-10-03T15:04:01Z)
- Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm is only accessible to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z)
- Finite-Time Error Bounds for Greedy-GQ [20.51105692499517]
We establish finite-time error bounds showing that the Greedy-GQ algorithm converges quickly.
Our analysis provides guidance on choosing step-sizes for faster convergence.
arXiv Detail & Related papers (2022-09-06T15:04:57Z)
- Global Convergence of Two-timescale Actor-Critic for Solving Linear Quadratic Regulator [43.13238243240668]
We develop a new analysis framework that allows establishing the global convergence to an $\epsilon$-optimal solution.
This is the first finite-time convergence analysis for the single sample two-timescale AC for solving LQR with global optimality.
arXiv Detail & Related papers (2022-08-18T09:57:03Z)
- Multi-block-Single-probe Variance Reduced Estimator for Coupled Compositional Optimization [49.58290066287418]
We propose a novel method named Multi-block-Single-probe Variance Reduced (MSVR) estimator to alleviate the complexity of coupled compositional problems.
Our results improve upon prior ones in several aspects, including the order of sample complexities and the dependence on the strong convexity parameter.
arXiv Detail & Related papers (2022-07-18T12:03:26Z)
- Efficient Methods for Structured Nonconvex-Nonconcave Min-Max Optimization [98.0595480384208]
We propose a generalization of the extragradient method that converges to a stationary point.
The algorithm applies not only to Euclidean spaces but also to general $p$-normed vector spaces.
arXiv Detail & Related papers (2020-10-31T21:35:42Z)
- A Feasible Level Proximal Point Method for Nonconvex Sparse Constrained Optimization [25.73397307080647]
We present a new model for general convex or nonconvex machine learning objectives with sparse constraints.
We propose a proximal point algorithm that solves each subproblem subject to gradually relaxed constraint levels.
We demonstrate the effectiveness of our method on large-scale numerical problems.
arXiv Detail & Related papers (2020-10-23T05:24:05Z)
- Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
- Provably Convergent Working Set Algorithm for Non-Convex Regularized Regression [0.0]
This paper proposes a working set algorithm for non-convex regularizers with convergence guarantees.
Our results demonstrate high speed-ups compared to the full problem solver, for both block-coordinate and gradient solvers.
arXiv Detail & Related papers (2020-06-24T07:40:31Z)
- SGD with shuffling: optimal rates without component convexity and large epoch requirements [60.65928290219793]
We consider the RandomShuffle (shuffle at the beginning of each epoch) and SingleShuffle (shuffle only once)
We establish minimax optimal convergence rates of these algorithms up to poly-log factor gaps.
We further sharpen the tight convergence results for RandomShuffle by removing the drawbacks common to all prior work.
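The distinction between the two shuffling schemes can be sketched with a toy SGD loop; the function and parameter names here are illustrative assumptions, not code from the paper.

```python
import random

def sgd_with_shuffling(grads, x0, lr, epochs, mode="random", seed=0):
    """Toy scalar SGD contrasting the two shuffling schemes.

    RandomShuffle re-permutes the component order at the start of every
    epoch; SingleShuffle permutes once up front and reuses that order.
    """
    rng = random.Random(seed)
    order = list(range(len(grads)))
    if mode == "single":
        rng.shuffle(order)            # SingleShuffle: one permutation, reused
    x = x0
    for _ in range(epochs):
        if mode == "random":
            rng.shuffle(order)        # RandomShuffle: fresh permutation per epoch
        for i in order:
            x = x - lr * grads[i](x)  # one full pass over the components
    return x
```

For example, minimizing $f(x) = \sum_i (x - a_i)^2$ drives $x$ toward the mean of the $a_i$ under either scheme; the papers above analyze how fast each variant converges.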
arXiv Detail & Related papers (2020-06-12T05:00:44Z)
- Zeroth-Order Algorithms for Nonconvex Minimax Problems with Improved Complexities [21.13934071954103]
We present a deterministic algorithm for minimax problems that are nonconvex in one variable and strongly concave in the other.
We show that under the SGC assumption, the complexities of our algorithms match those of existing algorithms.
Results are stated in terms of oracle complexity for ZO-GDMSA, and numerical experiments are presented to support the theoretical results.
arXiv Detail & Related papers (2020-01-22T00:05:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.