Randomized Block-Coordinate Optimistic Gradient Algorithms for
Root-Finding Problems
- URL: http://arxiv.org/abs/2301.03113v3
- Date: Sat, 23 Sep 2023 05:46:22 GMT
- Title: Randomized Block-Coordinate Optimistic Gradient Algorithms for
Root-Finding Problems
- Authors: Quoc Tran-Dinh
- Abstract summary: We develop two new algorithms to approximate a solution of nonlinear equations in large-scale settings.
We apply our methods to a class of large-scale finite-sum inclusions, which covers prominent applications in machine learning.
- Score: 8.0153031008486
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we develop two new randomized block-coordinate optimistic
gradient algorithms to approximate a solution of nonlinear equations in
large-scale settings, which are called root-finding problems. Our first
algorithm is non-accelerated with constant stepsizes, and achieves an
$\mathcal{O}(1/k)$ best-iterate convergence rate on $\mathbb{E}[ \Vert
Gx^k\Vert^2]$ when the underlying operator $G$ is Lipschitz continuous and
satisfies a weak Minty solution condition, where $\mathbb{E}[\cdot]$ is the
expectation and $k$ is the iteration counter. Our second method is a new
accelerated randomized block-coordinate optimistic gradient algorithm. We
establish both $\mathcal{O}(1/k^2)$ and $o(1/k^2)$ last-iterate convergence
rates on both $\mathbb{E}[ \Vert Gx^k\Vert^2]$ and $\mathbb{E}[ \Vert x^{k+1} -
x^{k}\Vert^2]$ for this algorithm under the co-coerciveness of $G$. In
addition, we prove that the iterate sequence $\{x^k\}$ converges to a solution
almost surely, and $\Vert Gx^k\Vert^2$ attains an $o(1/k)$ almost sure
convergence rate. Then, we apply our methods to a class of large-scale
finite-sum inclusions, which covers prominent applications in machine learning,
statistical learning, and network optimization, especially in federated
learning. We obtain two new federated learning-type algorithms and their
convergence rate guarantees for solving this problem class.
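The abstract states the rates but not the update rule in closed form. Below is a minimal numpy sketch of one plausible randomized block-coordinate optimistic gradient iteration for a root-finding problem $Gx = 0$; the stepsizes `eta` and `gamma`, the uniform block sampling, and the toy linear operator are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def rbc_optimistic_gradient(G, x0, n_blocks, eta=0.1, gamma=0.1,
                            n_iters=1000, seed=0):
    """Sketch of a randomized block-coordinate optimistic gradient loop.

    Hypothetical update: pick a uniformly random block b of coordinates and set
        x_b <- x_b - eta * G_b(x^k) - gamma * (G_b(x^k) - G_b(x^{k-1})).
    A generic illustration only, not the paper's exact scheme.
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    blocks = np.array_split(np.arange(x.size), n_blocks)
    g_prev = G(x)
    for _ in range(n_iters):
        g = G(x)  # a practical method would evaluate only the sampled block of G
        b = blocks[rng.integers(n_blocks)]               # uniformly random block
        x[b] -= eta * g[b] + gamma * (g[b] - g_prev[b])  # optimistic step
        g_prev = g
    return x

# Toy root-finding problem: G(x) = A @ x with A positive definite,
# so the unique root is x* = 0.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
x = rbc_optimistic_gradient(lambda v: A @ v, np.array([1.0, -1.0]), n_blocks=2)
print(np.linalg.norm(A @ x))  # ||G x^k|| should be small
```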
Related papers
- Faster Linear Systems and Matrix Norm Approximation via Multi-level Sketched Preconditioning [10.690769339903941]
We present a new class of preconditioned iterative methods for solving linear systems of the form $Ax = b$.
Our methods are based on constructing a low-rank Nyström approximation to $A$ using sparse random sketching.
We prove that the convergence of our methods depends on a natural average condition number of $A$, which improves as the rank of the Nyström approximation increases.
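As a concrete illustration of this construction, here is a hedged numpy/scipy sketch of a rank-$r$ Nyström approximation used to precondition CG on a regularized PSD system; the Gaussian test matrix stands in for the paper's sparse random sketching, and the preconditioner follows the standard Nyström-PCG recipe rather than the paper's exact method.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def nystrom_pcg(A, b, rank=20, mu=1e-2, seed=0):
    """Solve (A + mu*I) x = b with CG, preconditioned by a randomized
    rank-`rank` Nystrom approximation of the PSD matrix A."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    Om = np.linalg.qr(rng.standard_normal((n, rank)))[0]   # orthonormal test matrix
    Y = A @ Om                                             # sketch of A
    C = np.linalg.cholesky(Om.T @ Y + 1e-10 * np.eye(rank))
    B = np.linalg.solve(C, Y.T).T                          # A_hat = B @ B.T
    U, s, _ = np.linalg.svd(B, full_matrices=False)
    lam = s**2                                             # Nystrom eigenvalues
    def p_inv(v):
        # P^{-1} v for the standard Nystrom preconditioner:
        # (lam_r + mu) U (Lam + mu I)^{-1} U^T v + (I - U U^T) v
        w = U.T @ v
        return U @ ((lam[-1] + mu) / (lam + mu) * w) + (v - U @ w)
    M = LinearOperator((n, n), matvec=p_inv)
    x, info = cg(A + mu * np.eye(n), b, M=M)
    return x

# Toy PSD system with fast spectral decay, where low-rank preconditioning helps.
n = 200
Q = np.linalg.qr(np.random.default_rng(1).standard_normal((n, n)))[0]
A = Q @ np.diag(1.0 / (1 + np.arange(n))**2) @ Q.T
print(nystrom_pcg(A, np.ones(n))[:5])
```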
arXiv Detail & Related papers (2024-05-09T15:53:43Z)
- Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon > 0$ is the target accuracy.
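For concreteness, the sketch below spells out the hypothesis class and the Gaussian square loss being learned; it illustrates the objective only, not the Schur-polynomial algorithm itself.

```python
import numpy as np

def relu_combination(X, w, c):
    """Hypothesis f(x) = sum_i c_i * ReLU(w_i . x), the class being learned.
    w: (k, d) weight vectors, c: (k,) combination coefficients."""
    return np.maximum(X @ w.T, 0.0) @ c

# Monte Carlo estimate of the square loss E_{x ~ N(0, I_d)} (f(x) - g(x))^2.
rng = np.random.default_rng(0)
d, k, n = 10, 3, 100_000
w_true, c_true = rng.standard_normal((k, d)), rng.standard_normal(k)
w_hyp, c_hyp = rng.standard_normal((k, d)), rng.standard_normal(k)
X = rng.standard_normal((n, d))                    # standard Gaussian inputs
loss = np.mean((relu_combination(X, w_hyp, c_hyp)
                - relu_combination(X, w_true, c_true)) ** 2)
print(loss)
```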
arXiv Detail & Related papers (2023-07-24T14:37:22Z)
- An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to extend our framework to scale, and demonstrate the efficiency and robustness of our methods on large-scale semidefinite programs.
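As a rough illustration of a stepsize-oblivious update, here is a generic adaptive mirror descent sketch on the simplex that needs no Lipschitz or smoothness constant up front; the entropic mirror map and AdaGrad-style stepsize are illustrative stand-ins, not the paper's complementary composite scheme.

```python
import numpy as np

def adaptive_mirror_descent(grad, x0, n_iters=500):
    """Entropic mirror descent on the probability simplex with an
    AdaGrad-style stepsize, so no Lipschitz constant is supplied up front."""
    x = x0.copy()
    g_sq = 1e-12                       # running sum of squared gradient norms
    avg = np.zeros_like(x)
    for t in range(1, n_iters + 1):
        g = grad(x)
        g_sq += np.dot(g, g)
        eta = 1.0 / np.sqrt(g_sq)      # adaptive stepsize, no constants needed
        x = x * np.exp(-eta * g)       # entropic (KL) mirror step ...
        x /= x.sum()                   # ... projected back to the simplex
        avg += (x - avg) / t           # running average of the iterates
    return avg

# Minimize f(x) = x^T Q x over the simplex.
Q = np.diag([1.0, 2.0, 5.0])
x = adaptive_mirror_descent(lambda v: 2 * Q @ v, np.ones(3) / 3)
print(x)   # most mass should sit on the smallest-curvature coordinate
```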
arXiv Detail & Related papers (2023-06-30T08:34:29Z)
- Memory-Constrained Algorithms for Convex Optimization via Recursive Cutting-Planes [23.94542304111204]
We give the first class of algorithms that provides a positive trade-off between gradient descent and cutting-plane methods in any regime with $\epsilon \leq 1/\sqrt{d}$.
In the regime $\epsilon \leq d^{-\Omega(d)}$, our algorithm with $p = d$ achieves the information-theoretic optimal memory usage and improves the oracle complexity of gradient descent.
arXiv Detail & Related papers (2023-06-16T17:00:51Z)
- Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization [54.29685789885059]
We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem.
The goal is to approximate $\mathbf{A}$ as a product of low-rank factors.
Our techniques generalize to other common variants of the BMF problem.
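To make the objective concrete, the sketch below evaluates the BMF error $\Vert \mathbf{A} - UV \Vert_F^2$ over binary factors and solves a tiny instance by brute force; it illustrates what the fast $(1+\varepsilon)$-approximation algorithms approximate, not how they do it.

```python
import numpy as np
from itertools import product

def bmf_error(A, U, V):
    """Frobenius error ||A - U V||_F^2 for binary factors U, V (the BMF
    objective; some variants use the Boolean product instead)."""
    return np.sum((A - U @ V) ** 2)

def brute_force_bmf(A, rank):
    """Exhaustive rank-`rank` BMF for tiny matrices; exponential time, used
    only to illustrate the objective the fast algorithms approximate."""
    n, m = A.shape
    best, best_err = None, np.inf
    for u in product([0, 1], repeat=n * rank):
        U = np.array(u).reshape(n, rank)
        for v in product([0, 1], repeat=rank * m):
            V = np.array(v).reshape(rank, m)
            err = bmf_error(A, U, V)
            if err < best_err:
                best, best_err = (U, V), err
    return best, best_err

A = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1]])
(U, V), err = brute_force_bmf(A, rank=2)
print(err)   # 0 for this block-structured example
```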
arXiv Detail & Related papers (2023-06-02T18:55:27Z)
- Convergence of a Normal Map-based Prox-SGD Method under the KL Inequality [0.0]
We present a novel normal map-based proximal stochastic gradient method ($\mathsf{norM\text{-}SGD}$) and analyze its convergence under the Kurdyka-Łojasiewicz (KL) inequality.
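A hedged sketch of a generic normal map-based prox-SGD iteration is below; the update form, stepsizes, and the Lasso toy are illustrative assumptions rather than the paper's exact $\mathsf{norM\text{-}SGD}$ scheme.

```python
import numpy as np

def normal_map_prox_sgd(stoch_grad, prox, z0, lam=1.0, eta=0.05,
                        n_iters=2000, seed=0):
    """Sketch of a normal map-based prox-SGD step for min f(x) + phi(x):
        x^k     = prox_{lam*phi}(z^k)
        z^{k+1} = z^k - eta * ( g(x^k) + (z^k - x^k)/lam ),
    where g is a stochastic gradient of f and the bracket is the (stochastic)
    normal map. Illustrative only, not the paper's exact scheme."""
    rng = np.random.default_rng(seed)
    z = z0.copy()
    for _ in range(n_iters):
        x = prox(z, lam)
        z = z - eta * (stoch_grad(x, rng) + (z - x) / lam)
    return prox(z, lam)

# Toy Lasso-like problem: f(x) = 0.5*||x - a||^2 with noisy gradients,
# phi(x) = 0.1*||x||_1, so the prox is soft-thresholding.
a = np.array([1.0, -0.05, 0.5])
soft = lambda z, lam: np.sign(z) * np.maximum(np.abs(z) - 0.1 * lam, 0.0)
g = lambda x, rng: (x - a) + 0.01 * rng.standard_normal(x.size)
print(normal_map_prox_sgd(g, soft, np.zeros(3)))  # approx (0.9, 0, 0.4)
```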
arXiv Detail & Related papers (2023-05-10T01:12:11Z)
- An Optimal Stochastic Algorithm for Decentralized Nonconvex Finite-sum Optimization [25.21457349137344]
We prove that DEAREST requires at most $\mathcal{O}(m + \sqrt{mn}\,L\varepsilon^{-2})$ incremental first-order oracle (IFO) calls and $\mathcal{O}(L\varepsilon^{-2}/\sqrt{1-\lambda_W})$ communication rounds.
arXiv Detail & Related papers (2022-10-25T11:37:11Z)
- Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C \cdot \mathrm{OPT} + \epsilon$ with high probability.
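As a minimal illustration of this setting, the sketch below runs plain gradient descent on the empirical square loss of a ReLU neuron with a crude label-noise stand-in; the initialization, stepsize, and noise model are illustrative choices, not the paper's analysis.

```python
import numpy as np

def learn_neuron_gd(X, y, eta=0.1, n_iters=500, seed=1):
    """Gradient descent on the empirical square loss of a ReLU neuron
    x -> max(w.x, 0), the single-neuron setting studied under label noise."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = 0.1 * rng.standard_normal(d)   # small random init (w = 0 is a flat point)
    for _ in range(n_iters):
        z = X @ w
        pred = np.maximum(z, 0.0)
        grad = 2.0 / n * X.T @ ((pred - y) * (z > 0))   # (sub)gradient of the loss
        w -= eta * grad
    return w

rng = np.random.default_rng(0)
n, d = 5000, 10
X = rng.standard_normal((n, d))                 # Gaussian marginal on R^d
w_star = rng.standard_normal(d)
y = np.maximum(X @ w_star, 0.0)
y[rng.random(n) < 0.05] += 1.0                  # crude stand-in for label noise
print(np.linalg.norm(learn_neuron_gd(X, y) - w_star))
```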
arXiv Detail & Related papers (2022-06-17T17:55:43Z)
- Optimal Gradient Sliding and its Application to Distributed Optimization Under Similarity [121.83085611327654]
We study structured convex optimization problems with additive objective $r := p + q$, where $r$ is $\mu$-strongly convex, under a similarity condition.
We propose a method that solves the problem via master-to-agents communication rounds and local gradient calls.
The proposed method is much sharper than the $\mathcal{O}(\sqrt{L_q/\mu})$ bound of existing methods.
arXiv Detail & Related papers (2022-05-30T14:28:02Z)
- On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems [86.92205445270427]
We consider nonconvex-concave minimax problems, $\min_{\mathbf{x}} \max_{\mathbf{y} \in \mathcal{Y}} f(\mathbf{x}, \mathbf{y})$, and study how to solve them efficiently.
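A minimal sketch of the two-timescale gradient descent ascent (GDA) update on a simple concave-in-$\mathbf{y}$ toy is below; the stepsizes, the box constraint set, and the quadratic objective are illustrative assumptions.

```python
import numpy as np

def gda(grad_x, grad_y, x0, y0, eta_x=0.02, eta_y=0.2, n_iters=2000):
    """Two-timescale GDA for min_x max_{y in Y} f(x, y): a descent step in x
    and a faster projected ascent step in y (eta_y > eta_x). Projection onto
    Y = [-1, 1]^m by clipping is an illustrative choice."""
    x, y = x0.copy(), y0.copy()
    for _ in range(n_iters):
        x = x - eta_x * grad_x(x, y)                    # gradient descent in x
        y = np.clip(y + eta_y * grad_y(x, y), -1, 1)    # projected ascent in y
    return x, y

# Toy saddle problem f(x, y) = ||x||^2/2 + x.y - ||y||^2/2 (concave in y),
# whose unique saddle point is (0, 0).
gx = lambda x, y: x + y
gy = lambda x, y: x - y
x, y = gda(gx, gy, np.ones(3), np.ones(3))
print(x, y)   # both iterates should approach the origin
```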
arXiv Detail & Related papers (2019-06-02T03:03:45Z)