Randomized Block-Coordinate Optimistic Gradient Algorithms for
Root-Finding Problems
- URL: http://arxiv.org/abs/2301.03113v3
- Date: Sat, 23 Sep 2023 05:46:22 GMT
- Title: Randomized Block-Coordinate Optimistic Gradient Algorithms for
Root-Finding Problems
- Authors: Quoc Tran-Dinh
- Abstract summary: We develop two new algorithms to approximate a solution of nonlinear equations in large-scale settings.
We apply our methods to a class of large-scale finite-sum inclusions, which covers prominent applications in machine learning.
- Score: 8.0153031008486
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we develop two new randomized block-coordinate optimistic
gradient algorithms to approximate a solution of nonlinear equations in
large-scale settings, which are called root-finding problems. Our first
algorithm is non-accelerated with constant stepsizes, and achieves an
$\mathcal{O}(1/k)$ best-iterate convergence rate on $\mathbb{E}[ \Vert
Gx^k\Vert^2]$ when the underlying operator $G$ is Lipschitz continuous and
satisfies a weak Minty solution condition, where $\mathbb{E}[\cdot]$ is the
expectation and $k$ is the iteration counter. Our second method is a new
accelerated randomized block-coordinate optimistic gradient algorithm. We
establish both $\mathcal{O}(1/k^2)$ and $o(1/k^2)$ last-iterate convergence
rates on both $\mathbb{E}[ \Vert Gx^k\Vert^2]$ and $\mathbb{E}[ \Vert x^{k+1} -
x^{k}\Vert^2]$ for this algorithm under the co-coerciveness of $G$. In
addition, we prove that the iterate sequence $\{x^k\}$ converges to a solution
almost surely, and $\Vert Gx^k\Vert^2$ attains an $o(1/k)$ almost sure
convergence rate. Then, we apply our methods to a class of large-scale
finite-sum inclusions, which covers prominent applications in machine learning,
statistical learning, and network optimization, especially in federated
learning. We obtain two new federated learning-type algorithms and their
convergence rate guarantees for solving this problem class.
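The abstract states the rates but not the update rule in closed form. Below is a minimal numpy sketch of one plausible randomized block-coordinate optimistic gradient iteration for a root-finding problem $Gx = 0$; the stepsizes `eta` and `gamma`, the uniform block sampling, and the toy linear operator are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def rbc_optimistic_gradient(G, x0, n_blocks, eta=0.1, gamma=0.1,
                            n_iters=1000, seed=0):
    """Sketch of a randomized block-coordinate optimistic gradient loop.

    Hypothetical update: pick a uniformly random block b of coordinates and set
        x_b <- x_b - eta * G_b(x^k) - gamma * (G_b(x^k) - G_b(x^{k-1})).
    A generic illustration only, not the paper's exact scheme.
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    blocks = np.array_split(np.arange(x.size), n_blocks)
    g_prev = G(x)
    for _ in range(n_iters):
        g = G(x)  # a practical method would evaluate only the sampled block of G
        b = blocks[rng.integers(n_blocks)]               # uniformly random block
        x[b] -= eta * g[b] + gamma * (g[b] - g_prev[b])  # optimistic step
        g_prev = g
    return x

# Toy root-finding problem: G(x) = A @ x with A positive definite,
# so the unique root is x* = 0.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
x = rbc_optimistic_gradient(lambda v: A @ v, np.array([1.0, -1.0]), n_blocks=2)
print(np.linalg.norm(A @ x))  # ||G x^k|| should be small
```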
Related papers
- Faster Linear Systems and Matrix Norm Approximation via Multi-level Sketched Preconditioning [10.690769339903941]
We present a new class of preconditioned iterative methods for solving linear systems of the form $Ax = b$.
Our methods are based on constructing a low-rank Nyström approximation to $A$ using sparse random sketching.
We prove that the convergence of our methods depends on a natural average condition number of $A$, which improves as the rank of the Nyström approximation increases.
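As a concrete illustration of this construction, here is a hedged numpy/scipy sketch of a rank-$r$ Nyström approximation used to precondition CG on a regularized PSD system; the Gaussian test matrix stands in for the paper's sparse random sketching, and the preconditioner follows the standard Nyström-PCG recipe rather than the paper's exact method.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def nystrom_pcg(A, b, rank=20, mu=1e-2, seed=0):
    """Solve (A + mu*I) x = b with CG, preconditioned by a randomized
    rank-`rank` Nystrom approximation of the PSD matrix A."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    Om = np.linalg.qr(rng.standard_normal((n, rank)))[0]   # orthonormal test matrix
    Y = A @ Om                                             # sketch of A
    C = np.linalg.cholesky(Om.T @ Y + 1e-10 * np.eye(rank))
    B = np.linalg.solve(C, Y.T).T                          # A_hat = B @ B.T
    U, s, _ = np.linalg.svd(B, full_matrices=False)
    lam = s**2                                             # Nystrom eigenvalues
    def p_inv(v):
        # P^{-1} v for the standard Nystrom preconditioner:
        # (lam_r + mu) U (Lam + mu I)^{-1} U^T v + (I - U U^T) v
        w = U.T @ v
        return U @ ((lam[-1] + mu) / (lam + mu) * w) + (v - U @ w)
    M = LinearOperator((n, n), matvec=p_inv)
    x, info = cg(A + mu * np.eye(n), b, M=M)
    return x

# Toy PSD system with fast spectral decay, where low-rank preconditioning helps.
n = 200
Q = np.linalg.qr(np.random.default_rng(1).standard_normal((n, n)))[0]
A = Q @ np.diag(1.0 / (1 + np.arange(n))**2) @ Q.T
print(nystrom_pcg(A, np.ones(n))[:5])
```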
arXiv Detail & Related papers (2024-05-09T15:53:43Z)
- Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon > 0$ is the target accuracy.
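For concreteness, the sketch below spells out the hypothesis class and the Gaussian square loss being learned; it illustrates the objective only, not the Schur-polynomial algorithm itself.

```python
import numpy as np

def relu_combination(X, w, c):
    """Hypothesis f(x) = sum_i c_i * ReLU(w_i . x), the class being learned.
    w: (k, d) weight vectors, c: (k,) combination coefficients."""
    return np.maximum(X @ w.T, 0.0) @ c

# Monte Carlo estimate of the square loss E_{x ~ N(0, I_d)} (f(x) - g(x))^2.
rng = np.random.default_rng(0)
d, k, n = 10, 3, 100_000
w_true, c_true = rng.standard_normal((k, d)), rng.standard_normal(k)
w_hyp, c_hyp = rng.standard_normal((k, d)), rng.standard_normal(k)
X = rng.standard_normal((n, d))                    # standard Gaussian inputs
loss = np.mean((relu_combination(X, w_hyp, c_hyp)
                - relu_combination(X, w_true, c_true)) ** 2)
print(loss)
```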
arXiv Detail & Related papers (2023-07-24T14:37:22Z)
- An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to extend our framework to scale, and demonstrate the efficiency and robustness of our methods on large-scale semidefinite programs.
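As a rough illustration of a stepsize-oblivious update, here is a generic adaptive mirror descent sketch on the simplex that needs no Lipschitz or smoothness constant up front; the entropic mirror map and AdaGrad-style stepsize are illustrative stand-ins, not the paper's complementary composite scheme.

```python
import numpy as np

def adaptive_mirror_descent(grad, x0, n_iters=500):
    """Entropic mirror descent on the probability simplex with an
    AdaGrad-style stepsize, so no Lipschitz constant is supplied up front."""
    x = x0.copy()
    g_sq = 1e-12                       # running sum of squared gradient norms
    avg = np.zeros_like(x)
    for t in range(1, n_iters + 1):
        g = grad(x)
        g_sq += np.dot(g, g)
        eta = 1.0 / np.sqrt(g_sq)      # adaptive stepsize, no constants needed
        x = x * np.exp(-eta * g)       # entropic (KL) mirror step ...
        x /= x.sum()                   # ... projected back to the simplex
        avg += (x - avg) / t           # running average of the iterates
    return avg

# Minimize f(x) = x^T Q x over the simplex.
Q = np.diag([1.0, 2.0, 5.0])
x = adaptive_mirror_descent(lambda v: 2 * Q @ v, np.ones(3) / 3)
print(x)   # most mass should sit on the smallest-curvature coordinate
```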
arXiv Detail & Related papers (2023-06-30T08:34:29Z)
- Memory-Constrained Algorithms for Convex Optimization via Recursive Cutting-Planes [23.94542304111204]
We give the first class of algorithms that provides a positive trade-off between gradient descent and cutting-plane methods in any regime with $\epsilon \leq 1/\sqrt{d}$.
In the regime $\epsilon \leq d^{-\Omega(d)}$, our algorithm with $p = d$ achieves the information-theoretic optimal memory usage and improves the oracle complexity of gradient descent.
arXiv Detail & Related papers (2023-06-16T17:00:51Z)
- Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization [54.29685789885059]
We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem.
The goal is to approximate $\mathbf{A}$ as a product of low-rank factors.
Our techniques generalize to other common variants of the BMF problem.
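To make the objective concrete, the sketch below evaluates the BMF error $\Vert \mathbf{A} - UV \Vert_F^2$ over binary factors and solves a tiny instance by brute force; it illustrates what the fast $(1+\varepsilon)$-approximation algorithms approximate, not how they do it.

```python
import numpy as np
from itertools import product

def bmf_error(A, U, V):
    """Frobenius error ||A - U V||_F^2 for binary factors U, V (the BMF
    objective; some variants use the Boolean product instead)."""
    return np.sum((A - U @ V) ** 2)

def brute_force_bmf(A, rank):
    """Exhaustive rank-`rank` BMF for tiny matrices; exponential time, used
    only to illustrate the objective the fast algorithms approximate."""
    n, m = A.shape
    best, best_err = None, np.inf
    for u in product([0, 1], repeat=n * rank):
        U = np.array(u).reshape(n, rank)
        for v in product([0, 1], repeat=rank * m):
            V = np.array(v).reshape(rank, m)
            err = bmf_error(A, U, V)
            if err < best_err:
                best, best_err = (U, V), err
    return best, best_err

A = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1]])
(U, V), err = brute_force_bmf(A, rank=2)
print(err)   # 0 for this block-structured example
```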
arXiv Detail & Related papers (2023-06-02T18:55:27Z)
- Convergence of a Normal Map-based Prox-SGD Method under the KL Inequality [0.0]
We present a novel normal map-based proximal stochastic gradient method ($\mathsf{norM\text{-}SGD}$) and analyze its convergence under the Kurdyka-Łojasiewicz (KL) inequality.
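A hedged sketch of a generic normal map-based prox-SGD iteration is below; the update form, stepsizes, and the Lasso toy are illustrative assumptions rather than the paper's exact $\mathsf{norM\text{-}SGD}$ scheme.

```python
import numpy as np

def normal_map_prox_sgd(stoch_grad, prox, z0, lam=1.0, eta=0.05,
                        n_iters=2000, seed=0):
    """Sketch of a normal map-based prox-SGD step for min f(x) + phi(x):
        x^k     = prox_{lam*phi}(z^k)
        z^{k+1} = z^k - eta * ( g(x^k) + (z^k - x^k)/lam ),
    where g is a stochastic gradient of f and the bracket is the (stochastic)
    normal map. Illustrative only, not the paper's exact scheme."""
    rng = np.random.default_rng(seed)
    z = z0.copy()
    for _ in range(n_iters):
        x = prox(z, lam)
        z = z - eta * (stoch_grad(x, rng) + (z - x) / lam)
    return prox(z, lam)

# Toy Lasso-like problem: f(x) = 0.5*||x - a||^2 with noisy gradients,
# phi(x) = 0.1*||x||_1, so the prox is soft-thresholding.
a = np.array([1.0, -0.05, 0.5])
soft = lambda z, lam: np.sign(z) * np.maximum(np.abs(z) - 0.1 * lam, 0.0)
g = lambda x, rng: (x - a) + 0.01 * rng.standard_normal(x.size)
print(normal_map_prox_sgd(g, soft, np.zeros(3)))  # approx (0.9, 0, 0.4)
```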
arXiv Detail & Related papers (2023-05-10T01:12:11Z)
- An Optimal Stochastic Algorithm for Decentralized Nonconvex Finite-sum Optimization [25.21457349137344]
We prove that DEAREST requires at most $\mathcal{O}(m + \sqrt{mn}\,L\varepsilon^{-2})$ incremental first-order oracle (IFO) calls and $\mathcal{O}(L\varepsilon^{-2}/\sqrt{1-\lambda_W})$ communication rounds.
arXiv Detail & Related papers (2022-10-25T11:37:11Z)
- Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C \cdot \mathrm{OPT} + \epsilon$ with high probability.
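As a minimal illustration of this setting, the sketch below runs plain gradient descent on the empirical square loss of a ReLU neuron with a crude label-noise stand-in; the initialization, stepsize, and noise model are illustrative choices, not the paper's analysis.

```python
import numpy as np

def learn_neuron_gd(X, y, eta=0.1, n_iters=500, seed=1):
    """Gradient descent on the empirical square loss of a ReLU neuron
    x -> max(w.x, 0), the single-neuron setting studied under label noise."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = 0.1 * rng.standard_normal(d)   # small random init (w = 0 is a flat point)
    for _ in range(n_iters):
        z = X @ w
        pred = np.maximum(z, 0.0)
        grad = 2.0 / n * X.T @ ((pred - y) * (z > 0))   # (sub)gradient of the loss
        w -= eta * grad
    return w

rng = np.random.default_rng(0)
n, d = 5000, 10
X = rng.standard_normal((n, d))                 # Gaussian marginal on R^d
w_star = rng.standard_normal(d)
y = np.maximum(X @ w_star, 0.0)
y[rng.random(n) < 0.05] += 1.0                  # crude stand-in for label noise
print(np.linalg.norm(learn_neuron_gd(X, y) - w_star))
```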
arXiv Detail & Related papers (2022-06-17T17:55:43Z)
- Optimal Gradient Sliding and its Application to Distributed Optimization Under Similarity [121.83085611327654]
We study structured convex optimization problems with additive objective $r := p + q$, where $r$ is $\mu$-strongly convex, under a similarity condition.
We propose a method that solves the problem via master-to-agents communication rounds and local gradient calls.
The proposed method is much sharper than the $\mathcal{O}(\sqrt{L_q/\mu})$ bound of existing methods.
arXiv Detail & Related papers (2022-05-30T14:28:02Z)
- On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems [86.92205445270427]
We consider nonconvex-concave minimax problems, $\min_{\mathbf{x}} \max_{\mathbf{y} \in \mathcal{Y}} f(\mathbf{x}, \mathbf{y})$, and study how to solve them efficiently.
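A minimal sketch of the two-timescale gradient descent ascent (GDA) update on a simple concave-in-$\mathbf{y}$ toy is below; the stepsizes, the box constraint set, and the quadratic objective are illustrative assumptions.

```python
import numpy as np

def gda(grad_x, grad_y, x0, y0, eta_x=0.02, eta_y=0.2, n_iters=2000):
    """Two-timescale GDA for min_x max_{y in Y} f(x, y): a descent step in x
    and a faster projected ascent step in y (eta_y > eta_x). Projection onto
    Y = [-1, 1]^m by clipping is an illustrative choice."""
    x, y = x0.copy(), y0.copy()
    for _ in range(n_iters):
        x = x - eta_x * grad_x(x, y)                    # gradient descent in x
        y = np.clip(y + eta_y * grad_y(x, y), -1, 1)    # projected ascent in y
    return x, y

# Toy saddle problem f(x, y) = ||x||^2/2 + x.y - ||y||^2/2 (concave in y),
# whose unique saddle point is (0, 0).
gx = lambda x, y: x + y
gy = lambda x, y: x - y
x, y = gda(gx, gy, np.ones(3), np.ones(3))
print(x, y)   # both iterates should approach the origin
```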
arXiv Detail & Related papers (2019-06-02T03:03:45Z)