GT-STORM: Taming Sample, Communication, and Memory Complexities in
Decentralized Non-Convex Learning
- URL: http://arxiv.org/abs/2105.01231v1
- Date: Tue, 4 May 2021 00:44:48 GMT
- Title: GT-STORM: Taming Sample, Communication, and Memory Complexities in
Decentralized Non-Convex Learning
- Authors: Xin Zhang, Jia Liu, Zhengyuan Zhu, and Elizabeth S. Bentley
- Abstract summary: Decentralized nonconvex optimization has received increasing attention in recent years in machine learning.
Three fundamental challenges in designing decentralized optimization algorithms are how to reduce their sample, communication, and memory complexities.
- Score: 11.129095449532283
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decentralized nonconvex optimization has received increasing attention in
recent years in machine learning due to its advantages in system robustness,
data privacy, and implementation simplicity. However, three fundamental
challenges in designing decentralized optimization algorithms are how to reduce
their sample, communication, and memory complexities. In this paper, we propose
a \underline{g}radient-\underline{t}racking-based \underline{sto}chastic
\underline{r}ecursive \underline{m}omentum (GT-STORM) algorithm for efficiently
solving nonconvex optimization problems. We show that to reach an
$\epsilon^2$-stationary solution, the total number of sample evaluations of our
algorithm is $\tilde{O}(m^{1/2}\epsilon^{-3})$ and the number of communication
rounds is $\tilde{O}(m^{-1/2}\epsilon^{-3})$, which improve the
$O(\epsilon^{-4})$ costs of sample evaluations and communications for the
existing decentralized stochastic gradient algorithms. We conduct extensive
experiments with a variety of learning models, including non-convex logistic
regression and convolutional neural networks, to verify our theoretical
findings. Collectively, our results contribute to the state of the art of
theories and algorithms for decentralized network optimization.
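To make the GT-STORM idea concrete, the toy NumPy sketch below combines a consensus (gradient-tracking) step through a mixing matrix with a STORM-style recursive-momentum gradient estimator on a simple quadratic problem. It is a minimal illustration written from the abstract's description, not the paper's Algorithm 1: the ring topology, step size, momentum parameter, and local quadratic losses are all assumptions made for the example.

```python
# Hedged sketch of a GT-STORM-style update: consensus/gradient tracking via a
# mixing matrix W combined with a STORM recursive-momentum gradient estimator.
# Written from the abstract's description; the topology, step size eta,
# momentum beta, and toy quadratic losses are all assumptions.
import numpy as np

rng = np.random.default_rng(0)
m, d = 4, 5                                   # nodes, parameter dimension

# Doubly stochastic mixing matrix for a ring of m nodes (assumed topology).
W = np.zeros((m, m))
for i in range(m):
    W[i, i] = 0.5
    W[i, (i - 1) % m] = 0.25
    W[i, (i + 1) % m] = 0.25

targets = rng.normal(size=(m, d))             # each node's local data (toy)

def grad(i, x, noise):
    """Stochastic gradient of the toy local loss f_i(x) = 0.5*||x - targets[i]||^2;
    `noise` stands in for the sampled mini-batch."""
    return (x - targets[i]) + 0.1 * noise

eta, beta = 0.05, 0.3                         # step size and momentum (assumed)
x = rng.normal(size=(m, d))                   # local iterates x_i
v = np.stack([grad(i, x[i], rng.normal(size=d)) for i in range(m)])  # trackers

for t in range(300):
    x_new = W @ x - eta * v                   # mix parameters, step along tracker
    noise = rng.normal(size=(m, d))           # ONE fresh sample per node per step
    g_new = np.stack([grad(i, x_new[i], noise[i]) for i in range(m)])
    g_old = np.stack([grad(i, x[i], noise[i]) for i in range(m)])
    # STORM recursive momentum, then gradient tracking via the mixing matrix.
    v = W @ (beta * g_new + (1.0 - beta) * (v + g_new - g_old))
    x = x_new

avg_x = x.mean(axis=0)
print("consensus error :", np.linalg.norm(x - avg_x))
print("gradient norm   :", np.linalg.norm(avg_x - targets.mean(axis=0)))
```

Note that the two gradient evaluations in each step reuse the same sample, which is what allows the recursive estimator's error to shrink without large batches; mixing with W plays the role of gradient tracking across the network.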
Related papers
- Faster Adaptive Decentralized Learning Algorithms [24.379734053137597]
We propose a class of faster adaptive decentralized algorithms (i.e., AdaMDOS and AdaMDOF) for distributed stochastic and finite-sum optimization.
Experimental results demonstrate the efficiency of our algorithms.
arXiv Detail & Related papers (2024-08-19T08:05:33Z) - Stochastic Constrained Decentralized Optimization for Machine Learning with Fewer Data Oracles: a Gradient Sliding Approach [32.36073823372713]
In machine-learning applications, the algorithm has to communicate with the data center and sample data to compute its gradients.
This gives rise to the need for a decentralized optimization algorithm that is communication-efficient and minimizes the number of gradient computations.
We propose a primal-dual sliding with conditional gradient sliding framework, which is communication-efficient and achieves an $\varepsilon$-approximate solution.
arXiv Detail & Related papers (2024-04-03T06:55:59Z) - Can Decentralized Stochastic Minimax Optimization Algorithms Converge
Linearly for Finite-Sum Nonconvex-Nonconcave Problems? [56.62372517641597]
Decentralized minimax optimization has been actively studied in the past few years due to its applications in a wide range of machine learning tasks.
This paper develops two novel decentralized minimax optimization algorithms for finite-sum nonconvex-nonconcave problems.
arXiv Detail & Related papers (2023-04-24T02:19:39Z) - Robust Training and Verification of Implicit Neural Networks: A
Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
arXiv Detail & Related papers (2022-08-08T03:13:24Z) - INTERACT: Achieving Low Sample and Communication Complexities in
Decentralized Bilevel Learning over Networks [24.02913189682224]
Decentralized bilevel optimization problems have received increasing attention in the networking and machine learning communities.
Low sample and communication complexities are two fundamental challenges that remain under-explored.
Our work is the first to achieve both low sample and communication complexities for solving decentralized bilevel optimization problems over networks.
arXiv Detail & Related papers (2022-07-27T04:19:28Z) - TCT: Convexifying Federated Learning using Bootstrapped Neural Tangent
Kernels [141.29156234353133]
State-of-the-art federated learning methods can perform far worse than their centralized counterparts when clients have dissimilar data distributions.
We show this disparity can largely be attributed to challenges presented by nonconvexity.
We propose a Train-Convexify-Train (TCT) procedure to sidestep this issue.
arXiv Detail & Related papers (2022-07-13T16:58:22Z) - Decentralized Gossip-Based Stochastic Bilevel Optimization over
Communication Networks [42.76623191830371]
We propose a gossip-based distributed bilevel optimization algorithm.
Agents can solve both the networked inner and outer problems at the same time.
Our algorithm achieves the state-of-the-art efficiency and test accuracy.
arXiv Detail & Related papers (2022-06-22T06:38:54Z) - DASHA: Distributed Nonconvex Optimization with Communication
Compression, Optimal Oracle Complexity, and No Client Synchronization [77.34726150561087]
We develop and analyze DASHA: a new family of methods for nonconvex distributed optimization problems.
Unlike MARINA, the new methods DASHA and DASHA-MVR send compressed vectors only and never synchronize the nodes, which makes them more practical for federated learning.
arXiv Detail & Related papers (2022-02-02T20:10:40Z) - Acceleration in Distributed Optimization Under Similarity [72.54787082152278]
We study distributed (strongly convex) optimization problems over a network of agents, with no centralized nodes.
An $\varepsilon$-solution is achieved in $\tilde{\mathcal{O}}\big(\sqrt{\frac{\beta/\mu}{1-\rho}}\log\frac{1}{\varepsilon}\big)$ communication steps.
This rate matches, for the first time (up to poly-log factors), the lower communication complexity bounds of distributed gossip algorithms applied to the class of problems of interest.
arXiv Detail & Related papers (2021-10-24T04:03:00Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our method requires a much smaller number of communication rounds in theory.
Our experiments on several datasets demonstrate the effectiveness of our method and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)