Hogwild! over Distributed Local Data Sets with Linearly Increasing
Mini-Batch Sizes
- URL: http://arxiv.org/abs/2010.14763v2
- Date: Sat, 27 Feb 2021 03:53:19 GMT
- Title: Hogwild! over Distributed Local Data Sets with Linearly Increasing
Mini-Batch Sizes
- Authors: Marten van Dijk, Nhuong V. Nguyen, Toan N. Nguyen, Lam M. Nguyen, Quoc
Tran-Dinh and Phuong Ha Nguyen
- Abstract summary: Hogwild! implements asynchronous Gradient Descent where multiple threads in parallel access a common repository containing training data.
We show how local compute nodes can start choosing small mini-batch sizes which increase to larger ones in order to reduce communication cost.
- Score: 26.9902939745173
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hogwild! implements asynchronous Stochastic Gradient Descent (SGD) where
multiple threads in parallel access a common repository containing training
data, perform SGD iterations and update shared state that represents a jointly
learned (global) model. We consider big data analysis where training data is
distributed among local data sets in a heterogeneous way -- and we wish to move
SGD computations to local compute nodes where local data resides. The results
of these local SGD computations are aggregated by a central "aggregator" which
mimics Hogwild!. We show how local compute nodes can start choosing small
mini-batch sizes which increase to larger ones in order to reduce communication
cost (round interaction with the aggregator). We improve state-of-the-art
literature and show $O(\sqrt{K})$ communication rounds for heterogeneous data
for strongly convex problems, where $K$ is the total number of gradient
computations across all local compute nodes. For our scheme, we prove a
\textit{tight} and novel non-trivial convergence analysis for strongly convex
problems for {\em heterogeneous} data which does not use the bounded gradient
assumption as seen in many existing publications. The tightness is a
consequence of our proofs for lower and upper bounds of the convergence rate,
which show a constant factor difference. We show experimental results for plain
convex and non-convex problems for biased (i.e., heterogeneous) and unbiased
local data sets.
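As a rough, illustrative sketch of the scheme described above (not the authors' implementation), the snippet below runs local SGD on heterogeneous per-client data with a mini-batch size that grows linearly across communication rounds. The linear schedule `b0 * (r + 1)`, the step sizes, the least-squares objective, and the plain synchronous averaging step (standing in for the asynchronous, Hogwild!-style aggregator) are all assumptions chosen for illustration.

```python
# Minimal sketch: local SGD with linearly increasing mini-batch sizes,
# aggregated by a central server. Illustrative only; the schedule, step
# sizes, objective, and synchronous averaging are assumptions, not the
# paper's exact Hogwild!-style asynchronous aggregation.
import numpy as np

rng = np.random.default_rng(0)
d, n_clients, n_local = 10, 4, 500

# Heterogeneous ("biased") local data sets: each client draws features
# around a different mean, so local objectives differ.
w_star = rng.normal(size=d)
clients = []
for c in range(n_clients):
    X = rng.normal(loc=c, scale=1.0, size=(n_local, d))
    y = X @ w_star + 0.1 * rng.normal(size=n_local)
    clients.append((X, y))

def local_sgd(w, X, y, batch_size, steps, lr):
    """Run a few SGD steps on one client's local least-squares objective."""
    for _ in range(steps):
        idx = rng.choice(len(y), size=batch_size, replace=False)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch_size
        w = w - lr * grad
    return w

w = np.zeros(d)
b0, rounds = 4, 20
for r in range(rounds):
    batch_size = b0 * (r + 1)   # mini-batch size increases linearly per round
    # One interaction with the aggregator per round: larger local mini-batches
    # later in training reduce the number of rounds per gradient computation.
    local_models = [local_sgd(w.copy(), X, y, batch_size, steps=10, lr=0.01 / (r + 1))
                    for X, y in clients]
    w = np.mean(local_models, axis=0)   # central aggregation (simple averaging here)

print("distance to optimum:", np.linalg.norm(w - w_star))
```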
Related papers
- Locally Regularized Sparse Graph by Fast Proximal Gradient Descent [6.882546996728011]
We propose a novel Support Regularized Sparse Graph (SRSG).
Sparse graphs have been shown to be effective in clustering high-dimensional data.
We show that SRSG is superior to other clustering methods.
arXiv Detail & Related papers (2024-09-25T16:57:47Z)
- Adaptive $k$-nearest neighbor classifier based on the local estimation of the shape operator [49.87315310656657]
We introduce a new adaptive $k$-nearest neighbours ($kK$-NN) algorithm that explores the local curvature at a sample to adaptively define the neighborhood size.
Results on many real-world datasets indicate that the new $kK$-NN algorithm yields superior balanced accuracy compared to the established $k$-NN method.
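A hypothetical, simplified sketch of an adaptive-$k$ rule may help picture the idea; the paper estimates local curvature via the shape operator, whereas the local-spread heuristic and the function names below are stand-ins invented purely for illustration.

```python
# Hypothetical sketch of an adaptive k-NN classifier: the neighborhood size
# is chosen per query point from a local statistic. This local-spread
# heuristic is NOT the paper's shape-operator curvature estimate; it only
# illustrates the idea of adapting k locally.
import numpy as np

def adaptive_knn_predict(X_train, y_train, X_query, k_min=3, k_max=15):
    preds = []
    for x in X_query:
        dist = np.linalg.norm(X_train - x, axis=1)
        order = np.argsort(dist)
        # The more the nearest distances spread out (a crude proxy for how
        # non-uniform the neighborhood is), the smaller the k we use.
        spread = dist[order[k_max - 1]] / (dist[order[k_min - 1]] + 1e-12)
        k = int(np.clip(k_max / spread, k_min, k_max))
        votes = y_train[order[:k]]
        preds.append(np.bincount(votes).argmax())   # majority vote among k neighbors
    return np.array(preds)

# Toy usage on two Gaussian blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(3.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(adaptive_knn_predict(X, y, np.array([[0.0, 0.0], [3.0, 3.0]])))
```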
arXiv Detail & Related papers (2024-09-08T13:08:45Z)
- FedGT: Federated Node Classification with Scalable Graph Transformer [27.50698154862779]
We propose a scalable Federated Graph Transformer (FedGT) in this paper.
FedGT computes clients' similarity based on the aligned global nodes with optimal transport.
arXiv Detail & Related papers (2024-01-26T21:02:36Z)
- Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning [112.69497636932955]
Federated learning aims to train models across different clients without the sharing of data for privacy considerations.
We study how data heterogeneity affects the representations of the globally aggregated models.
We propose FedDecorr, a novel method that can effectively mitigate dimensional collapse in federated learning.
arXiv Detail & Related papers (2022-10-01T09:04:17Z)
- $\texttt{FedBC}$: Calibrating Global and Local Models via Federated Learning Beyond Consensus [66.62731854746856]
In federated learning (FL), the objective of collaboratively learning a global model through aggregation of model updates across devices tends to oppose the goal of personalization via local information.
In this work, we calibrate this tradeoff in a quantitative manner through a multi-criterion-based optimization.
We demonstrate that $\texttt{FedBC}$ balances the global and local model test accuracy metrics across a suite of datasets.
arXiv Detail & Related papers (2022-06-22T02:42:04Z)
- A Communication-efficient Algorithm with Linear Convergence for Federated Minimax Learning [1.713291434132985]
We study a large-scale multi-agent minimax optimization problem, which models Generative Adversarial Networks (GANs).
The overall objective is a sum of agents' private local objective functions.
We show that FedGDA-GT converges linearly with a constant stepsize to a global $\epsilon$-GDA solution.
arXiv Detail & Related papers (2022-06-02T16:31:16Z)
- Federated Minimax Optimization: Improved Convergence Analyses and Algorithms [32.062312674333775]
We consider nonconvex minimax optimization, which is gaining prominence in many modern machine learning applications such as GANs.
We provide a novel and tighter analysis of the algorithm, which improves the convergence and communication guarantees in the existing literature.
arXiv Detail & Related papers (2022-03-09T16:21:31Z)
- Escaping Saddle Points with Bias-Variance Reduced Local Perturbed SGD for Communication Efficient Nonconvex Distributed Learning [58.79085525115987]
Local methods are one of the promising approaches to reduce communication time.
We show that the communication complexity is better than that of non-local methods when the heterogeneity of the local datasets is smaller than the smoothness of the local loss.
arXiv Detail & Related papers (2022-02-12T15:12:17Z)
- Faster Convergence of Local SGD for Over-Parameterized Models [1.5504102675587357]
Modern machine learning architectures are often highly expressive.
We analyze the convergence of Local SGD (or FedAvg) for such over-parameterized functions in heterogeneous data setting.
For general convex loss functions, we establish an error bound of $O(K/T)$ in the general setting; for non-convex loss functions, we prove an error bound of $O(K/T)$ as well.
We complete our results by providing problem instances in which our established convergence rates are tight to a constant factor with a reasonably small stepsize.
arXiv Detail & Related papers (2022-01-30T04:05:56Z)
- Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Lojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
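As a small, single-machine illustration of the sampling schemes being compared (the distributed/local aspects and the paper's synchronized-shuffling modification are omitted), the sketch below contrasts with-replacement mini-batch SGD with Random Reshuffling, which shuffles once per epoch and then draws mini-batches without replacement; the quadratic objective and step size are arbitrary choices.

```python
# Illustration only: with-replacement mini-batch SGD vs Random Reshuffling
# (shuffle once per epoch, then draw mini-batches without replacement) on a
# simple least-squares problem. Not the paper's distributed algorithms.
import numpy as np

rng = np.random.default_rng(1)
n, d = 256, 5
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d)          # consistent linear system, no noise

def grad(w, idx):
    return A[idx].T @ (A[idx] @ w - b[idx]) / len(idx)

def with_replacement_sgd(w, epochs, batch, lr):
    for _ in range(epochs * (n // batch)):
        idx = rng.integers(0, n, size=batch)       # sampled with replacement
        w = w - lr * grad(w, idx)
    return w

def random_reshuffling_sgd(w, epochs, batch, lr):
    for _ in range(epochs):
        perm = rng.permutation(n)                  # one shuffle per epoch
        for start in range(0, n, batch):
            idx = perm[start:start + batch]        # drawn without replacement
            w = w - lr * grad(w, idx)
    return w

w0 = np.zeros(d)
print("with replacement  :", np.linalg.norm(A @ with_replacement_sgd(w0, 20, 16, 0.05) - b))
print("random reshuffling:", np.linalg.norm(A @ random_reshuffling_sgd(w0, 20, 16, 0.05) - b))
```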
arXiv Detail & Related papers (2021-10-20T02:25:25Z)
- Computationally efficient sparse clustering [67.95910835079825]
We provide a finite sample analysis of a new clustering algorithm based on PCA.
We show that it achieves the minimax optimal misclustering rate in the regime $\|\theta\| \rightarrow \infty$.
arXiv Detail & Related papers (2020-05-21T17:51:30Z)