Flag Aggregator: Scalable Distributed Training under Failures and
Augmented Losses using Convex Optimization
- URL: http://arxiv.org/abs/2302.05865v2
- Date: Mon, 25 Sep 2023 00:02:10 GMT
- Title: Flag Aggregator: Scalable Distributed Training under Failures and
Augmented Losses using Convex Optimization
- Authors: Hamidreza Almasi, Harsh Mishra, Balajee Vamanan, Sathya N. Ravi
- Abstract summary: ML applications increasingly rely on complex deep learning models and large datasets.
To scale computation and data, these models are inevitably trained in a distributed manner in clusters of nodes, and their updates are aggregated before being applied to the model.
With data augmentation added to these settings, there is a critical need for robust and efficient aggregation systems.
We show that our approach significantly enhances the robustness of state-of-the-art Byzantine resilient aggregators.
- Score: 14.732408788010313
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern ML applications increasingly rely on complex deep learning models and
large datasets. There has been an exponential growth in the amount of
computation needed to train the largest models. Therefore, to scale computation
and data, these models are inevitably trained in a distributed manner in
clusters of nodes, and their updates are aggregated before being applied to the
model. However, a distributed setup is prone to Byzantine failures of
individual nodes, components, and software. With data augmentation added to
these settings, there is a critical need for robust and efficient aggregation
systems. We define the quality of workers as reconstruction ratios $\in (0,1]$,
and formulate aggregation as a Maximum Likelihood Estimation procedure using
Beta densities. We show that the Regularized form of log-likelihood wrt
subspace can be approximately solved using iterative least squares solver, and
provide convergence guarantees using recent Convex Optimization landscape
results. Our empirical findings demonstrate that our approach significantly
enhances the robustness of state-of-the-art Byzantine resilient aggregators. We
evaluate our method in a distributed setup with a parameter server, and show
simultaneous improvements in communication efficiency and accuracy across
various tasks. The code is publicly available at
https://github.com/hamidralmasi/FlagAggregator
Related papers
- High-Dimensional Distributed Sparse Classification with Scalable Communication-Efficient Global Updates [50.406127962933915]
We develop solutions to problems which enable us to learn a communication-efficient distributed logistic regression model.
In our experiments we demonstrate a large improvement in accuracy over distributed algorithms with only a few distributed update steps needed.
arXiv Detail & Related papers (2024-07-08T19:34:39Z) - Distributed Collapsed Gibbs Sampler for Dirichlet Process Mixture Models
in Federated Learning [0.22499166814992444]
This paper proposes a new distributed Markov Chain Monte Carlo (MCMC) inference method for DPMMs (DisCGS) using sufficient statistics.
Our approach uses the collapsed Gibbs sampler and is specifically designed to work on distributed data across independent and heterogeneous machines.
For instance, with a dataset of 100K data points, the centralized algorithm requires approximately 12 hours to complete 100 iterations while our approach achieves the same number of iterations in just 3 minutes.
arXiv Detail & Related papers (2023-12-18T13:16:18Z) - Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z) - Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of dynamic programming (RDP) randomized for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation so can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z) - Probabilistic partition of unity networks: clustering based deep
approximation [0.0]
Partition of unity networks (POU-Nets) have been shown capable of realizing algebraic convergence rates for regression and solution of PDEs.
We enrich POU-Nets with a Gaussian noise model to obtain a probabilistic generalization amenable to gradient-based generalizations of a maximum likelihood loss.
We provide benchmarks quantifying performance in high/low-dimensions, demonstrating that convergence rates depend only on the latent dimension of data within high-dimensional space.
arXiv Detail & Related papers (2021-07-07T08:02:00Z) - T-LoHo: A Bayesian Regularization Model for Structured Sparsity and
Smoothness on Graphs [0.0]
In graph-structured data, structured sparsity and smoothness tend to cluster together.
We propose a new prior for high dimensional parameters with graphical relations.
We use it to detect structured sparsity and smoothness simultaneously.
arXiv Detail & Related papers (2021-07-06T10:10:03Z) - Solving weakly supervised regression problem using low-rank manifold
regularization [77.34726150561087]
We solve a weakly supervised regression problem.
Under "weakly" we understand that for some training points the labels are known, for some unknown, and for others uncertain due to the presence of random noise or other reasons such as lack of resources.
In the numerical section, we applied the suggested method to artificial and real datasets using Monte-Carlo modeling.
arXiv Detail & Related papers (2021-04-13T23:21:01Z) - Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge
Computing [113.52575069030192]
Big data, including applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
arXiv Detail & Related papers (2020-10-02T10:41:59Z) - Generalizing Variational Autoencoders with Hierarchical Empirical Bayes [6.273154057349038]
We present Hierarchical Empirical Bayes Autoencoder (HEBAE), a computationally stable framework for probabilistic generative models.
Our key contributions are two-fold. First, we make gains by placing a hierarchical prior over the encoding distribution, enabling us to adaptively balance the trade-off between minimizing the reconstruction loss function and avoiding over-regularization.
arXiv Detail & Related papers (2020-07-20T18:18:39Z) - Real-Time Regression with Dividing Local Gaussian Processes [62.01822866877782]
Local Gaussian processes are a novel, computationally efficient modeling approach based on Gaussian process regression.
Due to an iterative, data-driven division of the input space, they achieve a sublinear computational complexity in the total number of training points in practice.
A numerical evaluation on real-world data sets shows their advantages over other state-of-the-art methods in terms of accuracy as well as prediction and update speed.
arXiv Detail & Related papers (2020-06-16T18:43:31Z) - Beyond the Mean-Field: Structured Deep Gaussian Processes Improve the
Predictive Uncertainties [12.068153197381575]
We propose a novel variational family that allows for retaining covariances between latent processes while achieving fast convergence.
We provide an efficient implementation of our new approach and apply it to several benchmark datasets.
It yields excellent results and strikes a better balance between accuracy and calibrated uncertainty estimates than its state-of-the-art alternatives.
arXiv Detail & Related papers (2020-05-22T11:10:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.