Robust Collaborative Learning with Linear Gradient Overhead
- URL: http://arxiv.org/abs/2209.10931v2
- Date: Sat, 3 Jun 2023 08:39:23 GMT
- Title: Robust Collaborative Learning with Linear Gradient Overhead
- Authors: Sadegh Farhadkhani, Rachid Guerraoui, Nirupam Gupta, Lê Nguyên Hoang, Rafael Pinot, John Stephan
- Abstract summary: Collaborative learning algorithms, such as distributed SGD (or D-SGD), are prone to faulty machines.
We present MoNNA, a new algorithm that is provably robust under standard assumptions.
We present a way to control the tension between the momentum and the model drifts.
- Score: 7.250306457887471
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Collaborative learning algorithms, such as distributed SGD (or D-SGD), are
prone to faulty machines that may deviate from their prescribed algorithm
because of software or hardware bugs, poisoned data or malicious behaviors.
While many solutions have been proposed to enhance the robustness of D-SGD to
such machines, previous works either resort to strong assumptions (trusted
server, homogeneous data, specific noise model) or impose a gradient
computational cost that is several orders of magnitude higher than that of
D-SGD. We present MoNNA, a new algorithm that (a) is provably robust under
standard assumptions and (b) has a gradient computation overhead that is linear
in the fraction of faulty machines, which is conjectured to be tight.
Essentially, MoNNA uses Polyak's momentum of local gradients for local updates
and nearest-neighbor averaging (NNA) for global mixing, respectively. While
MoNNA is rather simple to implement, its analysis has been more challenging and
relies on two key elements that may be of independent interest. Specifically,
we introduce the mixing criterion of $(\alpha, \lambda)$-reduction to analyze
the non-linear mixing of non-faulty machines, and present a way to control the
tension between the momentum and the model drifts. We validate our theory by
experiments on image classification and make our code available at
https://github.com/LPD-EPFL/robust-collaborative-learning.
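As a rough illustration of the two ingredients named in the abstract, here is a minimal Python/NumPy sketch of one MoNNA-style round on a single non-faulty worker: a Polyak-momentum local update followed by nearest-neighbor averaging (NNA) of the received models. The function names, hyper-parameters, and the exact selection rule are assumptions made for illustration, not the paper's pseudo-code.

```python
import numpy as np

# Hypothetical sketch of one MoNNA-style round on a non-faulty worker, with n
# workers of which at most f may be faulty. The momentum form and the NNA
# selection rule below are plausible readings of the abstract, not the paper's
# actual algorithm.

def local_momentum_step(theta, momentum, grad, lr=0.1, beta=0.9):
    """Polyak momentum on the local stochastic gradient, then a local update."""
    momentum = beta * momentum + (1.0 - beta) * grad
    return theta - lr * momentum, momentum

def nearest_neighbor_average(own, received, f):
    """Average the own model with the received models closest to it,
    discarding the f most distant ones (assumed robust mixing rule)."""
    models = np.stack(received)                   # models received from peers
    dists = np.linalg.norm(models - own, axis=1)  # distance to own model
    kept = models[np.argsort(dists)[: max(len(received) - f, 0)]]
    return np.vstack([own, kept]).mean(axis=0)
```

In a full round, every non-faulty worker would apply `local_momentum_step` on its local gradient and then mix the resulting model with its peers' models via `nearest_neighbor_average`.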
Related papers
- Decentralized Nonconvex Composite Federated Learning with Gradient Tracking and Momentum [78.27945336558987]
Decentralized federated learning (DFL) eliminates reliance on a central server and its client-server architecture.
Non-smooth regularization is often incorporated into machine learning tasks.
We propose a novel DNCFL algorithm to solve these problems.
arXiv Detail & Related papers (2025-04-17T08:32:25Z) - Robustness of deep learning classification to adversarial input on GPUs: asynchronous parallel accumulation is a source of vulnerability [6.403627167104689]
We present a new learnable permutation (LP) gradient-based approach to learn floating point operation orderings that lead to misclassifications.
This LP approach provides a worst-case estimate in a computationally efficient manner, avoiding the need to run identical experiments tens of thousands of times.
arXiv Detail & Related papers (2025-03-21T14:19:45Z) - Gradient-free variational learning with conditional mixture networks [39.827869318925494]
Conditional mixture networks (CMNs) are suitable for fast, gradient-free inference and can solve complex classification tasks.
We validate this approach by training two-layer CMNs on standard benchmarks from the UCI repository.
Our method, CAVI-CMN, achieves competitive and often superior predictive accuracy compared to maximum likelihood estimation (MLE) with backpropagation.
arXiv Detail & Related papers (2024-08-29T10:43:55Z) - chemtrain: Learning Deep Potential Models via Automatic Differentiation and Statistical Physics [0.0]
Neural Networks (NNs) are promising models for refining the accuracy of molecular dynamics.
Chemtrain is a framework to learn sophisticated NN potential models through customizable training routines and advanced training algorithms.
arXiv Detail & Related papers (2024-08-28T15:14:58Z) - Interfacing Finite Elements with Deep Neural Operators for Fast
Multiscale Modeling of Mechanics Problems [4.280301926296439]
In this work, we explore the idea of multiscale modeling with machine learning and employ DeepONet, a neural operator, as an efficient surrogate of the expensive solver.
DeepONet is trained offline using data acquired from the fine solver for learning the underlying and possibly unknown fine-scale dynamics.
We present various benchmarks to assess accuracy and speedup, and in particular we develop a coupling algorithm for a time-dependent problem.
arXiv Detail & Related papers (2022-02-25T20:46:08Z) - Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be seamlessly integrated with neural networks.
arXiv Detail & Related papers (2021-12-07T11:26:41Z) - Fast Distributionally Robust Learning with Variance Reduced Min-Max
Optimization [85.84019017587477]
Distributionally robust supervised learning is emerging as a key paradigm for building reliable machine learning systems for real-world applications.
Existing algorithms for solving Wasserstein DRSL involve solving complex subproblems or fail to make use of gradients.
We revisit Wasserstein DRSL through the lens of min-max optimization and derive scalable and efficiently implementable extra-gradient algorithms.
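For context on the extra-gradient idea mentioned above, a minimal sketch on a toy bilinear saddle problem follows; this illustrates only the generic extra-gradient step, not the paper's DRSL formulation or its variance-reduction scheme, and all values are illustrative.

```python
import numpy as np

# Toy extra-gradient iteration for min_x max_y x^T A y (bilinear saddle problem).
A = np.array([[2.0, 1.0], [0.0, 1.0]])
x, y, eta = np.ones(2), np.ones(2), 0.1

for _ in range(2000):
    # Extrapolation (look-ahead) step using gradients at the current point.
    x_half = x - eta * (A @ y)
    y_half = y + eta * (A.T @ x)
    # Update step using gradients at the extrapolated point.
    x = x - eta * (A @ y_half)
    y = y + eta * (A.T @ x_half)

print(x, y)  # both converge toward the saddle point at the origin
```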
arXiv Detail & Related papers (2021-04-27T16:56:09Z) - Online Limited Memory Neural-Linear Bandits with Likelihood Matching [53.18698496031658]
We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.
We propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online.
arXiv Detail & Related papers (2021-02-07T14:19:07Z) - Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAE) are a powerful and widely-used class of generative models.
We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs.
Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
arXiv Detail & Related papers (2021-01-06T17:36:26Z) - Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge
Computing [113.52575069030192]
Big data, including applications with high security requirements, are often collected and stored on multiple heterogeneous devices, such as mobile devices, drones and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
arXiv Detail & Related papers (2020-10-02T10:41:59Z) - Compressive MR Fingerprinting reconstruction with Neural Proximal
Gradient iterations [27.259916894535404]
ProxNet is a learned proximal gradient descent framework that incorporates the forward acquisition and Bloch dynamic models within a recurrent learning mechanism.
Our numerical experiments show that the ProxNet can achieve a superior quantitative inference accuracy, much smaller storage requirement, and a comparable runtime to the recent deep learning MRF baselines.
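As background for the "learned proximal gradient" idea, here is a generic, hand-crafted proximal gradient iteration on a toy l1-regularized least-squares problem; ProxNet's learned recurrent components and its MR-fingerprinting-specific acquisition and Bloch models are not reproduced here, and the problem sizes are illustrative.

```python
import numpy as np

# Classical proximal gradient (ISTA) for min_x 0.5*||Ax - b||^2 + lam*||x||_1.
def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

A = np.random.randn(20, 50)
b = np.random.randn(20)
lam, x = 0.1, np.zeros(50)
eta = 1.0 / np.linalg.norm(A, 2) ** 2          # step size from the Lipschitz constant

for _ in range(300):
    grad = A.T @ (A @ x - b)                   # gradient of the smooth data-fit term
    x = soft_threshold(x - eta * grad, eta * lam)  # prox of the l1 penalty
```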
arXiv Detail & Related papers (2020-06-27T03:52:22Z) - Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, a truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
arXiv Detail & Related papers (2020-03-13T13:11:35Z)