Byzantine Machine Learning Made Easy by Resilient Averaging of Momentums
- URL: http://arxiv.org/abs/2205.12173v1
- Date: Tue, 24 May 2022 16:14:50 GMT
- Title: Byzantine Machine Learning Made Easy by Resilient Averaging of Momentums
- Authors: Sadegh Farhadkhani, Rachid Guerraoui, Nirupam Gupta, Rafael Pinot,
John Stephan
- Abstract summary: Byzantine resilience emerged as a prominent topic within the distributed machine learning community.
We present RESAM (RESilient Averaging of Momentums), a unified framework that makes it simple to establish optimal Byzantine resilience.
- Score: 7.778461949427662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Byzantine resilience emerged as a prominent topic within the distributed
machine learning community. Essentially, the goal is to enhance distributed
optimization algorithms, such as distributed SGD, in a way that guarantees
convergence despite the presence of some misbehaving (a.k.a., {\em Byzantine})
workers. Although a myriad of techniques addressing the problem have been
proposed, the field arguably rests on fragile foundations. These techniques are
hard to prove correct and rely on assumptions that are (a) quite unrealistic,
i.e., often violated in practice, and (b) heterogeneous, i.e., making it
difficult to compare approaches.
We present \emph{RESAM (RESilient Averaging of Momentums)}, a unified
framework that makes it simple to establish optimal Byzantine resilience,
relying only on standard machine learning assumptions. Our framework is mainly
composed of two operators: \emph{resilient averaging} at the server and
\emph{distributed momentum} at the workers. We prove a general theorem stating
the convergence of distributed SGD under RESAM. Interestingly, demonstrating
and comparing the convergence of many existing techniques become direct
corollaries of our theorem, without resorting to stringent assumptions. We also
present an empirical evaluation of the practical relevance of RESAM.
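The two-operator structure described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes Polyak momentum at the workers and a coordinate-wise trimmed mean as one example of a resilient aggregator; the function names and parameters are illustrative.

```python
import numpy as np

def worker_momentum(m_prev, grad, beta=0.9):
    """Local momentum update at each worker (standard Polyak momentum)."""
    return beta * m_prev + (1.0 - beta) * grad

def coordinate_wise_trimmed_mean(momentums, f):
    """Resilient averaging: per coordinate, drop the f largest and f
    smallest values, then average the rest. One of several aggregators
    that fit a RESAM-style framework."""
    M = np.sort(np.stack(momentums), axis=0)  # shape (n_workers, dim)
    return M[f : M.shape[0] - f].mean(axis=0)

# Toy step: 5 workers, 1 Byzantine worker sending a large bogus vector.
rng = np.random.default_rng(0)
dim = 4
honest = [rng.normal(size=dim) for _ in range(4)]
byzantine = [100.0 * np.ones(dim)]
momentums = honest + byzantine

update = coordinate_wise_trimmed_mean(momentums, f=1)
```

With `f=1`, the Byzantine vector's extreme coordinates are trimmed before averaging, so a single misbehaving worker cannot drag the server update arbitrarily far.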
Related papers
- f-FERM: A Scalable Framework for Robust Fair Empirical Risk Minimization [9.591164070876689]
This paper presents a unified optimization framework for fair empirical risk minimization based on f-divergence measures (f-FERM).
In addition, our experiments demonstrate the superiority of fairness-accuracy tradeoffs offered by f-FERM for almost all batch sizes.
Our extension is based on a distributionally robust optimization reformulation of f-FERM objective under $L_p$ norms as uncertainty sets.
arXiv Detail & Related papers (2023-12-06T03:14:16Z)
- Byzantine Robustness and Partial Participation Can Be Achieved at Once: Just Clip Gradient Differences [61.74021364776313]
Distributed learning has emerged as a leading paradigm for training large machine learning models.
In real-world scenarios, participants may be unreliable or malicious, posing a significant challenge to the integrity and accuracy of the trained models.
We propose the first distributed method with client sampling and provable tolerance to Byzantine workers.
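The clipping idea in this paper's title can be illustrated with a small sketch. This is a hedged guess at the general mechanism (bound each worker's deviation from a reference vector before averaging), not the paper's actual algorithm; the reference vector, the threshold `tau`, and the function names are all assumptions for illustration.

```python
import numpy as np

def clip_to_norm(v, tau):
    """Scale v down so its l2 norm is at most tau."""
    n = np.linalg.norm(v)
    return v if n <= tau else v * (tau / n)

def aggregate_clipped_differences(grads, reference, tau=1.0):
    """Clip each worker's deviation from a trusted reference vector,
    then average; any single Byzantine gradient can shift the result
    by at most tau."""
    diffs = [clip_to_norm(g - reference, tau) for g in grads]
    return reference + np.mean(diffs, axis=0)

# Toy example: 4 honest workers plus one sending a huge gradient.
rng = np.random.default_rng(1)
reference = np.zeros(3)
grads = [rng.normal(scale=0.1, size=3) for _ in range(4)] + [1e6 * np.ones(3)]
agg = aggregate_clipped_differences(grads, reference, tau=0.5)
```

Because every clipped difference has norm at most `tau`, the aggregate stays within `tau` of the reference no matter how extreme the malicious gradient is.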
arXiv Detail & Related papers (2023-11-23T17:50:30Z)
- Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval [139.21955930418815]
Cross-modal Retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space.
However, the predictions are often unreliable due to aleatoric uncertainty, which is induced by low-quality data, e.g., corrupted images, fast-paced videos, and non-detailed texts.
We propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from inherent data ambiguity.
arXiv Detail & Related papers (2023-09-29T09:41:19Z)
- Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly-robust instance reweighted adversarial framework.
Our importance weights are obtained by optimizing the KL-divergence regularized loss function.
Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
arXiv Detail & Related papers (2023-08-01T06:16:18Z)
- Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning [79.83792914684985]
We prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations.
Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem.
arXiv Detail & Related papers (2022-11-26T21:02:09Z)
- Adversarial Vulnerability of Randomized Ensembles [12.082239973914326]
We show that randomized ensembles are more vulnerable to imperceptible adversarial perturbations than even standard AT models.
We propose a theoretically-sound and efficient adversarial attack algorithm (ARC) capable of compromising random ensembles even in cases where adaptive PGD fails to do so.
arXiv Detail & Related papers (2022-06-14T10:37:58Z)
- On Kernelized Multi-Armed Bandits with Constraints [16.102401271318012]
We study a bandit problem with a general unknown reward function and a general unknown constraint function.
We propose a general framework for both algorithm design and performance analysis.
We demonstrate the superior performance of our proposed algorithms via numerical experiments.
arXiv Detail & Related papers (2022-03-29T14:02:03Z)
- Combining Differential Privacy and Byzantine Resilience in Distributed SGD [9.14589517827682]
This paper studies the extent to which the distributed SGD algorithm, in the standard parameter-server architecture, can learn an accurate model.
We show that many existing results on the convergence of distributed SGD under Byzantine faults, especially those relying on $(\alpha,f)$-Byzantine resilience, are rendered invalid when honest workers enforce DP.
arXiv Detail & Related papers (2021-10-08T09:23:03Z)
- Adversarial Robustness of Supervised Sparse Coding [34.94566482399662]
We consider a model that involves learning a representation while at the same time giving a precise generalization bound and a robustness certificate.
We focus on the hypothesis class obtained by coupling a sparsity-promoting encoder with a linear encoder.
We provide a robustness certificate for end-to-end classification.
arXiv Detail & Related papers (2020-10-22T22:05:21Z)
- Learning Invariant Representations and Risks for Semi-supervised Domain Adaptation [109.73983088432364]
We propose the first method that aims to simultaneously learn invariant representations and risks under the setting of semi-supervised domain adaptation (Semi-DA).
We introduce the LIRR algorithm for jointly Learning Invariant Representations and Risks.
arXiv Detail & Related papers (2020-10-09T15:42:35Z)
- Distributional Robustness and Regularization in Reinforcement Learning [62.23012916708608]
We introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function.
It suggests using regularization as a practical tool for dealing with external uncertainty in reinforcement learning.
arXiv Detail & Related papers (2020-03-05T19:56:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.