The Vital Role of Gradient Clipping in Byzantine-Resilient Distributed Learning
- URL: http://arxiv.org/abs/2405.14432v4
- Date: Wed, 09 Oct 2024 16:04:01 GMT
- Title: The Vital Role of Gradient Clipping in Byzantine-Resilient Distributed Learning
- Authors: Youssef Allouah, Rachid Guerraoui, Nirupam Gupta, Ahmed Jellouli, Geovani Rizk, John Stephan
- Abstract summary: Byzantine-resilient distributed machine learning seeks to achieve robust learning performance in the presence of misbehaving or adversarial workers.
While state-of-the-art (SOTA) robust distributed gradient descent (Robust-DGD) methods were proven theoretically optimal, their empirical success has often relied on pre-aggregation gradient clipping.
However, the static clipping strategy considered so far exhibits mixed results: it improves robustness against some attacks while being ineffective or detrimental against others. We address this gap by proposing a principled adaptive clipping strategy, termed Adaptive Robust Clipping (ARC).
- Score: 8.268485501864939
- Abstract: Byzantine-resilient distributed machine learning seeks to achieve robust learning performance in the presence of misbehaving or adversarial workers. While state-of-the-art (SOTA) robust distributed gradient descent (Robust-DGD) methods were proven theoretically optimal, their empirical success has often relied on pre-aggregation gradient clipping. However, the currently considered static clipping strategy exhibits mixed results: improving robustness against some attacks while being ineffective or detrimental against others. We address this gap by proposing a principled adaptive clipping strategy, termed Adaptive Robust Clipping (ARC). We show that ARC consistently enhances the empirical robustness of SOTA Robust-DGD methods, while preserving the theoretical robustness guarantees. Our analysis shows that ARC provably improves the asymptotic convergence guarantee of Robust-DGD in the case when the model is well-initialized. We validate this theoretical insight through an exhaustive set of experiments on benchmark image classification tasks. We observe that the improvement induced by ARC is more pronounced in highly heterogeneous and adversarial settings.
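The abstract describes a pipeline of pre-aggregation gradient clipping followed by a robust aggregator. Below is a minimal sketch of that pipeline in Python with NumPy; the adaptive threshold rule (the (f+1)-th largest worker-gradient norm) and the function names are illustrative assumptions, not the paper's exact ARC rule, and coordinate-wise median stands in for any SOTA robust aggregator.

```python
import numpy as np

def adaptive_clip(grads, f):
    """Pre-aggregation clipping sketch: clip every worker gradient to an
    adaptive threshold derived from the gradient norms themselves.

    The threshold rule below (the (f+1)-th largest norm) is an illustrative
    assumption, not the exact rule from the ARC paper.
    """
    norms = np.array([np.linalg.norm(g) for g in grads])
    # Threshold: the (f+1)-th largest norm, so the f largest (potentially
    # Byzantine) gradients are rescaled to an honest-looking magnitude.
    tau = np.sort(norms)[::-1][f]
    return [g * min(1.0, tau / max(g_norm, 1e-12))
            for g, g_norm in zip(grads, norms)]

def coordinate_wise_median(grads):
    """A standard robust aggregator from the Robust-DGD literature."""
    return np.median(np.stack(grads), axis=0)

# Usage: n = 5 workers, f = 1 Byzantine worker sending a huge gradient.
rng = np.random.default_rng(0)
honest = [rng.normal(size=10) for _ in range(4)]
byzantine = [1e6 * np.ones(10)]
clipped = adaptive_clip(honest + byzantine, f=1)
step = coordinate_wise_median(clipped)
print(np.linalg.norm(step))  # bounded, despite the adversarial gradient
```

The point of clipping before aggregation is that a Byzantine gradient with an enormous norm is rescaled to an honest-looking magnitude before the aggregator ever sees it, which is why static thresholds that ignore the current gradient norms can be ineffective or detrimental.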
Related papers
- Boosting Certificate Robustness for Time Series Classification with Efficient Self-Ensemble [10.63844868166531]
Randomized Smoothing has emerged as a standout method due to its ability to certify a provable lower bound on the robustness radius under $\ell_p$-ball attacks.
We propose a self-ensemble method to enhance the lower bound of the probability confidence of predicted labels by reducing the variance of classification margins.
This approach also addresses the computational overhead issue of Deep Ensemble (DE) while remaining competitive and, in some cases, outperforming it in terms of robustness.
arXiv Detail & Related papers (2024-09-04T15:22:08Z)
- Perturbation-Invariant Adversarial Training for Neural Ranking Models: Improving the Effectiveness-Robustness Trade-Off [107.35833747750446]
Adversarial examples can be crafted by adding imperceptible perturbations to legitimate documents.
This vulnerability raises significant concerns about their reliability and hinders the widespread deployment of NRMs.
In this study, we establish theoretical guarantees regarding the effectiveness-robustness trade-off in NRMs.
arXiv Detail & Related papers (2023-12-16T05:38:39Z)
- Learn from the Past: A Proxy Guided Adversarial Defense Framework with Self Distillation Regularization [53.04697800214848]
Adversarial Training (AT) is pivotal in fortifying the robustness of deep learning models.
AT methods, which rely on direct iterative updates for the target model's defense, frequently encounter obstacles such as unstable training and catastrophic overfitting.
We present a general proxy-guided defense framework, LAST (Learn from the Past).
arXiv Detail & Related papers (2023-10-19T13:13:41Z)
- Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly-robust instance reweighted adversarial framework.
Our importance weights are obtained by optimizing the KL-divergence regularized loss function.
Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
arXiv Detail & Related papers (2023-08-01T06:16:18Z)
- Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z)
- Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs.
We generate certificates guaranteeing that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial perturbation of the input.
arXiv Detail & Related papers (2021-06-21T21:42:08Z)
- Stratified Learning: A General-Purpose Statistical Method for Improved Learning under Covariate Shift [1.1470070927586016]
We propose a simple, statistically principled, and theoretically justified method to improve supervised learning when the training set is not representative.
We build upon a well-established methodology in causal inference, and show that the effects of covariate shift can be reduced or eliminated by conditioning on propensity scores.
We demonstrate the effectiveness of our general-purpose method on two contemporary research questions in cosmology, outperforming state-of-the-art importance weighting methods (a propensity-score sketch follows this list).
arXiv Detail & Related papers (2021-06-21T15:53:20Z)
- CROP: Certifying Robust Policies for Reinforcement Learning through Functional Smoothing [41.093241772796475]
We present the first framework of Certifying Robust Policies for reinforcement learning (CROP) against adversarial state perturbations.
We propose two types of robustness certification criteria: robustness of per-state actions and lower bound of cumulative rewards.
arXiv Detail & Related papers (2021-06-17T07:58:32Z)
- Distributional Robustness and Regularization in Reinforcement Learning [62.23012916708608]
We introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function.
It suggests using regularization as a practical tool for dealing with $\textit{external uncertainty}$ in reinforcement learning.
arXiv Detail & Related papers (2020-03-05T19:56:23Z)
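As referenced in the Stratified Learning entry above, the following is a minimal sketch of propensity-score stratification for covariate shift, assuming NumPy and scikit-learn; the logistic-regression propensity model, the quartile strata, and the synthetic data are illustrative assumptions rather than that paper's exact procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Covariate shift: training and target sets drawn from different x-distributions.
X_train = rng.normal(loc=0.0, size=(1000, 2))
X_target = rng.normal(loc=0.7, size=(1000, 2))

# Propensity score: probability that a point belongs to the target set,
# estimated by a classifier on the pooled data (source label 0, target 1).
pool = np.vstack([X_train, X_target])
membership = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_target))])
prop = LogisticRegression().fit(pool, membership)
e_train = prop.predict_proba(X_train)[:, 1]

# Stratify training data by propensity-score quartile; within each stratum
# the covariate distributions of the two sets approximately match, so a
# model fit or evaluated per stratum is less biased by the shift.
edges = np.quantile(e_train, [0.0, 0.25, 0.5, 0.75, 1.0])
strata = np.digitize(e_train, edges[1:-1])  # stratum index 0..3 per point
for s in range(4):
    print(f"stratum {s}: {np.sum(strata == s)} training points")
```

Conditioning on the estimated propensity score is the causal-inference device the entry alludes to: it balances the covariates between the two sets without explicit importance weights, which can have high variance under strong shift.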