Efficient Federated Learning against Byzantine Attacks and Data Heterogeneity via Aggregating Normalized Gradients
- URL: http://arxiv.org/abs/2408.09539v3
- Date: Sat, 25 Oct 2025 10:05:17 GMT
- Title: Efficient Federated Learning against Byzantine Attacks and Data Heterogeneity via Aggregating Normalized Gradients
- Authors: Shiyuan Zuo, Xingrun Yan, Rongfei Fan, Li Shen, Puning Zhao, Jie Xu, Han Hu,
- Abstract summary: Federated Learning (FL) enables clients to collaboratively train models without sharing raw data. FL is vulnerable to Byzantine attacks and data heterogeneity, which can severely degrade performance. We propose a simple yet effective Federated Normalized Gradients Algorithm (Fed-NGA). Experimental results on benchmark datasets confirm superior convergence over existing methods.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated Learning (FL) enables multiple clients to collaboratively train models without sharing raw data, but is vulnerable to Byzantine attacks and data heterogeneity, which can severely degrade performance. Existing Byzantine-robust approaches tackle data heterogeneity, but incur high computational overhead during gradient aggregation, thereby slowing down the training process. To address this issue, we propose a simple yet effective Federated Normalized Gradients Algorithm (Fed-NGA), which performs aggregation by merely computing the weighted mean of the normalized gradients from each client. This approach yields a favorable time complexity of $\mathcal{O}(pM)$, where $p$ is the model dimension and $M$ is the number of clients. We rigorously prove that Fed-NGA is robust to both Byzantine faults and data heterogeneity. For non-convex loss functions, Fed-NGA achieves convergence to a neighborhood of stationary points under general assumptions, and further attains zero optimality gap under some mild conditions, which is an outcome rarely achieved in existing literature. In both cases, the convergence rate is $\mathcal{O}(1/T^{\frac{1}{2} - \delta})$, where $T$ denotes the number of iterations and $\delta \in (0, 1/2)$. Experimental results on benchmark datasets confirm the superior time efficiency and convergence performance of Fed-NGA over existing methods.
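The aggregation rule described in the abstract (a weighted mean of per-client normalized gradients, costing $\mathcal{O}(pM)$ time) can be sketched as follows. This is a minimal illustrative sketch based only on the abstract; the function and parameter names are hypothetical, not taken from the paper.

```python
import numpy as np

def fed_nga_aggregate(client_grads, weights=None):
    """Weighted mean of normalized client gradients (sketch of Fed-NGA's
    aggregation step as described in the abstract; names are hypothetical).

    client_grads: list of M gradient arrays, each of dimension p.
    weights: optional aggregation weights summing to 1 (uniform if None).
    One pass to normalize and one to average gives O(pM) time overall.
    """
    M = len(client_grads)
    if weights is None:
        weights = np.full(M, 1.0 / M)
    agg = np.zeros_like(client_grads[0], dtype=float)
    for w, g in zip(weights, client_grads):
        norm = np.linalg.norm(g)
        if norm > 0:  # skip all-zero gradients to avoid division by zero
            agg += w * (g / norm)
    return agg

# Normalization caps each client's influence at a unit-norm direction,
# so a Byzantine client sending an arbitrarily large gradient cannot
# dominate the aggregate.
honest = [np.array([1.0, 0.0]), np.array([0.9, 0.1])]
byzantine = [np.array([-1e9, 1e9])]
update = fed_nga_aggregate(honest + byzantine)
```

Because every summand has norm at most its weight, the aggregated update always has norm at most 1, regardless of how extreme the Byzantine gradients are.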
Related papers
- Closing the Approximation Gap of Partial AUC Optimization: A Tale of Two Formulations [121.39938773554523]
The Area Under the ROC Curve (AUC) is a pivotal evaluation metric in real-world scenarios with both class imbalance and decision constraints. We present two simple instance-wise minimax reformulations to close the approximation gap of PAUC optimization. The resulting algorithms enjoy a linear per-iteration computational complexity w.r.t. the sample size and a convergence rate of $O(-2/3)$ for typical one-way and two-way PAUCs.
arXiv Detail & Related papers (2025-12-01T02:52:33Z) - Decentralized Nonconvex Composite Federated Learning with Gradient Tracking and Momentum [78.27945336558987]
Decentralized federated learning (DFL) eliminates reliance on the client-server architecture.
Non-smooth regularization is often incorporated into machine learning tasks.
We propose a novel DNCFL algorithm to solve these problems.
arXiv Detail & Related papers (2025-04-17T08:32:25Z) - From Continual Learning to SGD and Back: Better Rates for Continual Linear Models [50.11453013647086]
We analyze the forgetting, i.e., loss on previously seen tasks, after $k$ iterations. We develop novel last-iterate upper bounds in the realizable least squares setup. We prove for the first time that randomization alone, with no task repetition, can prevent catastrophic forgetting in sufficiently long task sequences.
arXiv Detail & Related papers (2025-04-06T18:39:45Z) - SAPPHIRE: Preconditioned Stochastic Variance Reduction for Faster Large-Scale Statistical Learning [18.055120576191204]
Ill-conditioned objectives and nonsmooth regularizers undermine the performance of traditional convex methods. We propose a variance-reduced solution for ill-conditioned composite large-scale machine learning problems.
arXiv Detail & Related papers (2025-01-27T10:36:45Z) - Non-Convex Optimization in Federated Learning via Variance Reduction and Adaptive Learning [13.83895180419626]
This paper proposes a novel algorithm that leverages momentum-based variance reduction with adaptive learning to address non-convex settings across heterogeneous data.
We aim to overcome challenges related to gradient variance, which hinders efficiency, and slow convergence caused by learning rate adjustments with heterogeneous data.
arXiv Detail & Related papers (2024-12-16T11:02:38Z) - Federated Smoothing Proximal Gradient for Quantile Regression with Non-Convex Penalties [3.269165283595478]
Distributed sensors in the internet-of-things (IoT) generate vast amounts of sparse data.
We propose a federated smoothing proximal gradient algorithm that integrates a smoothing mechanism with the proximal gradient framework, thereby improving both precision and computational speed.
arXiv Detail & Related papers (2024-08-10T21:50:19Z) - Byzantine-resilient Federated Learning With Adaptivity to Data Heterogeneity [54.145730036889496]
This paper deals with federated learning (FL) in the presence of malicious Byzantine attacks and data heterogeneity.
A novel Robust Average Gradient Algorithm (RAGA) is proposed, which leverages robust aggregation and can freely select the round number for local updating.
arXiv Detail & Related papers (2024-03-20T08:15:08Z) - A Specialized Semismooth Newton Method for Kernel-Based Optimal Transport [92.96250725599958]
Kernel-based optimal transport (OT) estimators offer an alternative, functional estimation procedure to address OT problems from samples.
We show that our SSN method achieves a global convergence rate of $O(1/\sqrt{k})$, and a local quadratic convergence rate under standard regularity conditions.
arXiv Detail & Related papers (2023-10-21T18:48:45Z) - CoLiDE: Concomitant Linear DAG Estimation [12.415463205960156]
We deal with the problem of learning a directed acyclic graph (DAG) structure from observational data, assuming a linear structural equation model.
We propose a new convex score function for sparsity-aware learning of DAGs.
arXiv Detail & Related papers (2023-10-04T15:32:27Z) - Achieving Linear Speedup in Non-IID Federated Bilevel Learning [16.56643290676128]
We propose a new federated bilevel algorithm named FedMBO.
We show that FedMBO achieves a convergence rate of $\mathcal{O}\big(\frac{1}{\sqrt{nK}}+\frac{1}{K}+\frac{\sqrt{n}}{K^{3/2}}\big)$ on non-i.i.d. datasets.
This is the first theoretical linear speedup result for non-i.i.d. federated bilevel optimization.
arXiv Detail & Related papers (2023-02-10T18:28:00Z) - Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation Constrained Optimization [88.0031283949404]
Many real-world problems have complicated nonconvex functional constraints and use a large number of data points.
Our proposed method outperforms an existing method with the previously best-known result.
arXiv Detail & Related papers (2022-12-19T14:48:54Z) - Faster Adaptive Federated Learning [84.38913517122619]
Federated learning has attracted increasing attention with the emergence of distributed data.
In this paper, we propose an efficient adaptive algorithm (i.e., FAFED) based on momentum-based variance reduced technique in cross-silo FL.
arXiv Detail & Related papers (2022-12-02T05:07:50Z) - Gradient-Free Methods for Deterministic and Stochastic Nonsmooth Nonconvex Optimization [94.19177623349947]
Nonsmooth nonconvex optimization problems emerge in machine learning and business decision making.
Two core challenges impede the development of efficient methods with finite-time convergence guarantees.
Two-phase versions of GFM and SGFM are also proposed and proven to achieve improved large-deviation results.
arXiv Detail & Related papers (2022-09-12T06:53:24Z) - FEDNEST: Federated Bilevel, Minimax, and Compositional Optimization [53.78643974257301]
Many contemporary ML problems fall under nested bilevel programming that subsumes minimax and compositional optimization.
We propose FedNest: A federated alternating gradient method to address general nested problems.
arXiv Detail & Related papers (2022-05-04T17:48:55Z) - Local Stochastic Bilevel Optimization with Momentum-Based Variance Reduction [104.41634756395545]
We study Federated Bilevel Optimization problems. Specifically, we first propose the FedBiO, a deterministic gradient-based algorithm.
We show FedBiO has complexity of $O(\epsilon^{-1.5})$.
Our algorithms show superior performances compared to other baselines in numerical experiments.
arXiv Detail & Related papers (2022-05-03T16:40:22Z) - On the Convergence of Heterogeneous Federated Learning with Arbitrary Adaptive Online Model Pruning [15.300983585090794]
We present a unifying framework for heterogeneous FL algorithms with arbitrary adaptive online model pruning.
In particular, we prove that under certain sufficient conditions, these algorithms converge to a stationary point of standard FL for general smooth cost functions.
We illuminate two key factors impacting convergence: pruning-induced noise and minimum coverage index.
arXiv Detail & Related papers (2022-01-27T20:43:38Z) - Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
We study adversary-resilient distributed optimization, in which machines can independently compute gradients and cooperate.
Our algorithm is based on a new concentration technique with a favorable sample complexity.
It is very practical: it improves upon the performance of all prior methods when no Byzantine machines are present.
arXiv Detail & Related papers (2020-12-28T17:19:32Z) - Byzantine-Resilient SGD in High Dimensions on Heterogeneous Data [10.965065178451104]
We study distributed stochastic gradient descent (SGD) in the master-worker architecture under Byzantine attacks.
Our algorithm can tolerate up to a $\frac{1}{4}$ fraction of Byzantine workers.
arXiv Detail & Related papers (2020-05-16T04:15:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.