Federated Stochastic Minimax Optimization under Heavy-Tailed Noises
- URL: http://arxiv.org/abs/2511.04456v1
- Date: Thu, 06 Nov 2025 15:27:29 GMT
- Title: Federated Stochastic Minimax Optimization under Heavy-Tailed Noises
- Authors: Xinwen Zhang, Hongchang Gao
- Abstract summary: We propose two algorithms: Fed-NSGDA-M, which integrates normalized gradients, and FedMuon-DA, which leverages the Muon optimizer for local updates. Both algorithms are designed to effectively address heavy-tailed noise in federated minimax optimization, under a milder condition. To the best of our knowledge, these are the first federated minimax optimization algorithms with rigorous theoretical guarantees under heavy-tailed noise.
- Score: 23.850171320924574
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Heavy-tailed noise has attracted growing attention in nonconvex stochastic optimization, as numerous empirical studies suggest it offers a more realistic assumption than the standard bounded variance assumption. In this work, we investigate nonconvex-PL minimax optimization under heavy-tailed gradient noise in federated learning. We propose two novel algorithms: Fed-NSGDA-M, which integrates normalized gradients, and FedMuon-DA, which leverages the Muon optimizer for local updates. Both algorithms are designed to effectively address heavy-tailed noise in federated minimax optimization, under a milder condition. We theoretically establish that both algorithms achieve a convergence rate of $O(1/(TNp)^{\frac{s-1}{2s}})$. To the best of our knowledge, these are the first federated minimax optimization algorithms with rigorous theoretical guarantees under heavy-tailed noise. Extensive experiments further validate their effectiveness.
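The abstract does not reproduce the algorithm itself, but as a rough illustration of the normalized-gradient idea behind Fed-NSGDA-M, the sketch below runs momentum-based stochastic gradient descent-ascent on simulated clients, normalizes each momentum direction before stepping, and periodically averages the iterates at a server. The toy objective, the placement of the normalization, the Student-t noise model, and all hyperparameters are assumptions made for this example, not the authors' specification.

```python
# Minimal sketch of normalized momentum SGDA with periodic averaging in a
# simulated federated setting. This is an illustrative reconstruction, not
# the authors' Fed-NSGDA-M.
import numpy as np

rng = np.random.default_rng(0)
N, d, rounds, local_steps = 4, 10, 50, 5      # clients, dimension, rounds, local steps
eta_x, eta_y, beta = 0.05, 0.05, 0.9          # step sizes and momentum (assumed values)

# Toy quadratic minimax objective per client (the paper studies nonconvex-PL problems):
#   f_i(x, y) = 0.5 * a_i * ||x||^2 + y^T B_i x - 0.5 * ||y||^2
a = rng.uniform(0.5, 1.5, size=N)
B = rng.normal(size=(N, d, d)) / np.sqrt(d)

def stochastic_grads(i, x, y):
    """Gradients of f_i plus heavy-tailed (Student-t, df=1.5) noise."""
    gx = a[i] * x + B[i].T @ y + 0.1 * rng.standard_t(1.5, size=d)
    gy = B[i] @ x - y + 0.1 * rng.standard_t(1.5, size=d)
    return gx, gy

def normalize(v, eps=1e-12):
    return v / (np.linalg.norm(v) + eps)

x, y = rng.normal(size=d), rng.normal(size=d)
mx = [np.zeros(d) for _ in range(N)]          # per-client momentum buffers
my = [np.zeros(d) for _ in range(N)]

for _ in range(rounds):
    xs, ys = [], []
    for i in range(N):
        xi, yi = x.copy(), y.copy()
        for _ in range(local_steps):
            gx, gy = stochastic_grads(i, xi, yi)
            mx[i] = beta * mx[i] + (1 - beta) * gx
            my[i] = beta * my[i] + (1 - beta) * gy
            xi -= eta_x * normalize(mx[i])    # normalized descent on x
            yi += eta_y * normalize(my[i])    # normalized ascent on y
        xs.append(xi); ys.append(yi)
    x, y = np.mean(xs, axis=0), np.mean(ys, axis=0)   # server averaging

print("final ||x||:", np.linalg.norm(x))
```

The point of the normalization is that a single heavy-tailed gradient spike can only change the direction of an update, never its magnitude, which is what makes convergence analyzable when the noise is only assumed to have a bounded s-th moment for some s in (1, 2].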
Related papers
- Second-order Optimization under Heavy-Tailed Noise: Hessian Clipping and Sample Complexity Limits [53.773695219320125]
We take a first step toward a theoretical understanding of second-order optimization under heavy-tailed noise. We introduce a novel algorithm based on gradient and Hessian clipping, and prove high-probability upper bounds that nearly match the fundamental limits.
arXiv Detail & Related papers (2025-10-12T16:36:54Z) - Nonconvex Decentralized Stochastic Bilevel Optimization under Heavy-Tailed Noises [27.097405340259325]
Existing decentralized optimization methods assume the lower-level loss function is strongly convex and the gradient noise has finite variance. This is the first decentralized bilevel algorithm with rigorous theoretical guarantees under heavy-tailed noises.
arXiv Detail & Related papers (2025-09-19T02:51:19Z) - Adaptive Algorithms with Sharp Convergence Rates for Stochastic Hierarchical Optimization [31.032959636901086]
We propose novel adaptive algorithms for hierarchical optimization problems. Our algorithms achieve sharp convergence rates without prior knowledge of the noise level. Experiments on synthetic and deep learning tasks demonstrate the effectiveness of our proposed algorithms.
arXiv Detail & Related papers (2025-09-18T20:17:18Z) - Gradient Normalization Provably Benefits Nonconvex SGD under Heavy-Tailed Noise [60.92029979853314]
We investigate the roles of gradient normalization and clipping in ensuring the convergence of stochastic gradient descent (SGD) under heavy-tailed noise.
Our work provides the first theoretical evidence demonstrating the benefits of gradient normalization in SGD under heavy-tailed noise.
We introduce an accelerated SGD variant incorporating gradient normalization and clipping, further enhancing convergence rates under heavy-tailed noise.
arXiv Detail & Related papers (2024-10-21T22:40:42Z) - Distributed Extra-gradient with Optimal Complexity and Communication
Guarantees [60.571030754252824]
We consider monotone variational inequality (VI) problems in multi-GPU settings where multiple processors/workers/clients have access to local dual vectors.
Extra-gradient, which is a de facto algorithm for monotone VI problems, has not been designed to be communication-efficient.
We propose a quantized generalized extra-gradient (Q-GenX), which is an unbiased and adaptive compression method tailored to solve VIs.
arXiv Detail & Related papers (2023-08-17T21:15:04Z) - Can Decentralized Stochastic Minimax Optimization Algorithms Converge
Linearly for Finite-Sum Nonconvex-Nonconcave Problems? [56.62372517641597]
Decentralized minimax optimization has been actively studied in the past few years due to its application in a wide range of machine learning problems.
This paper develops two novel decentralized minimax optimization algorithms for the finite-sum nonconvex-nonconcave problem.
arXiv Detail & Related papers (2023-04-24T02:19:39Z) - Adaptive Federated Minimax Optimization with Lower Complexities [82.51223883622552]
We propose an efficient adaptive minimax optimization algorithm (i.e., AdaFGDA) to solve these federated minimax problems.
It builds on our momentum-based variance-reduced and local-SGD techniques, and it flexibly incorporates various adaptive learning rates.
arXiv Detail & Related papers (2022-11-14T12:32:18Z) - Accelerating Noisy VQE Optimization with Gaussian Processes [0.0]
We introduce the use of Gaussian Processes (GP) as surrogate models to reduce the impact of noise.
ImFil is a state-of-the-art, gradient-free method, which in comparative studies has been shown to outperform competing optimizers on noisy VQE problems.
We show that when noise is present, the GP+ImFil approach finds results closer to the true global minimum in fewer evaluations than standalone ImFil.
arXiv Detail & Related papers (2022-04-15T05:01:14Z) - Computationally Efficient and Statistically Optimal Robust Low-rank
Matrix and Tensor Estimation [15.389011827844572]
Low-rank matrix and tensor estimation under heavy-tailed noise is challenging, both computationally and statistically.
We introduce a novel sub-gradient algorithm (RsGrad) which is not only computationally efficient but also statistically optimal.
arXiv Detail & Related papers (2022-03-02T09:05:15Z) - Optimizing Information-theoretical Generalization Bounds via Anisotropic
Noise in SGLD [73.55632827932101]
We optimize the information-theoretical generalization bound by manipulating the noise structure in SGLD.
We prove that, under the constraint of guaranteeing low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance.
arXiv Detail & Related papers (2021-10-26T15:02:27Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide a small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
We propose a new algorithm for the min-player against smooth algorithms deployed by an adversary.
Our algorithm is guaranteed to make monotonic progress (having no limit cycles) and to find an appropriate stationary point in a polynomial number of gradient ascent steps.
arXiv Detail & Related papers (2021-06-02T22:03:36Z)