Nonconvex Decentralized Stochastic Bilevel Optimization under Heavy-Tailed Noises
- URL: http://arxiv.org/abs/2509.15543v1
- Date: Fri, 19 Sep 2025 02:51:19 GMT
- Title: Nonconvex Decentralized Stochastic Bilevel Optimization under Heavy-Tailed Noises
- Authors: Xinwen Zhang, Yihan Zhang, Hongchang Gao
- Abstract summary: Existing decentralized optimization methods assume the lower-level loss function is strongly convex and the gradient noise has finite variance. This is the first decentralized bilevel algorithm with rigorous theoretical guarantees under heavy-tailed noises.
- Score: 27.097405340259325
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing decentralized stochastic optimization methods assume the lower-level loss function is strongly convex and the stochastic gradient noise has finite variance. These strong assumptions are typically not satisfied in real-world machine learning models. To address these limitations, we develop a novel decentralized stochastic bilevel optimization algorithm for nonconvex bilevel optimization problems under heavy-tailed noises. Specifically, we develop a normalized stochastic variance-reduced bilevel gradient descent algorithm, which does not rely on any clipping operation. Moreover, we establish its convergence rate by innovatively bounding interdependent gradient sequences under heavy-tailed noises for nonconvex decentralized bilevel optimization problems. As far as we know, this is the first decentralized bilevel optimization algorithm with rigorous theoretical guarantees under heavy-tailed noises. Extensive experimental results confirm the effectiveness of our algorithm in handling heavy-tailed noises.
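To make the two ingredients named in the abstract concrete, here is a minimal single-node Python sketch, assuming a STORM-style variance-reduced estimator and a toy nonconvex objective. It illustrates normalization plus variance reduction under heavy-tailed noise; it is not the authors' decentralized bilevel algorithm, and every name and hyperparameter in it is hypothetical.

```python
# Illustrative sketch only -- not the paper's decentralized bilevel method.
# Combines a STORM-style variance-reduced gradient estimator with a
# normalized (clipping-free) update on a toy nonconvex objective whose
# stochastic gradients carry heavy-tailed Student-t noise (df < 2, so the
# noise has infinite variance but finite mean).
import numpy as np

rng = np.random.default_rng(0)

def true_grad(x):
    # Gradient of the nonconvex test function f(x) = sum(x_i^2 / (1 + x_i^2)).
    return 2.0 * x / (1.0 + x**2) ** 2

def normalized_storm(x0, steps=5000, eta=0.02, beta=0.1, nu=1.5):
    x = x0.copy()
    d = true_grad(x) + rng.standard_t(df=nu, size=x.shape)  # initial estimate
    for _ in range(steps):
        # Normalized step: fixed length eta, so one huge heavy-tailed
        # sample cannot blow up the iterate -- no clipping needed.
        x_next = x - eta * d / (np.linalg.norm(d) + 1e-12)
        # STORM recursion: evaluate the SAME noise sample at both iterates;
        # this shared sample is what drives the variance reduction.
        xi = rng.standard_t(df=nu, size=x.shape)
        g_next, g_curr = true_grad(x_next) + xi, true_grad(x) + xi
        d = g_next + (1.0 - beta) * (d - g_curr)
        x = x_next
    return x

x = normalized_storm(np.full(10, 3.0))
print("final gradient norm:", np.linalg.norm(true_grad(x)))
```

A decentralized bilevel version would additionally need gossip averaging of iterates across the network and a stochastic hypergradient estimator for the lower-level problem; the abstract's stated contribution is precisely the analysis of those interdependent gradient sequences, which this sketch does not attempt.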
Related papers
- Stability and Generalization of Nonconvex Optimization with Heavy-Tailed Noise [24.27538723361077]
We develop a general framework for establishing bounds under heavy-tailed noise. We provide the stability and generalization analysis for several popular algorithms under heavy-tailed noise.
arXiv Detail & Related papers (2026-01-27T15:50:57Z) - Federated Stochastic Minimax Optimization under Heavy-Tailed Noises [23.850171320924574]
We propose two algorithms: NSGDA, which integrates bounded gradients, and Mu-DA for local updates. Both algorithms are designed to effectively address heavy-tailed noise in federated minimax optimization under a milder condition. To the best of our knowledge, these are the first minimax optimization algorithms with rigorous theoretical guarantees under heavy-tailed noise.
arXiv Detail & Related papers (2025-11-06T15:27:29Z) - Accelerated stochastic first-order method for convex optimization under heavy-tailed noise [3.5877352183559386]
We study convex composite optimization problems, where the objective function is given by the sum of a prox-friendly function and a convex function whose subgradients are estimated under heavy-tailed noise. We demonstrate that a vanilla algorithm can achieve optimal complexity for these problems without additional modifications such as clipping or normalization.
arXiv Detail & Related papers (2025-10-13T17:45:05Z) - Second-order Optimization under Heavy-Tailed Noise: Hessian Clipping and Sample Complexity Limits [53.773695219320125]
We take a first step toward a theoretical understanding of second-order optimization under heavy-tailed noise. We introduce a novel algorithm based on gradient and Hessian clipping, and prove high-probability upper bounds that nearly match the fundamental limits.
arXiv Detail & Related papers (2025-10-12T16:36:54Z) - Adaptive Algorithms with Sharp Convergence Rates for Stochastic Hierarchical Optimization [31.032959636901086]
We propose novel adaptive algorithms for hierarchical optimization problems. Our algorithms achieve sharp convergence rates without prior knowledge of the noise level. Experiments on synthetic and deep learning tasks demonstrate the effectiveness of our proposed algorithms.
arXiv Detail & Related papers (2025-09-18T20:17:18Z) - Bilevel Learning with Inexact Stochastic Gradients [2.247833425312671]
Bilevel learning has gained prominence in machine learning, inverse problems, and imaging applications. The large-scale nature of these problems has led to the development of inexact and computationally efficient methods.
arXiv Detail & Related papers (2024-12-16T18:18:47Z) - Gradient Normalization Provably Benefits Nonconvex SGD under Heavy-Tailed Noise [60.92029979853314]
We investigate the roles of gradient normalization and clipping in ensuring the convergence of Stochastic Gradient Descent (SGD) under heavy-tailed noise (a sketch contrasting these two operations appears after this list).
Our work provides the first theoretical evidence demonstrating the benefits of gradient normalization in SGD under heavy-tailed noise.
We introduce an accelerated SGD variant incorporating gradient normalization and clipping, further enhancing convergence rates under heavy-tailed noise.
arXiv Detail & Related papers (2024-10-21T22:40:42Z) - Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm only has access to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z) - First Order Methods with Markovian Noise: from Acceleration to Variational Inequalities [91.46841922915418]
We present a unified approach for the theoretical analysis of first-order methods for optimization and variational inequalities under Markovian noise.
Our approach covers both nonconvex and strongly convex minimization problems.
We provide bounds that match the oracle complexity in the case of strongly convex optimization problems.
arXiv Detail & Related papers (2023-05-25T11:11:31Z) - Can Decentralized Stochastic Minimax Optimization Algorithms Converge Linearly for Finite-Sum Nonconvex-Nonconcave Problems? [56.62372517641597]
Decentralized minimax optimization has been actively studied in the past few years due to its applications in a wide range of machine learning tasks.
This paper develops two novel decentralized minimax optimization algorithms for the finite-sum nonconvex-nonconcave problem.
arXiv Detail & Related papers (2023-04-24T02:19:39Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide a small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Second-Order Guarantees in Centralized, Federated and Decentralized Nonconvex Optimization [64.26238893241322]
Simple algorithms have been shown to lead to good empirical results in many contexts.
Several works have pursued rigorous analytical justification for studying nonconvex optimization problems.
A key insight in these analyses is that perturbations play a critical role in allowing local descent algorithms to escape saddle points.
arXiv Detail & Related papers (2020-03-31T16:54:22Z)
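The gradient-normalization entry above contrasts two ways of taming heavy-tailed gradients. As a minimal hypothetical sketch (the step size eta and threshold tau are arbitrary), the two update rules differ only in how they rescale the gradient:

```python
import numpy as np

def normalized_step(x, g, eta=0.1, eps=1e-12):
    # Normalization: every step has length eta, independent of ||g||,
    # so a single heavy-tailed gradient sample cannot derail the iterate.
    return x - eta * g / (np.linalg.norm(g) + eps)

def clipped_step(x, g, eta=0.1, tau=1.0):
    # Clipping: rescale g only when its norm exceeds the threshold tau;
    # small gradients pass through unchanged.
    scale = min(1.0, tau / (np.linalg.norm(g) + 1e-12))
    return x - eta * scale * g
```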