Decentralized Non-convex Stochastic Optimization with Heterogeneous Variance
- URL: http://arxiv.org/abs/2602.11789v1
- Date: Thu, 12 Feb 2026 10:13:53 GMT
- Title: Decentralized Non-convex Stochastic Optimization with Heterogeneous Variance
- Authors: Hongxu Chen, Ke Wei, Luo Luo,
- Abstract summary: Decentralized optimization is critical for solving large-scale machine learning problems over distributed networks.<n>We propose D-NSS, a decentralized algorithm with node-specific sampling, and establish its sample complexity depending on the arithmetic mean of local standard deviations.<n>We extend the framework with a variance reduction technique and develop D-NSS-VR, which under the mean-squared smoothness assumption attains an improved sample complexity bound.
- Score: 24.276930163169236
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decentralized optimization is critical for solving large-scale machine learning problems over distributed networks, where multiple nodes collaborate through local communication. In practice, the variances of stochastic gradient estimators often differ across nodes, yet their impact on algorithm design and complexity remains unclear. To address this issue, we propose D-NSS, a decentralized algorithm with node-specific sampling, and establish its sample complexity depending on the arithmetic mean of local standard deviations, achieving tighter bounds than existing methods that rely on the worst-case or quadratic mean. We further derive a matching sample complexity lower bound under heterogeneous variance, thereby proving the optimality of this dependence. Moreover, we extend the framework with a variance reduction technique and develop D-NSS-VR, which under the mean-squared smoothness assumption attains an improved sample complexity bound while preserving the arithmetic-mean dependence. Finally, numerical experiments validate the theoretical results and demonstrate the effectiveness of the proposed algorithms.
Related papers
- Multi-Dimensional Visual Data Recovery: Scale-Aware Tensor Modeling and Accelerated Randomized Computation [51.65236537605077]
We propose a new type of network compression optimization technique, fully randomized tensor network compression (FCTN)<n>FCTN has significant advantages in correlation characterization and transpositional in algebra, and has notable achievements in multi-dimensional data processing and analysis.<n>We derive efficient algorithms with guarantees to solve the formulated models.
arXiv Detail & Related papers (2026-02-13T14:56:37Z) - Decentralized Nonconvex Composite Federated Learning with Gradient Tracking and Momentum [78.27945336558987]
Decentralized server (DFL) eliminates reliance on client-client architecture.<n>Non-smooth regularization is often incorporated into machine learning tasks.<n>We propose a novel novel DNCFL algorithm to solve these problems.
arXiv Detail & Related papers (2025-04-17T08:32:25Z) - Stochastic Optimization with Optimal Importance Sampling [49.484190237840714]
We propose an iterative-based algorithm that jointly updates the decision and the IS distribution without requiring time-scale separation between the two.<n>Our method achieves the lowest possible variable variance and guarantees global convergence under convexity of the objective and mild assumptions on the IS distribution family.
arXiv Detail & Related papers (2025-04-04T16:10:18Z) - Communication-Efficient Stochastic Distributed Learning [3.2923780772605595]
We address distributed learning problems, both non and convex, undirected networks.<n>In particular, we design a novel based on the distributed Alternating Method of Multipliers (MM) to address the challenges of high communication costs.
arXiv Detail & Related papers (2025-01-23T10:05:23Z) - Distributionally Robust Optimization via Iterative Algorithms in Continuous Probability Spaces [6.992239210938067]
We consider a minimax problem motivated by distributionally robust optimization (DRO) when the worst-case distribution is continuous.<n>Recent research has explored learning the worst-case distribution using neural network-based generative networks.<n>This paper bridges this theoretical challenge by presenting an iterative algorithm to solve such a minimax problem.
arXiv Detail & Related papers (2024-12-29T19:31:23Z) - Distributed Markov Chain Monte Carlo Sampling based on the Alternating
Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z) - Stability and Generalization of the Decentralized Stochastic Gradient
Descent Ascent Algorithm [80.94861441583275]
We investigate the complexity of the generalization bound of the decentralized gradient descent (D-SGDA) algorithm.
Our results analyze the impact of different top factors on the generalization of D-SGDA.
We also balance it with the generalization to obtain the optimal convex-concave setting.
arXiv Detail & Related papers (2023-10-31T11:27:01Z) - A Variance-Reduced Stochastic Gradient Tracking Algorithm for
Decentralized Optimization with Orthogonality Constraints [7.028225540638832]
We propose a novel algorithm for decentralized optimization with orthogonality constraints.
VRSGT is the first algorithm for decentralized optimization with orthogonality constraints that reduces both sampling and communication complexities simultaneously.
In the numerical experiments, VRGTS has a promising performance in a real-world autonomous sample.
arXiv Detail & Related papers (2022-08-29T14:46:44Z) - Asynchronous Distributed Reinforcement Learning for LQR Control via Zeroth-Order Block Coordinate Descent [7.6860514640178]
We propose a novel zeroth-order optimization algorithm for distributed reinforcement learning.
It allows each agent to estimate its local gradient by cost evaluation independently, without use of any consensus protocol.
arXiv Detail & Related papers (2021-07-26T18:11:07Z) - Decentralized Local Stochastic Extra-Gradient for Variational
Inequalities [125.62877849447729]
We consider distributed variational inequalities (VIs) on domains with the problem data that is heterogeneous (non-IID) and distributed across many devices.
We make a very general assumption on the computational network that covers the settings of fully decentralized calculations.
We theoretically analyze its convergence rate in the strongly-monotone, monotone, and non-monotone settings.
arXiv Detail & Related papers (2021-06-15T17:45:51Z) - Efficient Methods for Structured Nonconvex-Nonconcave Min-Max
Optimization [98.0595480384208]
We propose a generalization extraient spaces which converges to a stationary point.
The algorithm applies not only to general $p$-normed spaces, but also to general $p$-dimensional vector spaces.
arXiv Detail & Related papers (2020-10-31T21:35:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.