Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs
- URL: http://arxiv.org/abs/2602.20567v1
- Date: Tue, 24 Feb 2026 05:32:03 GMT
- Title: Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs
- Authors: Yifei Liang, Yan Sun, Xiaochun Cao, Li Shen,
- Abstract summary: Push-based decentralized communication enables optimization over communication networks, where information exchange may be asymmetric.<n>We develop a unified uniform-stability framework for the Gradient Push (SGP) algorithm.<n>A key technical ingredient is an imbalance-aware generalization bound through two quantities.
- Score: 55.77845440440496
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Push-Sum-based decentralized learning enables optimization over directed communication networks, where information exchange may be asymmetric. While convergence properties of such methods are well understood, their finite-iteration stability and generalization behavior remain unclear due to structural bias induced by column-stochastic mixing and asymmetric error propagation. In this work, we develop a unified uniform-stability framework for the Stochastic Gradient Push (SGP) algorithm that captures the effect of directed topology. A key technical ingredient is an imbalance-aware consistency bound for Push-Sum, which controls consensus deviation through two quantities: the stationary distribution imbalance parameter $δ$ and the spectral gap $(1-λ)$ governing mixing speed. This decomposition enables us to disentangle statistical effects from topology-induced bias. We establish finite-iteration stability and optimization guarantees for both convex objectives and non-convex objectives satisfying the Polyak--Łojasiewicz condition. For convex problems, SGP attains excess generalization error of order $\tilde{\mathcal{O}}\!\left(\frac{1}{\sqrt{mn}}+\fracγ{δ(1-λ)}+γ\right)$ under step-size schedules, and we characterize the corresponding optimal early stopping time that minimizes this bound. For PŁ objectives, we obtain convex-like optimization and generalization rates with dominant dependence proportional to $κ\!\left(1+\frac{1}{δ(1-λ)}\right)$, revealing a multiplicative coupling between problem conditioning and directed communication topology. Our analysis clarifies when Push-Sum correction is necessary compared with standard decentralized SGD and quantifies how imbalance and mixing jointly shape the best attainable learning performance.
Related papers
- Co-optimization for Adaptive Conformal Prediction [9.881784717196675]
We propose a framework that learns prediction intervals by jointly optimizing a center $m(x)$ and a radius $h(x)$.<n>Experiments on synthetic and real benchmarks demonstrate that CoCP yields consistently shorter intervals and achieves state-of-the-art conditional-coverage diagnostics.
arXiv Detail & Related papers (2026-03-02T10:43:19Z) - Regularized Online RLHF with Generalized Bilinear Preferences [68.44113000390544]
We consider the problem of contextual online RLHF with general preferences.<n>We adopt the Generalized Bilinear Preference Model to capture preferences via low-rank, skew-symmetric matrices.<n>We prove that the dual gap of the greedy policy is bounded by the square of the estimation error.
arXiv Detail & Related papers (2026-02-26T15:27:53Z) - Majorization-Minimization Networks for Inverse Problems: An Application to EEG Imaging [4.063392865490957]
Inverse problems are often ill-posed and require optimization schemes with strong stability and convergence guarantees.<n>We propose a learned Majorization-Minimization (MM) framework for inverse problems within a bilevel optimization setting.<n>We learn a structured curvature majorant that governs each MM step while preserving classical MM descent guarantees.
arXiv Detail & Related papers (2026-01-23T10:33:45Z) - Contextual Distributionally Robust Optimization with Causal and Continuous Structure: An Interpretable and Tractable Approach [2.8445258546547625]
We introduce a framework for contextual distributionally robust optimization (DRO)<n>We first introduce the causal Sinkhorn discrepancy (CSD), an entropy-regularized causal Wasserstein distance.<n>We derive a contextual DRO model with a CSD-based ambiguity set, termed Causal Sinkhorn DRO (Causal-SDRO)<n>We propose the Soft Forest Regression (SRF) decision rule, which approximates optimal policies within arbitrary measurable function spaces.
arXiv Detail & Related papers (2026-01-16T06:18:22Z) - From Tail Universality to Bernstein-von Mises: A Unified Statistical Theory of Semi-Implicit Variational Inference [0.12183405753834557]
Semi-implicit variational inference (SIVI) constructs approximate posteriors of the form $q() = int k(| z) r(dz)$<n>This paper develops a unified "approximation-optimization-statistics'' theory for such families.
arXiv Detail & Related papers (2025-12-05T19:26:25Z) - Revisiting Zeroth-Order Optimization: Minimum-Variance Two-Point Estimators and Directionally Aligned Perturbations [57.179679246370114]
We identify the distribution of random perturbations that minimizes the estimator's variance as the perturbation stepsize tends to zero.<n>Our findings reveal that such desired perturbations can align directionally with the true gradient, instead of maintaining a fixed length.
arXiv Detail & Related papers (2025-10-22T19:06:39Z) - Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
arXiv Detail & Related papers (2024-10-22T10:19:27Z) - Distributionally Robust Optimization with Bias and Variance Reduction [9.341215359733601]
We show that Prospect, a gradient-based algorithm, enjoys linear convergence for smooth regularized losses.
We also show that Prospect can converge 2-3$times$ faster than baselines such as gradient-based methods.
arXiv Detail & Related papers (2023-10-21T00:03:54Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linearahead as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - CEDAS: A Compressed Decentralized Stochastic Gradient Method with Improved Convergence [9.11726703830074]
In this paper, we consider solving the distributed optimization problem under the communication restricted setting.
We show the method over com-pressed exact diffusion termed CEDAS"
In particular, when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when when
arXiv Detail & Related papers (2023-01-14T09:49:15Z) - Linear Convergent Decentralized Optimization with Compression [50.44269451541387]
Existing decentralized algorithms with compression mainly focus on compressing DGD-type algorithms.
Motivated by primal-dual algorithms, this paper proposes first underlineLinunderlineEAr convergent.
underlineDecentralized with compression, LEAD.
arXiv Detail & Related papers (2020-07-01T04:35:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.