DIGing-SGLD: Decentralized and Scalable Langevin Sampling over Time-Varying Networks
- URL: http://arxiv.org/abs/2511.12836v1
- Date: Sun, 16 Nov 2025 23:42:44 GMT
- Title: DIGing-SGLD: Decentralized and Scalable Langevin Sampling over Time-Varying Networks
- Authors: Waheed U. Bajwa, Mert Gurbuzbalaban, Mustafa Ali Kutbay, Lingjiong Zhu, Muhammad Zulqarnain
- Abstract summary: This paper introduces DIGing-SGLD, a decentralized SGLD algorithm for scalable Bayesian learning in multi-agent systems. We provide the first finite-time non-asymptotic convergence guarantees for decentralized SGLD-based sampling over time-varying networks.
- Score: 7.477601047470181
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sampling from a target distribution induced by training data is central to Bayesian learning, with Stochastic Gradient Langevin Dynamics (SGLD) serving as a key tool for scalable posterior sampling and decentralized variants enabling learning when data are distributed across a network of agents. This paper introduces DIGing-SGLD, a decentralized SGLD algorithm designed for scalable Bayesian learning in multi-agent systems operating over time-varying networks. Existing decentralized SGLD methods are restricted to static network topologies, and many exhibit steady-state sampling bias caused by network effects, even when full batches are used. DIGing-SGLD overcomes these limitations by integrating Langevin-based sampling with the gradient-tracking mechanism of the DIGing algorithm, originally developed for decentralized optimization over time-varying networks, thereby enabling efficient and bias-free sampling without a central coordinator. To our knowledge, we provide the first finite-time non-asymptotic Wasserstein convergence guarantees for decentralized SGLD-based sampling over time-varying networks, with explicit constants. Under standard strong convexity and smoothness assumptions, DIGing-SGLD achieves geometric convergence to an $O(\sqrt{\eta})$ neighborhood of the target distribution, where $\eta$ is the stepsize, with dependence on the target accuracy matching the best-known rates for centralized and static-network SGLD algorithms using constant stepsize. Numerical experiments on Bayesian linear and logistic regression validate the theoretical results and demonstrate the strong empirical performance of DIGing-SGLD under dynamically evolving network conditions.
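To make the mechanism concrete, here is a minimal sketch, assuming a synchronous NumPy implementation, of how DIGing-style gradient tracking and Langevin noise might be combined in one round; the function name, variable layout, and exact ordering of the mixing, descent, and noise terms are illustrative assumptions and not taken from the paper.

```python
import numpy as np

def diging_sgld_round(X, G, prev_grads, W, stoch_grad, eta, rng):
    """One synchronous round of a DIGing-SGLD-style update (illustrative sketch only).

    X          : (n, d) array, row i holds agent i's current parameters
    G          : (n, d) array, row i holds agent i's gradient-tracking variable
    prev_grads : (n, d) array of the stochastic gradients from the previous round
    W          : (n, n) doubly stochastic mixing matrix for this round (time-varying)
    stoch_grad : callable(i, x) returning agent i's stochastic gradient at x
    eta        : stepsize
    rng        : numpy.random.Generator
    """
    n, d = X.shape
    # Consensus on parameters, descent along the tracked gradient, and
    # injected Gaussian noise with variance 2*eta (the Langevin term).
    noise = np.sqrt(2.0 * eta) * rng.standard_normal((n, d))
    X_new = W @ X - eta * G + noise
    # Gradient tracking in the style of DIGing: mix the trackers and correct
    # them with the change in the local stochastic gradients.
    new_grads = np.stack([stoch_grad(i, X_new[i]) for i in range(n)])
    G_new = W @ G + new_grads - prev_grads
    return X_new, G_new, new_grads
```

In an actual decentralized deployment the products W @ X and W @ G would be realized by each agent averaging only over its current neighbors, and W may change from round to round as the network evolves.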
Related papers
- Generalized EXTRA stochastic gradient Langevin dynamics [6.899153618328339]
Langevin algorithms are popular Markov Chain Monte Carlo methods for Bayesian learning. Their stochastic gradient versions, such as Stochastic Gradient Langevin Dynamics (SGLD), allow iterative learning based on randomly sampled mini-batches. When data are decentralized across a network of agents subject to communication and privacy constraints, standard SGLD algorithms cannot be applied.
arXiv Detail & Related papers (2024-12-02T21:57:30Z) - Stability and Generalization for Distributed SGDA [70.97400503482353]
We propose a stability-based generalization analysis framework for Distributed-SGDA.
We conduct a comprehensive analysis of stability error, generalization gap, and population risk across different metrics.
Our theoretical results reveal the trade-off between the generalization gap and optimization error.
arXiv Detail & Related papers (2024-11-14T11:16:32Z) - Stability and Generalization of the Decentralized Stochastic Gradient
Descent Ascent Algorithm [80.94861441583275]
We investigate the generalization bound of the decentralized stochastic gradient descent ascent (D-SGDA) algorithm.
Our results analyze the impact of different network topologies on the generalization of D-SGDA.
We also balance the optimization error against the generalization gap to obtain the optimal population risk in the convex-concave setting.
arXiv Detail & Related papers (2023-10-31T11:27:01Z) - Decentralized SGD and Average-direction SAM are Asymptotically
Equivalent [101.37242096601315]
Decentralized stochastic gradient descent (D-SGD) allows collaborative learning on massive numbers of devices simultaneously without the control of a central server.
Existing theories claim that decentralization invariably undermines generalization.
arXiv Detail & Related papers (2023-06-05T14:19:52Z) - Decentralized Federated Reinforcement Learning for User-Centric Dynamic
TFDD Control [37.54493447920386]
We propose a learning-based dynamic time-frequency division duplexing (D-TFDD) scheme to meet asymmetric and heterogeneous traffic demands.
We formulate the problem as a decentralized partially observable Markov decision process (Dec-POMDP).
In order to jointly optimize the global resources in a decentralized manner, we propose a federated reinforcement learning (RL) algorithm named federated Wolpertinger deep deterministic policy gradient (FWDDPG).
arXiv Detail & Related papers (2022-11-04T07:39:21Z) - DR-DSGD: A Distributionally Robust Decentralized Learning Algorithm over
Graphs [54.08445874064361]
We propose to solve a regularized distributionally robust learning problem in the decentralized setting.
By adding a Kullback-Leibler regularization function to the robust min-max optimization problem, the learning problem can be reduced to a modified robust problem.
We show that our proposed algorithm can improve the worst distribution test accuracy by up to $10\%$.
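The reduction mentioned above can be illustrated with the standard Kullback-Leibler-regularized form of distributionally robust optimization; the exact objective used by DR-DSGD may differ, so the identity below is only a generic sketch, with $f_i$ the local losses, $\Delta_N$ the probability simplex, and $\mu > 0$ the regularization parameter:
$$\min_{\theta}\,\max_{q\in\Delta_N}\ \sum_{i=1}^{N} q_i f_i(\theta) - \frac{1}{\mu}\,\mathrm{KL}\!\left(q\,\Big\|\,\tfrac{1}{N}\mathbf{1}\right) \;=\; \min_{\theta}\ \frac{1}{\mu}\log\!\left(\frac{1}{N}\sum_{i=1}^{N} e^{\mu f_i(\theta)}\right),$$
i.e., the inner maximization has a closed form, leaving a smooth log-sum-exp objective that each node can handle with standard decentralized SGD machinery.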
arXiv Detail & Related papers (2022-08-29T18:01:42Z) - Learning with Local Gradients at the Edge [14.94491070863641]
We present a novel backpropagation-free optimization algorithm dubbed Target Projection Stochastic Gradient Descent (tpSGD).
tpSGD generalizes direct random target projection to work with arbitrary loss functions.
We evaluate the performance of tpSGD in training deep neural networks and extend the approach to multi-layer RNNs.
arXiv Detail & Related papers (2022-08-17T19:51:06Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Decentralized Stochastic Proximal Gradient Descent with Variance
Reduction over Time-varying Networks [30.231314171218994]
In decentralized learning, a network of nodes cooperate to minimize an overall objective function that is usually the finite-sum of their local objectives.
We propose a novel algorithm, DPSVRG, to accelerate decentralized training by leveraging the variance-reduction technique.
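The variance-reduction idea invoked here can be summarized with the generic SVRG-style gradient estimator; DPSVRG's decentralized and proximal details differ, so the expression below, with $\tilde{x}$ a periodically refreshed snapshot and $F = \frac{1}{m}\sum_{j=1}^{m} f_j$ a node's local finite-sum objective, is only a sketch:
$$v_t \;=\; \nabla f_{j_t}(x_t) \;-\; \nabla f_{j_t}(\tilde{x}) \;+\; \nabla F(\tilde{x}),$$
which is an unbiased estimate of $\nabla F(x_t)$ whose variance shrinks as $x_t$ and $\tilde{x}$ approach a minimizer.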
arXiv Detail & Related papers (2021-12-20T08:23:36Z) - Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging [48.99717153937717]
We present WAGMA-SGD, a wait-avoiding stochastic optimizer that reduces global communication via subgroup weight exchange. We train ResNet-50 on ImageNet, a Transformer for machine translation, and a deep reinforcement learning agent for navigation at scale. Compared with state-of-the-art decentralized SGD variants, WAGMA-SGD significantly improves training throughput.
arXiv Detail & Related papers (2020-04-30T22:11:53Z) - Overlap Local-SGD: An Algorithmic Approach to Hide Communication Delays
in Distributed SGD [32.03967072200476]
We propose an algorithmic approach named Overlap-Local-SGD (and its momentum variant).
We achieve this by adding an anchor model on each node.
After multiple local updates, locally trained models are pulled back towards the anchor model rather than communicating directly with other nodes (a minimal sketch of this step appears after this entry).
arXiv Detail & Related papers (2020-02-21T20:33:49Z)
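The anchor pull-back step described in the Overlap Local-SGD entry above can be sketched as follows; the function name, the pull-back coefficient alpha, and the overall structure are illustrative assumptions rather than the authors' implementation.

```python
def local_sgd_with_anchor(x, anchor, stoch_grad, eta, alpha, num_local_steps):
    """Illustrative local-update loop with an anchor pull-back
    (in the spirit of Overlap-Local-SGD; details are assumptions).

    x               : local model parameters (e.g., a NumPy array)
    anchor          : anchor model held on this node, refreshed by background communication
    stoch_grad      : callable(x) returning a stochastic gradient at x
    eta             : SGD stepsize
    alpha           : pull-back strength in [0, 1]
    num_local_steps : number of purely local SGD steps between pull-backs
    """
    for _ in range(num_local_steps):
        x = x - eta * stoch_grad(x)   # ordinary local SGD, no communication
    # Instead of blocking on an all-reduce, pull the local model toward the
    # anchor; refreshing the anchor can overlap with the local computation.
    x = (1.0 - alpha) * x + alpha * anchor
    return x
```

Because the anchor is only refreshed in the background, communication latency is hidden behind the local computation rather than stalling it.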