DIGing-SGLD: Decentralized and Scalable Langevin Sampling over Time-Varying Networks
- URL: http://arxiv.org/abs/2511.12836v1
- Date: Sun, 16 Nov 2025 23:42:44 GMT
- Title: DIGing-SGLD: Decentralized and Scalable Langevin Sampling over Time-Varying Networks
- Authors: Waheed U. Bajwa, Mert Gurbuzbalaban, Mustafa Ali Kutbay, Lingjiong Zhu, Muhammad Zulqarnain
- Abstract summary: This paper introduces DIGing-SGLD, a decentralized SGLD algorithm for scalable Bayesian learning in multi-agent systems. We provide the first finite-time non-asymptotic convergence guarantees for decentralized SGLD-based sampling over time-varying networks.
- Score: 7.477601047470181
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sampling from a target distribution induced by training data is central to Bayesian learning, with Stochastic Gradient Langevin Dynamics (SGLD) serving as a key tool for scalable posterior sampling and decentralized variants enabling learning when data are distributed across a network of agents. This paper introduces DIGing-SGLD, a decentralized SGLD algorithm designed for scalable Bayesian learning in multi-agent systems operating over time-varying networks. Existing decentralized SGLD methods are restricted to static network topologies, and many exhibit steady-state sampling bias caused by network effects, even when full batches are used. DIGing-SGLD overcomes these limitations by integrating Langevin-based sampling with the gradient-tracking mechanism of the DIGing algorithm, originally developed for decentralized optimization over time-varying networks, thereby enabling efficient and bias-free sampling without a central coordinator. To our knowledge, we provide the first finite-time non-asymptotic Wasserstein convergence guarantees for decentralized SGLD-based sampling over time-varying networks, with explicit constants. Under standard strong convexity and smoothness assumptions, DIGing-SGLD achieves geometric convergence to an $O(\sqrt{\eta})$ neighborhood of the target distribution, where $\eta$ is the stepsize, with dependence on the target accuracy matching the best-known rates for centralized and static-network SGLD algorithms using constant stepsize. Numerical experiments on Bayesian linear and logistic regression validate the theoretical results and demonstrate the strong empirical performance of DIGing-SGLD under dynamically evolving network conditions.
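To make the mechanism concrete, here is a minimal sketch, assuming a synchronous NumPy implementation, of how DIGing-style gradient tracking and Langevin noise might be combined in one round; the function name, variable layout, and exact ordering of the mixing, descent, and noise terms are illustrative assumptions and not taken from the paper.

```python
import numpy as np

def diging_sgld_round(X, G, prev_grads, W, stoch_grad, eta, rng):
    """One synchronous round of a DIGing-SGLD-style update (illustrative sketch only).

    X          : (n, d) array, row i holds agent i's current parameters
    G          : (n, d) array, row i holds agent i's gradient-tracking variable
    prev_grads : (n, d) array of the stochastic gradients from the previous round
    W          : (n, n) doubly stochastic mixing matrix for this round (time-varying)
    stoch_grad : callable(i, x) returning agent i's stochastic gradient at x
    eta        : stepsize
    rng        : numpy.random.Generator
    """
    n, d = X.shape
    # Consensus on parameters, descent along the tracked gradient, and
    # injected Gaussian noise with variance 2*eta (the Langevin term).
    noise = np.sqrt(2.0 * eta) * rng.standard_normal((n, d))
    X_new = W @ X - eta * G + noise
    # Gradient tracking in the style of DIGing: mix the trackers and correct
    # them with the change in the local stochastic gradients.
    new_grads = np.stack([stoch_grad(i, X_new[i]) for i in range(n)])
    G_new = W @ G + new_grads - prev_grads
    return X_new, G_new, new_grads
```

In an actual decentralized deployment the products W @ X and W @ G would be realized by each agent averaging only over its current neighbors, and W may change from round to round as the network evolves.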
Related papers
- Generalized EXTRA stochastic gradient Langevin dynamics [6.899153618328339]
Langevin algorithms are popular Markov Chain Monte Carlo methods for Bayesian learning. Their stochastic gradient versions, such as Stochastic Gradient Langevin Dynamics (SGLD), allow iterative learning based on randomly sampled mini-batches. When data are decentralized across a network of agents subject to communication and privacy constraints, standard SGLD algorithms cannot be applied.
arXiv Detail & Related papers (2024-12-02T21:57:30Z) - Stability and Generalization for Distributed SGDA [70.97400503482353]
We propose a stability-based generalization analysis framework for Distributed-SGDA.
We conduct a comprehensive analysis of stability error, generalization gap, and population risk across different metrics.
Our theoretical results reveal the trade-off between the generalization gap and optimization error.
arXiv Detail & Related papers (2024-11-14T11:16:32Z) - Stability and Generalization of the Decentralized Stochastic Gradient
Descent Ascent Algorithm [80.94861441583275]
We investigate the generalization bound of the decentralized stochastic gradient descent ascent (D-SGDA) algorithm.
Our results analyze the impact of different network topologies on the generalization of D-SGDA.
We also balance the optimization error against the generalization gap to obtain the optimal population risk in the convex-concave setting.
arXiv Detail & Related papers (2023-10-31T11:27:01Z) - Decentralized SGD and Average-direction SAM are Asymptotically
Equivalent [101.37242096601315]
Decentralized stochastic gradient descent (D-SGD) allows collaborative learning on massive numbers of devices simultaneously without the control of a central server.
Existing theories claim that decentralization invariably undermines generalization.
arXiv Detail & Related papers (2023-06-05T14:19:52Z) - Decentralized Federated Reinforcement Learning for User-Centric Dynamic
TFDD Control [37.54493447920386]
We propose a learning-based dynamic time-frequency division duplexing (D-TFDD) scheme to meet asymmetric and heterogeneous traffic demands.
We formulate the problem as a decentralized partially observable Markov decision process (Dec-POMDP).
In order to jointly optimize the global resources in a decentralized manner, we propose a federated reinforcement learning (RL) algorithm named federated Wolpertinger deep deterministic policy gradient (FWDDPG).
arXiv Detail & Related papers (2022-11-04T07:39:21Z) - DR-DSGD: A Distributionally Robust Decentralized Learning Algorithm over
Graphs [54.08445874064361]
We propose to solve a regularized distributionally robust learning problem in the decentralized setting.
By adding a Kullback-Leibler regularization function to the robust min-max optimization problem, the learning problem can be reduced to a modified robust problem.
We show that our proposed algorithm can improve the worst distribution test accuracy by up to $10\%$.
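The reduction mentioned above can be illustrated with the standard Kullback-Leibler-regularized form of distributionally robust optimization; the exact objective used by DR-DSGD may differ, so the identity below is only a generic sketch, with $f_i$ the local losses, $\Delta_N$ the probability simplex, and $\mu > 0$ the regularization parameter:
$$\min_{\theta}\,\max_{q\in\Delta_N}\ \sum_{i=1}^{N} q_i f_i(\theta) - \frac{1}{\mu}\,\mathrm{KL}\!\left(q\,\Big\|\,\tfrac{1}{N}\mathbf{1}\right) \;=\; \min_{\theta}\ \frac{1}{\mu}\log\!\left(\frac{1}{N}\sum_{i=1}^{N} e^{\mu f_i(\theta)}\right),$$
i.e., the inner maximization has a closed form, leaving a smooth log-sum-exp objective that each node can handle with standard decentralized SGD machinery.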
arXiv Detail & Related papers (2022-08-29T18:01:42Z) - Learning with Local Gradients at the Edge [14.94491070863641]
We present a novel backpropagation-free optimization algorithm dubbed Target Projection Stochastic Gradient Descent (tpSGD).
tpSGD generalizes direct random target projection to work with arbitrary loss functions.
We evaluate the performance of tpSGD in training deep neural networks and extend the approach to multi-layer RNNs.
arXiv Detail & Related papers (2022-08-17T19:51:06Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Decentralized Stochastic Proximal Gradient Descent with Variance
Reduction over Time-varying Networks [30.231314171218994]
In decentralized learning, a network of nodes cooperate to minimize an overall objective function that is usually the finite-sum of their local objectives.
We propose a novel algorithm, DPSVRG, to accelerate decentralized training by leveraging the variance-reduction technique.
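The variance-reduction idea invoked here can be summarized with the generic SVRG-style gradient estimator; DPSVRG's decentralized and proximal details differ, so the expression below, with $\tilde{x}$ a periodically refreshed snapshot and $F = \frac{1}{m}\sum_{j=1}^{m} f_j$ a node's local finite-sum objective, is only a sketch:
$$v_t \;=\; \nabla f_{j_t}(x_t) \;-\; \nabla f_{j_t}(\tilde{x}) \;+\; \nabla F(\tilde{x}),$$
which is an unbiased estimate of $\nabla F(x_t)$ whose variance shrinks as $x_t$ and $\tilde{x}$ approach a minimizer.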
arXiv Detail & Related papers (2021-12-20T08:23:36Z) - Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging [48.99717153937717]
We present WAGMA-SGD, a wait-avoiding stochastic optimizer that reduces global communication via subgroup weight exchange. We train ResNet-50 on ImageNet, a Transformer for machine translation, and a deep reinforcement learning agent for navigation at scale. Compared with state-of-the-art decentralized SGD variants, WAGMA-SGD significantly improves training throughput.
arXiv Detail & Related papers (2020-04-30T22:11:53Z) - Overlap Local-SGD: An Algorithmic Approach to Hide Communication Delays
in Distributed SGD [32.03967072200476]
We propose an algorithmic approach named Overlap-Local-SGD (and its momentum variant).
We achieve this by adding an anchor model on each node.
After multiple local updates, locally trained models are pulled back towards the anchor model rather than communicating directly with other nodes (a minimal sketch of this step appears after this entry).
arXiv Detail & Related papers (2020-02-21T20:33:49Z)
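The anchor pull-back step described in the Overlap Local-SGD entry above can be sketched as follows; the function name, the pull-back coefficient alpha, and the overall structure are illustrative assumptions rather than the authors' implementation.

```python
def local_sgd_with_anchor(x, anchor, stoch_grad, eta, alpha, num_local_steps):
    """Illustrative local-update loop with an anchor pull-back
    (in the spirit of Overlap-Local-SGD; details are assumptions).

    x               : local model parameters (e.g., a NumPy array)
    anchor          : anchor model held on this node, refreshed by background communication
    stoch_grad      : callable(x) returning a stochastic gradient at x
    eta             : SGD stepsize
    alpha           : pull-back strength in [0, 1]
    num_local_steps : number of purely local SGD steps between pull-backs
    """
    for _ in range(num_local_steps):
        x = x - eta * stoch_grad(x)   # ordinary local SGD, no communication
    # Instead of blocking on an all-reduce, pull the local model toward the
    # anchor; refreshing the anchor can overlap with the local computation.
    x = (1.0 - alpha) * x + alpha * anchor
    return x
```

Because the anchor is only refreshed in the background, communication latency is hidden behind the local computation rather than stalling it.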