Related papers: Controlled disagreement improves generalization in decentralized training


URL: http://arxiv.org/abs/2602.02899v1
Date: Mon, 02 Feb 2026 23:14:37 GMT
Title: Controlled disagreement improves generalization in decentralized training
Authors: Zesen Wang, Mikael Johansson
Abstract summary: Decentralized training is often regarded as inferior to centralized training because consensus errors undermine convergence and generalization. This work challenges this view by introducing decentralized SGD with Adaptive Consensus (DSGD-AC). We prove that these errors are not random noise but systematically align with the dominant Hessian subspace, acting as structured perturbations that guide optimization toward flatter minima.
Score: 10.764160559530845
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Decentralized training is often regarded as inferior to centralized training because the consensus errors between workers are thought to undermine convergence and generalization, even with homogeneous data distributions. This work challenges this view by introducing decentralized SGD with Adaptive Consensus (DSGD-AC), which intentionally preserves non-vanishing consensus errors through a time-dependent scaling mechanism. We prove that these errors are not random noise but systematically align with the dominant Hessian subspace, acting as structured perturbations that guide optimization toward flatter minima. Across image classification and machine translation benchmarks, DSGD-AC consistently surpasses both standard DSGD and centralized SGD in test accuracy and solution flatness. Together, these results establish consensus errors as a useful implicit regularizer and open a new perspective on the design of decentralized learning algorithms.
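The abstract does not give the DSGD-AC update rule, so the following is only a toy sketch of the ingredients it names: workers gossiping over a ring, a consensus error that measures how far they disagree, and a time-dependent coefficient that scales the consensus step. The coefficient `gamma`, its schedule, the ring topology, and the quadratic toy loss are all hypothetical choices made for illustration; with `gamma = 1` the step reduces to plain DSGD, and this is not the authors' algorithm.

```python
import random


def ring_mixing(n):
    """Doubly stochastic ring mixing matrix: each worker averages itself and its two neighbours."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W


def consensus_error(xs):
    """Mean squared distance of each worker's model from the network average."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def dsgd_step(xs, W, lr, gamma, rng, noise=0.1):
    """One decentralized SGD step on the toy loss f(x) = 0.5 * x**2 with noisy gradients.

    gamma = 1.0 is standard DSGD (full gossip averaging plus a local gradient step).
    gamma < 1.0 only partially pulls each worker toward its neighbourhood average;
    this is a HYPOTHETICAL stand-in for an adaptive consensus scale, not DSGD-AC itself.
    """
    n = len(xs)
    mixed = [sum(W[i][j] * xs[j] for j in range(n)) for i in range(n)]
    grads = [x + rng.gauss(0.0, noise) for x in xs]  # stochastic gradient of 0.5 * x**2
    return [xs[i] + gamma * (mixed[i] - xs[i]) - lr * grads[i] for i in range(n)]


def run(steps, gamma_fn, n=8, lr=0.1, noise=0.1, seed=0):
    """Run `steps` iterations; gamma_fn(t) gives the (hypothetical) consensus scale at step t."""
    rng = random.Random(seed)
    W = ring_mixing(n)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for t in range(steps):
        xs = dsgd_step(xs, W, lr, gamma_fn(t), rng, noise)
    return xs


if __name__ == "__main__":
    standard = run(200, lambda t: 1.0)
    # Hypothetical schedule: gossip strength decays toward a floor of 0.2.
    adaptive = run(200, lambda t: max(0.2, 1.0 / (1.0 + t / 50.0)))
    print("standard DSGD consensus error:", consensus_error(standard))
    print("partial-gossip consensus error:", consensus_error(adaptive))
```

Weakening the gossip step keeps the workers from fully agreeing, so the consensus error stays larger than under full averaging; whether and how that disagreement aligns with the Hessian, as the paper claims, is not modelled by this one-dimensional toy.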

Related papers

Accelerating Decentralized Optimization via Overlapping Local Steps [6.713278402701195]
We propose a novel approach to accelerate decentralized computation without sacrificing theoretical guarantees. We show that OLDSGD retains the same iteration complexity as Local Decentralized SGD while improving per-iteration convergence under different levels of communication delay.
arXiv Detail & Related papers (2026-01-04T11:40:22Z)
Generalized Incremental Learning under Concept Drift across Evolving Data Streams [32.62505920071586]
Real-world data streams exhibit inherent non-stationarity characterized by concept drift, posing significant challenges for adaptive learning systems. We formalize Generalized Incremental Learning under Concept Drift (GILCD), characterizing the joint evolution of distributions and label spaces in open-environment streaming contexts. We propose Calibrated Source-Free Adaptation (CSFA), which fuses emerging prototypes with base representations, enabling stable new-class identification.
arXiv Detail & Related papers (2025-06-06T04:36:24Z)
DeCAF: Decentralized Consensus-And-Factorization for Low-Rank Adaptation of Foundation Models [22.45637113673959]
Low-Rank Adaptation (LoRA) has emerged as one of the most effective, computationally tractable fine-tuning approaches for training Vision-Language Models (VLMs) and Large Language Models (LLMs). This work improves the convergence rate of decentralized LoRA to match the rate of decentralized SGD by ensuring smoothness. We also introduce DeCAF, a novel algorithm integrating DLoRA with truncated singular value decomposition (TSVD)-based matrix factorization to resolve consensus interference.
arXiv Detail & Related papers (2025-05-27T16:10:53Z)
Stability and Generalization for Distributed SGDA [70.97400503482353]
We propose a stability-based analytical framework for the generalization of Distributed-SGDA. We conduct a comprehensive analysis of stability error, generalization gap, and population risk across different metrics. Our theoretical results reveal the trade-off between the generalization gap and optimization error.
arXiv Detail & Related papers (2024-11-14T11:16:32Z)
Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration [66.43954501171292]
We introduce Catalyst Acceleration and propose an accelerated Decentralized Federated Learning algorithm called DFedCata. DFedCata consists of two main components: the Moreau envelope function, which addresses parameter inconsistencies, and Nesterov's extrapolation step, which accelerates the aggregation phase. Empirically, we demonstrate the advantages of the proposed algorithm in both convergence speed and generalization performance on CIFAR10/100 with various non-iid data distributions.
arXiv Detail & Related papers (2024-10-09T06:17:16Z)
Stability and Generalization of the Decentralized Stochastic Gradient Descent Ascent Algorithm [80.94861441583275]
We investigate the complexity of the generalization bound of the decentralized stochastic gradient descent ascent (D-SGDA) algorithm. Our results analyze the impact of different topologies on the generalization of D-SGDA. We also balance the optimization error with the generalization gap to obtain the optimal population risk in the convex-concave setting.
arXiv Detail & Related papers (2023-10-31T11:27:01Z)
Decentralized SGD and Average-direction SAM are Asymptotically Equivalent [101.37242096601315]
Decentralized stochastic gradient descent (D-SGD) allows collaborative learning across a massive number of devices simultaneously without the control of a central server. Existing theories claim that decentralization invariably undermines generalization.
arXiv Detail & Related papers (2023-06-05T14:19:52Z)
Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting to poorly annotated data and 2) distribution shift across domains. We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv Detail & Related papers (2021-12-12T06:11:16Z)
Decentralized Local Stochastic Extra-Gradient for Variational Inequalities [125.62877849447729]
We consider distributed variational inequalities (VIs) whose problem data are heterogeneous (non-IID) and distributed across many devices. We make a very general assumption on the computational network that covers fully decentralized computation. We theoretically analyze the method's convergence rate in the strongly monotone, monotone, and non-monotone settings.
arXiv Detail & Related papers (2021-06-15T17:45:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.