The Built-In Robustness of Decentralized Federated Averaging to Bad Data
- URL: http://arxiv.org/abs/2502.18097v1
- Date: Tue, 25 Feb 2025 11:06:51 GMT
- Title: The Built-In Robustness of Decentralized Federated Averaging to Bad Data
- Authors: Samuele Sabella, Chiara Boldrini, Lorenzo Valerio, Andrea Passarella, Marco Conti
- Abstract summary: Decentralized federated learning (DFL) enables devices to collaboratively train models over complex network topologies without relying on a central controller. In this setting, local data remains private, but its quality and quantity can vary significantly across nodes. We simulate two scenarios with degraded data quality, one where the corrupted data is evenly distributed in a subset of nodes and one where it is concentrated on a single node.
- Score: 2.7961972519572447
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decentralized federated learning (DFL) enables devices to collaboratively train models over complex network topologies without relying on a central controller. In this setting, local data remains private, but its quality and quantity can vary significantly across nodes. The extent to which a fully decentralized system is vulnerable to poor-quality or corrupted data remains unclear, but several factors could contribute to potential risks. Without a central authority, there can be no unified mechanism to detect or correct errors, and each node operates with a localized view of the data distribution, making it difficult for the node to assess whether its perspective aligns with the true distribution. Moreover, models trained on low-quality data can propagate through the network, amplifying errors. To explore the impact of low-quality data on DFL, we simulate two scenarios with degraded data quality -- one where the corrupted data is evenly distributed in a subset of nodes and one where it is concentrated on a single node -- using a decentralized implementation of FedAvg. Our results reveal that averaging-based decentralized learning is remarkably robust to localized bad data, even when the corrupted data resides in the most influential nodes of the network. Counterintuitively, this robustness is further enhanced when the corrupted data is concentrated on a single node, regardless of its centrality in the communication network topology. This phenomenon is explained by the averaging process, which ensures that no single node -- however central -- can disproportionately influence the overall learning process.
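The averaging mechanism behind this robustness result can be made concrete with a small simulation. Below is a minimal sketch of decentralized FedAvg on a toy regression task, with the corrupted data concentrated on a single node; the ring topology, quadratic loss, and all names are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of decentralized FedAvg on a toy linear-regression task.
# Illustrative only: topology, loss, and constants are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim, rounds = 8, 5, 50

# Ring topology: each node averages with its two neighbors and itself.
neighbors = {i: [(i - 1) % n_nodes, i, (i + 1) % n_nodes] for i in range(n_nodes)}

w_true = rng.normal(size=dim)

def make_data(corrupted=False, n=100):
    X = rng.normal(size=(n, dim))
    y = X @ w_true + 0.1 * rng.normal(size=n)
    if corrupted:                      # "bad data": labels replaced by pure noise
        y = rng.normal(size=n)
    return X, y

# Concentrated-corruption scenario: only node 0 holds bad data.
data = [make_data(corrupted=(i == 0)) for i in range(n_nodes)]
models = [np.zeros(dim) for _ in range(n_nodes)]

lr = 0.05
for _ in range(rounds):
    # 1) Local training step on each node's private data.
    for i, (X, y) in enumerate(data):
        grad = 2 * X.T @ (X @ models[i] - y) / len(y)
        models[i] = models[i] - lr * grad
    # 2) Decentralized averaging: each node mixes with its neighbors.
    models = [np.mean([models[j] for j in neighbors[i]], axis=0)
              for i in range(n_nodes)]

print("distance to w_true per node:",
      np.round([np.linalg.norm(m - w_true) for m in models], 3))
```

Running this, the distances stay comparable across nodes: the repeated neighborhood averaging bounds how much any single node's bad data can pull the network away from the true model.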
Related papers
- Privacy Preserving Semi-Decentralized Mean Estimation over Intermittently-Connected Networks [59.43433767253956]
We consider the problem of privately estimating the mean of vectors distributed across different nodes of an unreliable wireless network.
In a semi-decentralized setup, nodes can collaborate with their neighbors to compute a local consensus, which they relay to a central server.
We study the tradeoff between collaborative relaying and privacy leakage due to the data sharing among nodes.
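A rough sketch of the kind of protocol described, assuming local Gaussian noise for privacy and Bernoulli link failures; both are illustrative assumptions, not the paper's scheme.

```python
# Hedged sketch of semi-decentralized mean estimation: each node perturbs its
# vector for privacy, averages with whichever neighbors are connected this
# round, and relays the local consensus to a server.
import numpy as np

rng = np.random.default_rng(1)
n_nodes, dim, sigma, p_link = 20, 3, 0.1, 0.7   # p_link: prob. a link is up

x = rng.normal(size=(n_nodes, dim))             # private vectors
noisy = x + sigma * rng.normal(size=x.shape)    # local privacy noise (assumed)

relayed = []
for i in range(n_nodes):
    # Intermittent connectivity: each neighbor link is up with prob. p_link.
    up = [j for j in range(n_nodes) if j == i or rng.random() < p_link]
    relayed.append(noisy[up].mean(axis=0))      # local consensus over live links

estimate = np.mean(relayed, axis=0)             # server-side aggregation
print("error vs true mean:", np.linalg.norm(estimate - x.mean(axis=0)))
```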
arXiv Detail & Related papers (2024-06-06T06:12:15Z) - Robustness of Decentralised Learning to Nodes and Data Disruption [4.062458976723649]
We study the effect of nodes' disruption on the collective learning process.
Our results show that decentralised learning processes are remarkably robust to network disruption.
arXiv Detail & Related papers (2024-05-03T12:14:48Z) - Impact of network topology on the performance of Decentralized Federated
Learning [4.618221836001186]
Decentralized machine learning is gaining momentum, addressing infrastructure challenges and privacy concerns.
This study investigates the interplay between network structure and learning performance using three network topologies and six data distribution methods.
We highlight the challenges in transferring knowledge from peripheral to central nodes, attributed to a dilution effect during model aggregation.
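The dilution effect can be illustrated numerically: under uniform neighborhood averaging (an assumption made here for illustration), a single leaf node's weight in a hub's model shrinks as the hub's degree grows.

```python
# Toy illustration of the dilution effect on a star topology: after a few
# averaging rounds, one leaf's model carries weight ~1/(k+1) in the hub,
# so its local knowledge is diluted as the hub's degree k grows.
import numpy as np

for k in (2, 5, 10, 50):                 # hub degree
    W = np.zeros((k + 1, k + 1))         # mixing matrix, node 0 = hub
    W[0, :] = 1.0 / (k + 1)              # hub averages itself + k leaves
    for leaf in range(1, k + 1):         # each leaf averages itself + hub
        W[leaf, [0, leaf]] = 0.5
    influence = np.linalg.matrix_power(W, 3)[0, 1]  # leaf 1 -> hub, 3 rounds
    print(f"k={k:3d}  leaf weight in hub after 3 rounds: {influence:.4f}")
```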
arXiv Detail & Related papers (2024-02-28T11:13:53Z) - Exploring the Impact of Disrupted Peer-to-Peer Communications on Fully
Decentralized Learning in Disaster Scenarios [4.618221836001186]
Fully decentralized learning enables the distribution of learning resources across multiple user devices or nodes.
This study investigates the effects of various disruptions to peer-to-peer communications on decentralized learning in a disaster setting.
arXiv Detail & Related papers (2023-10-04T17:24:38Z) - Does Decentralized Learning with Non-IID Unlabeled Data Benefit from
Self Supervision? [51.00034621304361]
We study decentralized learning with unlabeled data through the lens of self-supervised learning (SSL).
We study the effectiveness of contrastive learning algorithms under decentralized learning settings.
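For reference, a standard SimCLR-style NT-Xent contrastive loss of the kind such studies evaluate; this is a textbook formulation, not the paper's code.

```python
# Minimal NT-Xent (normalized temperature-scaled cross-entropy) loss in NumPy.
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """z1, z2: (n, d) embeddings of two augmented views of the same n samples."""
    z = np.concatenate([z1, z2])                        # (2n, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)    # cosine similarities
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                      # exclude self-pairs
    n = len(z1)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # positive pairs
    log_prob = sim[np.arange(2 * n), pos] - np.log(np.exp(sim).sum(axis=1))
    return -log_prob.mean()

z1, z2 = np.random.randn(4, 8), np.random.randn(4, 8)
print("NT-Xent loss:", nt_xent(z1, z2))
```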
arXiv Detail & Related papers (2022-10-20T01:32:41Z) - FedDKD: Federated Learning with Decentralized Knowledge Distillation [3.9084449541022055]
We propose a novel federated learning framework equipped with a decentralized knowledge distillation process (FedDKD).
We show that FedDKD outperforms the state-of-the-art methods with more efficient communication and training in a few DKD steps.
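The building block here is the standard soft-label distillation loss; the neighbor-averaged teacher in the usage example is an illustrative assumption, not FedDKD's exact procedure.

```python
# Standard knowledge-distillation loss: KL-style cross-entropy against the
# teacher's temperature-softened outputs, scaled by T^2.
import numpy as np

def softmax(logits, T=1.0):
    e = np.exp((logits - logits.max(axis=1, keepdims=True)) / T)
    return e / e.sum(axis=1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=2.0):
    p_t = softmax(teacher_logits, T)              # softened teacher targets
    log_p_s = np.log(softmax(student_logits, T))
    return -(p_t * log_p_s).sum(axis=1).mean() * T * T

# In a decentralized round, a node might distill from the average of its
# neighbors' predictions on shared inputs (illustrative assumption):
neighbor_logits = [np.random.randn(16, 10) for _ in range(3)]
teacher = np.mean(neighbor_logits, axis=0)
student = np.random.randn(16, 10)
print("distillation loss:", distill_loss(student, teacher))
```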
arXiv Detail & Related papers (2022-05-02T07:54:07Z) - RelaySum for Decentralized Deep Learning on Heterogeneous Data [71.36228931225362]
In decentralized machine learning, workers compute model updates on their local data.
Because workers communicate only with a few neighbors and without central coordination, these updates propagate progressively over the network.
This paradigm enables distributed training on networks without all-to-all connectivity, helping to protect data privacy as well as to reduce the communication cost of distributed training in data centers.
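The progressive propagation can be seen with plain gossip averaging on a path graph, shown below as a baseline; this is not the RelaySum mechanism itself, which relays running sums along a spanning tree to remove this propagation lag.

```python
# How far does one worker's update travel under neighbor-only averaging?
import numpy as np

n = 9
x = np.zeros(n)
x[0] = 1.0                       # an "update" that exists only at worker 0

# Row-stochastic mixing matrix for a path: average with adjacent workers.
W = np.zeros((n, n))
for i in range(n):
    nbrs = [j for j in (i - 1, i, i + 1) if 0 <= j < n]
    W[i, nbrs] = 1.0 / len(nbrs)

for t in range(1, 6):
    x = W @ x
    print(f"round {t}: worker 8 has absorbed {x[8]:.4f} of the update")
```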
arXiv Detail & Related papers (2021-10-08T14:55:32Z) - Two-Bit Aggregation for Communication Efficient and Differentially
Private Federated Learning [79.66767935077925]
In federated learning (FL), a machine learning model is trained on multiple nodes in a decentralized manner, while keeping the data local and not shared with other nodes.
The information sent from the nodes to the server may reveal some details about each node's local data, thus raising privacy concerns.
A novel two-bit aggregation algorithm is proposed with guaranteed differential privacy and reduced uplink communication overhead.
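A hedged sketch of the general idea: two-bit stochastic quantization of locally perturbed updates. The specific quantizer and noise calibration below are assumptions, not the algorithm proposed in the paper.

```python
# Each node quantizes its update to 4 levels (2 bits) before uplink, after
# adding local privacy noise. Unbiased stochastic rounding keeps the
# server-side average close to the true mean.
import numpy as np

rng = np.random.default_rng(2)

def two_bit_quantize(v, lo, hi):
    """Stochastic quantization of v onto 4 uniform levels in [lo, hi]."""
    levels = np.linspace(lo, hi, 4)                   # 2 bits -> 4 levels
    step = levels[1] - levels[0]
    idx = np.clip((v - lo) / step, 0, 3)
    low = np.floor(idx)
    # Round up with probability equal to the fractional part (unbiased).
    idx = low + (rng.random(v.shape) < (idx - low))
    return levels[idx.astype(int)]

updates = rng.normal(size=(10, 1000))                      # 10 nodes' updates
private = updates + 0.2 * rng.normal(size=updates.shape)   # assumed DP noise
quantized = np.stack([two_bit_quantize(u, -3, 3) for u in private])
print("per-coordinate aggregation error:",
      np.linalg.norm(quantized.mean(0) - updates.mean(0)) / np.sqrt(1000))
```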
arXiv Detail & Related papers (2021-10-06T19:03:58Z) - Decentralized Local Stochastic Extra-Gradient for Variational
Inequalities [125.62877849447729]
We consider distributed variational inequalities (VIs) whose problem data is heterogeneous (non-IID) and distributed across many devices.
We make a very general assumption on the computational network that covers the settings of fully decentralized calculations.
We theoretically analyze its convergence rate in the strongly-monotone, monotone, and non-monotone settings.
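For reference, the classical centralized extra-gradient step that such methods decentralize, in standard notation (assumed here, not quoted from the paper):

```latex
% Extra-gradient step for a VI with operator F, feasible set \mathcal{X},
% and step size \gamma: an extrapolation step followed by the actual update.
x_{t+1/2} = \mathrm{proj}_{\mathcal{X}}\!\bigl(x_t - \gamma F(x_t)\bigr), \qquad
x_{t+1}   = \mathrm{proj}_{\mathcal{X}}\!\bigl(x_t - \gamma F(x_{t+1/2})\bigr).
```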
arXiv Detail & Related papers (2021-06-15T17:45:51Z) - Consensus Control for Decentralized Deep Learning [72.50487751271069]
Decentralized training of deep learning models enables on-device learning over networks, as well as efficient scaling to large compute clusters.
We show in theory that when the training consensus distance is lower than a critical quantity, decentralized training converges as fast as the centralized counterpart.
Our empirical insights allow the principled design of better decentralized training schemes that mitigate the performance drop.
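Consensus distance is conventionally defined as the root-mean-square deviation of the workers' parameters from their average; in standard notation (assumed, not quoted from the paper):

```latex
% Consensus distance at round t over n workers with parameters x_i^{(t)}.
\Xi_t^2 = \frac{1}{n} \sum_{i=1}^{n} \bigl\| \bar{x}^{(t)} - x_i^{(t)} \bigr\|_2^2,
\qquad \bar{x}^{(t)} = \frac{1}{n} \sum_{i=1}^{n} x_i^{(t)}.
```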
arXiv Detail & Related papers (2021-02-09T13:58:33Z) - Quasi-Global Momentum: Accelerating Decentralized Deep Learning on
Heterogeneous Data [77.88594632644347]
Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks.
In realistic learning scenarios, the presence of heterogeneity across different clients' local datasets poses an optimization challenge.
We propose a novel momentum-based method to mitigate this decentralized training difficulty.
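A hedged sketch of the idea: absent a global gradient, a worker can drive momentum with an exponential moving average of its own model differences across rounds, which already reflect neighbors' progress through gossip. The constants and the exact update rule below are assumptions, not the paper's algorithm.

```python
# One decentralized round with a "quasi-global" momentum buffer m built from
# model differences (illustrative assumptions throughout).
import numpy as np

def qgm_round(x, m, grad_fn, neighbor_models, lr=0.1, mu=0.9):
    x_prev = x.copy()
    x = x - lr * (grad_fn(x) + mu * m)            # momentum-steered local step
    x = np.mean(neighbor_models + [x], axis=0)    # gossip with neighbors
    m = mu * m + (1 - mu) * (x_prev - x) / lr     # EMA of model differences
    return x, m

# Degenerate single-worker usage on a toy quadratic loss ||x||^2:
grad_fn = lambda x: 2 * x
x, m = np.ones(3), np.zeros(3)
for _ in range(50):
    x, m = qgm_round(x, m, grad_fn, [])
print("x after 50 rounds:", np.round(x, 4))
```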
arXiv Detail & Related papers (2021-02-09T11:27:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.