A Robust Federated Learning Framework for Undependable Devices at Scale
- URL: http://arxiv.org/abs/2412.19991v1
- Date: Sat, 28 Dec 2024 03:28:52 GMT
- Title: A Robust Federated Learning Framework for Undependable Devices at Scale
- Authors: Shilong Wang, Jianchun Liu, Hongli Xu, Chunming Qiao, Huarong Deng, Qiuye Zheng, Jiantao Gong,
- Abstract summary: In a federated learning system, many devices, such as smartphones, are often undependable (e.g., frequently disconnected from WiFi) during training.
Existing FL frameworks always assume a dependable environment and exclude undependable devices from training.
We propose FLUDE to effectively deal with undependable environments.
- Score: 24.28558003071587
- Abstract: In a federated learning (FL) system, many devices, such as smartphones, are often undependable (e.g., frequently disconnected from WiFi) during training. Existing FL frameworks always assume a dependable environment and exclude undependable devices from training, leading to poor model performance and resource wastage. In this paper, we propose FLUDE to effectively deal with undependable environments. First, FLUDE assesses the dependability of devices based on the probability distribution of their historical behaviors (e.g., the likelihood of successfully completing training). Based on this assessment, FLUDE adaptively selects devices with high dependability for training. To mitigate resource wastage during the training phase, FLUDE maintains a model cache on each device, aiming to preserve the latest training state for later use in case local training on an undependable device is interrupted. Moreover, FLUDE proposes a staleness-aware strategy to judiciously distribute the global model to a subset of devices, thus significantly reducing resource wastage while maintaining model performance. We have implemented FLUDE on two physical platforms with 120 smartphones and NVIDIA Jetson devices. Extensive experimental results demonstrate that FLUDE can effectively improve model performance and resource efficiency of FL training in undependable environments.
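The abstract describes FLUDE's selection mechanism only at a high level. Below is a minimal illustrative sketch (not the authors' implementation) of dependability-aware device selection, assuming a Beta-posterior estimate of each device's probability of completing a round; all class, field, and function names here are hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass
class DeviceRecord:
    """Hypothetical per-device history of local-training outcomes."""
    device_id: int
    successes: int = 0   # rounds completed (e.g., stayed on WiFi)
    failures: int = 0    # rounds interrupted

    def dependability(self) -> float:
        # Posterior mean of a Bernoulli "completes the round" model
        # under a Beta(1, 1) prior: (s + 1) / (s + f + 2).
        return (self.successes + 1) / (self.successes + self.failures + 2)

def select_devices(devices: list, k: int, explore_frac: float = 0.1) -> list:
    """Pick k devices, mostly by dependability, reserving a small
    random fraction so rarely observed devices still get sampled."""
    ranked = sorted(devices, key=lambda d: d.dependability(), reverse=True)
    n_explore = int(k * explore_frac)
    chosen = ranked[: k - n_explore]
    pool = ranked[k - n_explore:]
    chosen += random.sample(pool, min(n_explore, len(pool)))
    return chosen
```

A full system along the lines of the abstract would also need the on-device model cache and the staleness-aware model-distribution strategy; those parts are omitted here.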
Related papers
- Learn More by Using Less: Distributed Learning with Energy-Constrained Devices [3.730504020733928]
Federated Learning (FL) has emerged as a solution for privacy-preserving, distributed model training across decentralized devices.
We propose LeanFed, an energy-aware FL framework designed to optimize client selection and training workloads on battery-constrained devices.
arXiv Detail & Related papers (2024-12-03T09:06:57Z)
- Efficient Asynchronous Federated Learning with Sparsification and Quantization [55.6801207905772]
Federated Learning (FL) is attracting growing attention as a way to collaboratively train a machine learning model without transferring raw data.
FL generally relies on a parameter server and a large number of edge devices throughout model training.
We propose TEASQ-Fed, which lets edge devices participate in training asynchronously by actively applying for tasks (a generic sketch of sparsified, quantized updates follows this entry).
arXiv Detail & Related papers (2023-12-23T07:47:07Z)
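The entry above names sparsification and quantization in its title without detail. The following is a generic sketch of top-k sparsification followed by uniform quantization of a model update, a common pattern that may well differ from TEASQ-Fed's actual scheme.

```python
import numpy as np

def sparsify_topk(update: np.ndarray, k: int):
    """Keep only the k largest-magnitude entries of a flattened update."""
    flat = update.ravel()
    k = min(k, flat.size)
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return idx, flat[idx]

def quantize_uniform(values: np.ndarray, bits: int = 8):
    """Map values onto 2**bits uniform levels over their observed range
    (bits <= 8 so the codes fit in uint8)."""
    lo, hi = float(values.min()), float(values.max())
    scale = (hi - lo) / (2 ** bits - 1) or 1.0  # guard zero-width range
    q = np.round((values - lo) / scale).astype(np.uint8)
    return q, lo, scale

def dequantize_uniform(q: np.ndarray, lo: float, scale: float) -> np.ndarray:
    """Approximate reconstruction of the sparsified update values."""
    return q.astype(np.float32) * scale + lo
```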
- FlexTrain: A Dynamic Training Framework for Heterogeneous Devices Environments [12.165263783903216]
FlexTrain is a framework that accommodates the diverse storage and computational resources available on different devices during the training phase.
We demonstrate the effectiveness of FlexTrain on the CIFAR-100 dataset, where a single global model trained with FlexTrain can be easily deployed on heterogeneous devices.
arXiv Detail & Related papers (2023-10-31T13:51:13Z)
- Adaptive Model Pruning and Personalization for Federated Learning over Wireless Networks [72.59891661768177]
Federated learning (FL) enables distributed learning across edge devices while protecting data privacy.
We consider an FL framework with partial model pruning and personalization to overcome these challenges.
The framework splits the learning model into a pruned global part, shared with all devices to learn data representations, and a personalized part that is fine-tuned for each specific device (see the sketch after this entry).
arXiv Detail & Related papers (2023-09-04T21:10:45Z)
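The global/personalized split described above can be made concrete. Below is a minimal illustrative sketch (not the paper's implementation) of aggregating only a shared "global" parameter subset while keeping a per-device head local, plus simple magnitude pruning; the layer names and helpers are hypothetical.

```python
import numpy as np

# Hypothetical split: layers in GLOBAL_KEYS are aggregated (and prunable);
# everything else (e.g., the classifier head) stays on-device.
GLOBAL_KEYS = {"conv1.w", "conv2.w"}

def aggregate_global(client_params: list) -> dict:
    """FedAvg-style mean over the shared part only."""
    return {k: np.mean([p[k] for p in client_params], axis=0)
            for k in GLOBAL_KEYS}

def apply_global(local: dict, global_part: dict) -> dict:
    """Overwrite a device's shared layers; its personalized part is kept."""
    merged = dict(local)
    merged.update(global_part)
    return merged

def prune_by_magnitude(w: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Zero the smallest-magnitude weights (unstructured pruning)."""
    k = max(1, int(w.size * keep_ratio))
    thresh = np.sort(np.abs(w).ravel())[-k]
    return np.where(np.abs(w) >= thresh, w, 0.0)
```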
- FLuID: Mitigating Stragglers in Federated Learning using Invariant Dropout [1.8262547855491458]
Federated Learning allows machine learning models to train locally on individual mobile devices, synchronizing model updates via a shared server.
Because the server must wait for updates from all participating devices, straggler devices with lower performance often dictate the overall training time in FL.
We introduce Invariant Dropout, a method that extracts a sub-model based on a weight-update threshold (a rough sketch follows this entry).
Building on it, we develop an adaptive training framework, Federated Learning using Invariant Dropout (FLuID).
arXiv Detail & Related papers (2023-07-05T19:53:38Z)
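The one-line summary above gives only the flavor of Invariant Dropout. The following is a speculative sketch, assuming a per-neuron update-magnitude threshold decides which rows of a weight matrix a straggler keeps; the names and the exact criterion are assumptions, not the paper's method.

```python
import numpy as np

def keep_mask(prev_w: np.ndarray, new_w: np.ndarray, tau: float) -> np.ndarray:
    """Mark neurons (rows) whose largest weight update exceeds tau;
    rows below tau are treated as 'invariant' and can be dropped."""
    delta = np.abs(new_w - prev_w).max(axis=1)  # per-neuron update size
    return delta >= tau                          # True = keep in sub-model

def extract_submodel(w: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Sub-model for a straggler: only the rows that changed enough."""
    return w[mask]
```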
- Time-sensitive Learning for Heterogeneous Federated Edge Intelligence [52.83633954857744]
We investigate real-time machine learning in a federated edge intelligence (FEI) system.
FEI systems exhibit heterogeneous communication and computational resource distributions.
We propose a time-sensitive federated learning (TS-FL) framework to minimize the overall run-time for collaboratively training a shared ML model.
arXiv Detail & Related papers (2023-01-26T08:13:22Z)
- Online Data Selection for Federated Learning with Limited Storage [53.46789303416799]
Federated Learning (FL) has been proposed to achieve distributed machine learning among networked devices.
However, the impact of limited on-device storage on FL performance remains largely unexplored.
In this work, we take the first step toward online data selection for FL with limited on-device storage.
arXiv Detail & Related papers (2022-09-01T03:27:33Z)
- FedHiSyn: A Hierarchical Synchronous Federated Learning Framework for Resource and Data Heterogeneity [56.82825745165945]
Federated Learning (FL) enables training a global model without sharing the decentralized raw data stored on multiple devices, thereby protecting data privacy.
We propose a hierarchical synchronous FL framework, i.e., FedHiSyn, to tackle the problems of straggler effects and outdated models.
We evaluate the proposed framework on the MNIST, EMNIST, CIFAR10, and CIFAR100 datasets under diverse heterogeneous device settings.
arXiv Detail & Related papers (2022-06-21T17:23:06Z)
- FLAME: Federated Learning Across Multi-device Environments [9.810211000961647]
Federated Learning (FL) enables distributed training of machine learning models while keeping personal data on user devices private.
We propose FLAME, a user-centered FL training approach to counter statistical and system heterogeneity in multi-device environments.
Our experimental results show that FLAME outperforms various baselines, achieving a 4.8-33.8% higher F-1 score, 1.02-2.86x greater energy efficiency, and up to a 2.02x speedup in convergence.
arXiv Detail & Related papers (2022-02-17T22:23:56Z)
- FLaaS: Federated Learning as a Service [3.128267020893596]
We present Federated Learning as a Service (FLaaS), a system enabling different scenarios of third-party application collaborative model building.
As a proof of concept, we implement it in a mobile phone setting and discuss practical implications of results on simulated and real devices.
We demonstrate FLaaS's feasibility in building unique or joint FL models across applications for image object detection in a few hours, across 100 devices.
arXiv Detail & Related papers (2020-11-18T15:56:22Z)
- Fast-Convergent Federated Learning [82.32029953209542]
Federated learning is a promising solution for distributing machine learning tasks through modern networks of mobile devices.
We propose a fast-convergent federated learning algorithm, called FOLB, which performs intelligent sampling of devices in each round of model training.
arXiv Detail & Related papers (2020-07-26T14:37:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.