Efficient Data Distribution Estimation for Accelerated Federated Learning
- URL: http://arxiv.org/abs/2406.01774v1
- Date: Mon, 3 Jun 2024 20:33:17 GMT
- Title: Efficient Data Distribution Estimation for Accelerated Federated Learning
- Authors: Yuanli Wang, Lei Huang,
- Abstract summary: Federated Learning (FL) is a privacy-preserving machine learning paradigm where a global model is trained in-situ across a large number of distributed edge devices.
Devices are highly heterogeneous in both their system resources and training data.
Various client selection algorithms have been developed, showing promising performance improvement in terms of model coverage and accuracy.
- Score: 5.085889377571319
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Federated Learning (FL) is a privacy-preserving machine learning paradigm where a global model is trained in-situ across a large number of distributed edge devices. These systems often comprise millions of user devices, and only a subset of available devices can be used for training in each epoch. Designing a device selection strategy is challenging, given that devices are highly heterogeneous in both their system resources and training data. This heterogeneity makes device selection crucial for timely model convergence and sufficient model accuracy. To tackle the FL client heterogeneity problem, various client selection algorithms have been developed, showing promising performance improvement in terms of model coverage and accuracy. In this work, we study the overhead of client selection algorithms in a large-scale FL environment. Then we propose an efficient data distribution summary calculation algorithm to reduce the overhead in a real-world large-scale FL environment. The evaluation shows that our proposed solution can achieve up to a 30x reduction in data summary time and up to a 360x reduction in clustering time.
Related papers
- Enhancing Federated Learning Convergence with Dynamic Data Queue and Data Entropy-driven Participant Selection [13.825031686864559]
Federated Learning (FL) is a decentralized approach for collaborative model training on edge devices.
We present a method to improve convergence in FL by creating a global subset of data on the server and dynamically distributing it across devices.
Our approach results in a substantial accuracy boost of approximately 5% for the MNIST dataset, around 18% for CIFAR-10, and 20% for CIFAR-100 with a 10% global subset of data, outperforming the state-of-the-art (SOTA) aggregation algorithms.
arXiv Detail & Related papers (2024-10-23T11:47:04Z)
- CDFL: Efficient Federated Human Activity Recognition using Contrastive Learning and Deep Clustering [12.472038137777474]
Human Activity Recognition (HAR) is vital for the automation and intelligent identification of human actions through data from diverse sensors.
Traditional machine learning approaches that aggregate data on a central server for centralized processing are memory-intensive and raise privacy concerns.
This work proposes CDFL, an efficient federated learning framework for image-based HAR.
arXiv Detail & Related papers (2024-07-17T03:17:53Z)
- Adaptive Model Pruning and Personalization for Federated Learning over Wireless Networks [72.59891661768177]
Federated learning (FL) enables distributed learning across edge devices while protecting data privacy.
We consider an FL framework with partial model pruning and personalization to overcome these challenges.
This framework splits the learning model into a global part with model pruning shared with all devices to learn data representations and a personalized part to be fine-tuned for a specific device.
arXiv Detail & Related papers (2023-09-04T21:10:45Z)
- Analysis and Optimization of Wireless Federated Learning with Data Heterogeneity [72.85248553787538]
This paper focuses on performance analysis and optimization for wireless FL, considering data heterogeneity, combined with wireless resource allocation.
We formulate the loss function minimization problem under constraints on long-term energy consumption and latency, and jointly optimize client scheduling, resource allocation, and the number of local training epochs (CRE).
Experiments on real-world datasets demonstrate that the proposed algorithm outperforms other benchmarks in terms of learning accuracy and energy consumption.
arXiv Detail & Related papers (2023-08-04T04:18:01Z)
- Time-sensitive Learning for Heterogeneous Federated Edge Intelligence [52.83633954857744]
We investigate real-time machine learning in a federated edge intelligence (FEI) system.
FEI systems exhibit heterogeneous communication and computational resource distribution.
We propose a time-sensitive federated learning (TS-FL) framework to minimize the overall run-time for collaboratively training a shared ML model.
arXiv Detail & Related papers (2023-01-26T08:13:22Z)
- ON-DEMAND-FL: A Dynamic and Efficient Multi-Criteria Federated Learning Client Deployment Scheme [37.099990745974196]
We introduce On-Demand-FL, a client deployment approach for federated learning.
We make use of containerization technology such as Docker to build efficient environments.
A genetic algorithm (GA) is used to solve the multi-objective optimization problem.
arXiv Detail & Related papers (2022-11-05T13:41:19Z)
- Auxo: Efficient Federated Learning via Scalable Client Clustering [22.323057948281644]
Federated learning (FL) enables edge devices to collaboratively train ML models without revealing their raw data to a logically centralized server.
We propose Auxo to gradually identify clients with statistically similar data distributions (cohorts) in large-scale, low-availability, and resource-constrained FL populations.
We show Auxo boosts various existing FL solutions in terms of final accuracy (2.1% - 8.2%), convergence time (up to 2.2x), and model bias (4.8% - 53.8%).
arXiv Detail & Related papers (2022-10-29T17:36:51Z)
- Online Data Selection for Federated Learning with Limited Storage [53.46789303416799]
Federated Learning (FL) has been proposed to achieve distributed machine learning among networked devices.
The impact of on-device storage on the performance of FL remains unexplored.
In this work, we take the first step to consider online data selection for FL with limited on-device storage.
arXiv Detail & Related papers (2022-09-01T03:27:33Z)
- Parallel Successive Learning for Dynamic Distributed Model Training over Heterogeneous Wireless Networks [50.68446003616802]
Federated learning (FedL) has emerged as a popular technique for distributing model training over a set of wireless devices.
We develop parallel successive learning (PSL), which expands the FedL architecture along three dimensions.
Our analysis sheds light on the notion of cold vs. warmed up models, and model inertia in distributed machine learning.
arXiv Detail & Related papers (2022-02-07T05:11:01Z)
- Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices).
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
- Federated learning with class imbalance reduction [24.044750119251308]
Federated learning (FL) is a technique that enables a large number of edge computing devices to collaboratively train a global learning model.
Due to privacy concerns, the raw data on devices cannot be made available to a centralized server.
In this paper, an estimation scheme is designed to reveal the class distribution without access to the raw data.
arXiv Detail & Related papers (2020-11-23T08:13:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.