Online Data Selection for Federated Learning with Limited Storage
- URL: http://arxiv.org/abs/2209.00195v1
- Date: Thu, 1 Sep 2022 03:27:33 GMT
- Title: Online Data Selection for Federated Learning with Limited Storage
- Authors: Chen Gong, Zhenzhe Zheng, Fan Wu, Bingshuai Li, Yunfeng Shao, Guihai Chen
- Abstract summary: Federated Learning (FL) has been proposed to achieve distributed machine learning among networked devices. The impact of on-device storage on the performance of FL remains unexplored. In this work, we take the first step toward online data selection for FL with limited on-device storage.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning models have been deployed in mobile networks to deal with
the data from different layers, enabling automated network management and
on-device intelligence. To overcome the high communication cost and severe privacy
concerns of centralized machine learning, Federated Learning (FL) has been
proposed to achieve distributed machine learning among networked devices. While
computation and communication limitations have been widely studied in FL, the
impact of on-device storage on FL performance remains unexplored.
Without an efficient and effective data selection policy to filter the abundant
streaming data on devices, classical FL can suffer from much longer model
training time (more than $4\times$) and significant inference accuracy
reduction (more than $7\%$), as observed in our experiments. In this work, we take
the first step toward online data selection for FL with limited
on-device storage. We first define a new data valuation metric for data
selection in FL: the projection of the local gradient computed on an on-device
data sample onto the global gradient computed on the data from all devices. We
further design \textbf{ODE}, a framework of \textbf{O}nline \textbf{D}ata
s\textbf{E}lection for FL, which coordinates networked devices to store valuable
data samples collaboratively, with theoretical guarantees for simultaneously
speeding up model convergence and enhancing final model accuracy. Experimental
results on one industrial task (mobile network traffic classification) and three
public tasks (synthetic task, image classification, human activity recognition)
show the remarkable advantages of ODE over state-of-the-art approaches.
In particular, on the industrial dataset, ODE achieves up to a $2.5\times$
speedup in training time and a $6\%$ increase in final inference accuracy, and is
robust to various factors in the practical environment.
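The valuation metric admits a compact illustration. The sketch below (Python, illustrative only) scores each sample by the projection of its local gradient onto an estimate of the global gradient, and maintains a fixed-capacity on-device buffer that greedily keeps the highest-valued samples from a stream. The helper names (`sample_value`, `StreamingBuffer`) and the assumption that the server broadcasts a global-gradient estimate each round are ours; the paper's full ODE framework additionally coordinates selection across devices and carries convergence guarantees that this greedy simplification does not capture.

```python
import numpy as np

def sample_value(local_grad: np.ndarray, global_grad: np.ndarray) -> float:
    """Projection of a sample's local gradient onto the global gradient
    direction: <g_local, g_global> / ||g_global||. Larger values indicate
    samples whose gradients better align with the global update."""
    norm = np.linalg.norm(global_grad)
    if norm == 0.0:
        return 0.0
    return float(np.dot(local_grad, global_grad) / norm)

class StreamingBuffer:
    """Fixed-capacity store for streaming data: greedily keeps the samples
    with the highest valuation seen so far (a hypothetical stand-in for
    ODE's coordinated, cross-device selection)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.store: list[tuple[float, object]] = []  # (value, sample) pairs

    def offer(self, sample, local_grad: np.ndarray,
              global_grad: np.ndarray) -> None:
        v = sample_value(local_grad, global_grad)
        if len(self.store) < self.capacity:
            self.store.append((v, sample))
            return
        # Evict the least valuable stored sample if the new one beats it.
        i_min = min(range(len(self.store)), key=lambda i: self.store[i][0])
        if v > self.store[i_min][0]:
            self.store[i_min] = (v, sample)
```

In a full round, each device would score incoming samples against the latest broadcast gradient estimate and train only on the retained buffer; at realistic capacities a heap would replace the linear eviction scan.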
Related papers
- CDFL: Efficient Federated Human Activity Recognition using Contrastive Learning and Deep Clustering (2024-07-17)
Human Activity Recognition (HAR) is vital for automatically and intelligently identifying human actions from data collected by diverse sensors.
Traditional machine learning approaches, which aggregate data on a central server for centralized processing, are memory-intensive and raise privacy concerns.
This work proposes CDFL, an efficient federated learning framework for image-based HAR.
- Efficient Data Distribution Estimation for Accelerated Federated Learning (2024-06-03)
Federated Learning (FL) is a privacy-preserving machine learning paradigm in which a global model is trained in situ across a large number of distributed edge devices.
Devices are highly heterogeneous in both their system resources and training data.
Various client selection algorithms have been developed, showing promising performance improvements in terms of model coverage and accuracy.
- Semi-Federated Learning: Convergence Analysis and Optimization of a Hybrid Learning Framework (2023-10-04)
We propose a semi-federated learning (SemiFL) paradigm that leverages both the base station (BS) and the devices for a hybrid implementation of centralized learning (CL) and FL.
We propose a two-stage algorithm to solve this intractable problem, providing closed-form solutions for the beamformers.
- Adaptive Model Pruning and Personalization for Federated Learning over Wireless Networks (2023-09-04)
Federated learning (FL) enables distributed learning across edge devices while protecting data privacy.
We consider an FL framework with partial model pruning and personalization to overcome these challenges.
The framework splits the learning model into a global part, shared with all devices and pruned to learn data representations, and a personalized part fine-tuned for each specific device.
- Analysis and Optimization of Wireless Federated Learning with Data Heterogeneity (2023-08-04)
This paper focuses on performance analysis and optimization for wireless FL, considering data heterogeneity combined with wireless resource allocation.
We formulate a loss minimization problem under constraints on long-term energy consumption and latency, and jointly optimize client scheduling, resource allocation, and the number of local training epochs (CRE).
Experiments on real-world datasets demonstrate that the proposed algorithm outperforms other benchmarks in terms of learning accuracy and energy consumption.
- Enhancing Efficiency in Multidevice Federated Learning through Data Selection (2022-11-08)
Federated learning (FL) in multidevice environments creates new opportunities to learn from a vast and diverse amount of private data.
In this paper, we develop an FL framework that incorporates on-device data selection on such constrained devices.
We show that our framework achieves 19% higher accuracy and 58% lower latency compared to baseline FL without our selection strategies.
- Data Heterogeneity-Robust Federated Learning via Group Client Selection in Industrial IoT (2022-02-03)
FedGS is a hierarchical cloud-edge-end FL framework for 5G-empowered industries.
Taking advantage of naturally clustered factory devices, FedGS uses a gradient-based binary permutation algorithm.
Experiments show that FedGS improves accuracy by 3.5% and reduces training rounds by 59% on average.
- A Framework for Energy and Carbon Footprint Analysis of Distributed and Federated Edge Learning (2021-03-18)
This article breaks down and analyzes the main factors that influence the environmental footprint of distributed learning policies.
It models both vanilla and decentralized FL policies driven by consensus.
Results show that FL allows remarkable end-to-end energy savings (30%-40%) for wireless systems characterized by low bit/Joule efficiency.
- FLaPS: Federated Learning and Privately Scaling (2020-09-13)
Federated learning (FL) is a distributed learning process in which the model is transferred to the devices that possess the data.
We present the Federated Learning and Privately Scaling (FLaPS) architecture, which improves the scalability as well as the security and privacy of the system.