OCTOPUS: Overcoming Performance and Privatization Bottlenecks in
Distributed Learning
- URL: http://arxiv.org/abs/2105.00602v1
- Date: Mon, 3 May 2021 02:24:53 GMT
- Title: OCTOPUS: Overcoming Performance and Privatization Bottlenecks in
Distributed Learning
- Authors: Shuo Wang, Surya Nepal, Kristen Moore, Marthie Grobler, Carsten
Rudolph, Alsharif Abuadbba
- Abstract summary: Federated learning enables distributed participants to collaboratively learn a commonly-shared model while holding data locally.
We introduce a new distributed learning scheme to address communication overhead via latent compression.
We show that downstream tasks on the compact latent representations can achieve comparable accuracy to centralized learning.
- Score: 16.98452728773235
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The diversity and quantity of data gathered from distributed devices
such as mobile phones can enhance the success and robustness of machine learning
algorithms. Federated learning enables distributed participants to collaboratively
learn a commonly-shared model while holding data locally. However, it also faces
expensive communication and limitations arising from the heterogeneity of
distributed data sources and the lack of access to global data. In this paper, we
investigate a practical distributed learning scenario in which multiple downstream
tasks (e.g., classifiers) are learned efficiently from dynamically-updated and
non-IID distributed data sources while providing local privatization. We introduce
a new distributed learning scheme that addresses communication overhead via latent
compression, leveraging global data while privatizing local data without the
additional cost of encryption or perturbation. The scheme divides learning into
(1) informative feature encoding, in which each node extracts and transmits a
compressed latent-space representation of its local data, addressing communication
overhead; and (2) downstream tasks centralized at the server, which are trained on
the encoded representations gathered from the nodes, addressing computing and
storage overhead. In addition, a disentanglement strategy is applied to privatize
the sensitive components of local data. Extensive experiments are conducted on
image and speech datasets. The results demonstrate that downstream tasks trained
on the compact latent representations achieve accuracy comparable to centralized
learning while keeping local data privatized.
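The two-stage split described in the abstract can be pictured with a minimal sketch. The encoder architecture, latent size, and toy data below are illustrative assumptions rather than the paper's exact design: each node compresses its local samples into latent codes, only the codes leave the node, and the server trains a downstream classifier on the pooled codes.

```python
# Minimal sketch of the two-stage split (illustrative, not OCTOPUS itself):
# (1) each node compresses its local data into latent codes with a local encoder,
# (2) only the codes (plus labels for supervised downstream tasks) travel to the
#     server, which trains a downstream classifier on the pooled codes.
import torch
import torch.nn as nn

LATENT_DIM = 32          # assumed latent size
INPUT_DIM = 784          # e.g. flattened 28x28 images
NUM_CLASSES = 10

class LocalEncoder(nn.Module):
    """Stand-in for the informative feature encoder trained at each node."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(INPUT_DIM, 256), nn.ReLU(),
            nn.Linear(256, LATENT_DIM),
        )

    def forward(self, x):
        return self.net(x)

def encode_node_data(encoder, x_local):
    """Stage 1: compress local samples to latent codes; only codes leave the node."""
    with torch.no_grad():
        return encoder(x_local)

def train_downstream(codes, labels, epochs=5):
    """Stage 2 (server side): fit a downstream classifier on pooled latent codes."""
    clf = nn.Linear(LATENT_DIM, NUM_CLASSES)
    opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(clf(codes), labels)
        loss.backward()
        opt.step()
    return clf

# Toy run with random data standing in for two nodes' local datasets.
encoder = LocalEncoder()
node_batches = [(torch.randn(64, INPUT_DIM), torch.randint(0, NUM_CLASSES, (64,)))
                for _ in range(2)]
codes = torch.cat([encode_node_data(encoder, x) for x, _ in node_batches])
labels = torch.cat([y for _, y in node_batches])
classifier = train_downstream(codes, labels)
```

In this sketch the raw samples never leave their node; only the 32-dimensional codes do, which is what keeps the communication cost low.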
Related papers
- Federated Impression for Learning with Distributed Heterogeneous Data [19.50235109938016]
Federated learning (FL) provides a paradigm that can learn from distributed datasets across clients without requiring them to share data.
In FL, sub-optimal convergence is common among data from different health centers due to the variety in data collection protocols and patient demographics across centers.
We propose FedImpres, which alleviates catastrophic forgetting by restoring synthetic data that represents the global information as a federated impression.
arXiv Detail & Related papers (2024-09-11T15:37:52Z)
- One-Shot Collaborative Data Distillation [9.428116807615407]
Large machine-learning training datasets can be distilled into small collections of informative synthetic data samples.
These synthetic sets support efficient model learning and reduce the communication cost of data sharing.
A naive way to construct a synthetic set in a distributed environment is to allow each client to perform local data distillation and to merge local distillations at a central server.
We introduce the first collaborative data distillation technique, called CollabDM, which captures the global distribution of the data and requires only a single round of communication between client and server.
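For intuition, the sketch below shows the naive baseline mentioned above, not CollabDM itself; per-class averaging stands in for real data distillation purely for illustration, and every name in the code is hypothetical.

```python
# Sketch of the naive baseline described above (not CollabDM): each client builds a
# tiny "synthetic" set locally and the server simply merges them in one round.
# Per-class averaging stands in for real data distillation, purely for illustration.
import numpy as np

def local_distill(x, y, num_classes):
    """Return one averaged sample per class present on this client."""
    synth_x, synth_y = [], []
    for c in range(num_classes):
        mask = (y == c)
        if mask.any():
            synth_x.append(x[mask].mean(axis=0))
            synth_y.append(c)
    return np.stack(synth_x), np.array(synth_y)

def merge_at_server(client_sets):
    """Single communication round: concatenate the clients' synthetic sets."""
    xs, ys = zip(*client_sets)
    return np.concatenate(xs), np.concatenate(ys)

rng = np.random.default_rng(0)
clients = [(rng.normal(size=(100, 16)), rng.integers(0, 3, size=100))
           for _ in range(4)]
synthetic = merge_at_server([local_distill(x, y, num_classes=3) for x, y in clients])
```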
arXiv Detail & Related papers (2024-08-05T06:47:32Z)
- Benchmarking FedAvg and FedCurv for Image Classification Tasks [1.376408511310322]
This paper focuses on the problem of statistical heterogeneity of the data in the same federated network.
Several federated learning algorithms, such as FedAvg, FedProx, and Federated Curvature (FedCurv), have already been proposed.
As a side product of this work, we release the non-IID versions of the datasets we used, to facilitate further comparisons from the FL community.
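As a reference point for the algorithms named above, here is a minimal sketch of the FedAvg aggregation step (size-weighted parameter averaging). FedProx and FedCurv add regularization terms on the client objective, which are omitted here, and the toy shapes are assumptions.

```python
# Minimal sketch of the FedAvg aggregation step: the server averages client model
# parameters, weighted by local dataset size. (FedProx and FedCurv add client-side
# regularization on top of this; those terms are omitted here.)
import numpy as np

def fedavg_aggregate(client_params, client_sizes):
    """client_params: list of dicts {name: ndarray}; client_sizes: samples per client."""
    total = float(sum(client_sizes))
    avg = {}
    for name in client_params[0]:
        avg[name] = sum(p[name] * (n / total)
                        for p, n in zip(client_params, client_sizes))
    return avg

# Toy example: two clients with a single weight matrix each.
clients = [{"w": np.ones((2, 2))}, {"w": np.zeros((2, 2))}]
global_params = fedavg_aggregate(clients, client_sizes=[30, 10])  # -> 0.75 everywhere
```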
arXiv Detail & Related papers (2023-03-31T10:13:01Z)
- Does Decentralized Learning with Non-IID Unlabeled Data Benefit from Self Supervision? [51.00034621304361]
We study decentralized learning with unlabeled data through the lens of self-supervised learning (SSL).
We study the effectiveness of contrastive learning algorithms under decentralized learning settings.
arXiv Detail & Related papers (2022-10-20T01:32:41Z)
- Personalization Improves Privacy-Accuracy Tradeoffs in Federated Optimization [57.98426940386627]
We show that coordinating local learning with private centralized learning yields a generically useful and improved tradeoff between accuracy and privacy.
We illustrate our theoretical results with experiments on synthetic and real-world datasets.
arXiv Detail & Related papers (2022-02-10T20:44:44Z)
- DQRE-SCnet: A novel hybrid approach for selecting users in Federated Learning with Deep-Q-Reinforcement Learning based on Spectral Clustering [1.174402845822043]
Machine learning models trained on sensitive real-world data promise advances in areas ranging from medical screening and disease-outbreak detection to agriculture, industry, and defense science.
In many applications, participants benefit from collecting their own private datasets, training detailed machine learning models on that real data, and sharing the benefits of using these models.
Due to privacy and security concerns, however, most people avoid sharing sensitive data for training. Federated learning allows multiple parties to jointly train a machine learning model without any user revealing their local data to a central server.
arXiv Detail & Related papers (2021-11-07T15:14:29Z)
- RelaySum for Decentralized Deep Learning on Heterogeneous Data [71.36228931225362]
In decentralized machine learning, workers compute model updates on their local data.
Because the workers only communicate with few neighbors without central coordination, these updates propagate progressively over the network.
This paradigm enables distributed training on networks without all-to-all connectivity, helping to protect data privacy as well as to reduce the communication cost of distributed training in data centers.
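For context, the sketch below shows plain decentralized neighbor averaging (gossip) on a ring topology, i.e., the generic setting described above in which workers mix updates only with a few neighbors; it does not reproduce RelaySum's relay mechanism, and the topology and parameters are assumptions.

```python
# Sketch of decentralized neighbor averaging (plain gossip on a ring): each worker
# only communicates with its neighbors, and repeated mixing spreads information
# across the network. This is the generic baseline, not RelaySum itself.
import numpy as np

def gossip_step(params, topology):
    """Each worker replaces its parameters with the mean over itself and its neighbors."""
    new_params = []
    for i, neighbors in enumerate(topology):
        group = [params[i]] + [params[j] for j in neighbors]
        new_params.append(np.mean(group, axis=0))
    return new_params

# Four workers on a ring: worker i is connected to i-1 and i+1.
ring = [[3, 1], [0, 2], [1, 3], [2, 0]]
params = [np.full(5, float(i)) for i in range(4)]
for _ in range(10):            # repeated mixing drives workers toward consensus
    params = gossip_step(params, ring)
```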
arXiv Detail & Related papers (2021-10-08T14:55:32Z)
- Federated Learning from Small Datasets [48.879172201462445]
Federated learning allows multiple parties to collaboratively train a joint model without sharing local data.
We propose a novel approach that intertwines model aggregations with permutations of local models.
The permutations expose each local model to a daisy chain of local datasets resulting in more efficient training in data-sparse domains.
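A minimal sketch of the daisy-chaining idea as summarized above: between aggregations, local models are permuted among clients so each model also trains on other clients' small datasets. The toy "training" step, the permutation schedule, and the plain averaging below are illustrative assumptions, not the authors' exact procedure.

```python
# Sketch of the daisy-chaining idea: between aggregations, the server permutes the
# local models among clients, so each model is trained on a chain of small local
# datasets. Training and aggregation details here are illustrative only.
import random
import numpy as np

def local_step(model, data, lr=0.1):
    """Toy 'training': nudge the model toward the local data mean."""
    return model + lr * (data.mean(axis=0) - model)

def daisy_chain_round(models, datasets):
    """Permute models across clients, then let each client train its received model."""
    order = list(range(len(models)))
    random.shuffle(order)
    return [local_step(models[order[i]], datasets[i]) for i in range(len(models))]

def aggregate(models):
    return np.mean(models, axis=0)       # plain averaging between daisy-chain rounds

rng = np.random.default_rng(1)
datasets = [rng.normal(loc=i, size=(8, 4)) for i in range(5)]   # small, skewed datasets
models = [np.zeros(4) for _ in range(5)]
for _ in range(3):
    models = daisy_chain_round(models, datasets)
global_model = aggregate(models)
```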
arXiv Detail & Related papers (2021-10-07T13:49:23Z)
- Exploiting Shared Representations for Personalized Federated Learning [54.65133770989836]
We propose a novel federated learning framework and algorithm for learning a shared data representation across clients and unique local heads for each client.
Our algorithm harnesses the distributed computational power across clients to perform many local-updates with respect to the low-dimensional local parameters for every update of the representation.
This result is of interest beyond federated learning to a broad class of problems in which we aim to learn a shared low-dimensional representation among data distributions.
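A minimal sketch of the structure described above, assuming tiny linear modules: a shared low-dimensional encoder is averaged across clients, while each client keeps its own head and performs many cheap local head updates. This illustrates the split only; it is not the authors' exact algorithm.

```python
# Sketch of the shared-representation / local-head split: the encoder is averaged
# across clients while each client keeps its own head and performs many cheap
# local head updates. Sizes and update counts are illustrative assumptions.
import copy
import torch
import torch.nn as nn

def make_encoder():
    return nn.Linear(20, 5)      # shared low-dimensional representation

def make_head():
    return nn.Linear(5, 2)       # personalized head, never leaves the client

def local_head_updates(encoder, head, x, y, steps=10):
    opt = torch.optim.SGD(head.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):       # many local updates on the low-dimensional head
        opt.zero_grad()
        loss_fn(head(encoder(x).detach()), y).backward()
        opt.step()

def average_encoders(encoders):
    """Server step: average the shared representation parameters across clients."""
    avg = copy.deepcopy(encoders[0])
    with torch.no_grad():
        for name, p in avg.named_parameters():
            stacked = torch.stack([dict(e.named_parameters())[name] for e in encoders])
            p.copy_(stacked.mean(dim=0))
    return avg

clients = [(torch.randn(32, 20), torch.randint(0, 2, (32,))) for _ in range(3)]
encoders = [make_encoder() for _ in clients]
heads = [make_head() for _ in clients]
for (x, y), enc, head in zip(clients, encoders, heads):
    local_head_updates(enc, head, x, y)
shared_encoder = average_encoders(encoders)
```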
arXiv Detail & Related papers (2021-02-14T05:36:25Z)
- FedOCR: Communication-Efficient Federated Learning for Scene Text Recognition [76.26472513160425]
We study how to make use of decentralized datasets for training a robust scene text recognizer.
To make FedOCR fairly suitable to be deployed on end devices, we make two improvements including using lightweight models and hashing techniques.
arXiv Detail & Related papers (2020-07-22T14:30:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences.