OCTOPUS: Overcoming Performance and Privatization Bottlenecks in
Distributed Learning
- URL: http://arxiv.org/abs/2105.00602v1
- Date: Mon, 3 May 2021 02:24:53 GMT
- Title: OCTOPUS: Overcoming Performance and Privatization Bottlenecks in
Distributed Learning
- Authors: Shuo Wang, Surya Nepal, Kristen Moore, Marthie Grobler, Carsten
Rudolph, Alsharif Abuadbba
- Abstract summary: Federated learning enables distributed participants to collaboratively learn a commonly-shared model while holding data locally.
We introduce a new distributed learning scheme to address communication overhead via latent compression.
We show that downstream tasks on the compact latent representations can achieve comparable accuracy to centralized learning.
- Score: 16.98452728773235
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The diversity and quantity of data gathered from distributed devices
such as mobile phones can enhance the success and robustness of machine learning
algorithms. Federated learning enables distributed participants to collaboratively
learn a commonly-shared model while holding data locally. However, it also faces
expensive communication and limitations arising from the heterogeneity of
distributed data sources and the lack of access to global data. In this paper, we
investigate a practical distributed learning scenario in which multiple downstream
tasks (e.g., classifiers) are learned efficiently from dynamically-updated and
non-IID distributed data sources while providing local privatization. We introduce
a new distributed learning scheme that addresses communication overhead via latent
compression, leveraging global data while privatizing local data without the
additional cost of encryption or perturbation. The scheme divides learning into
(1) informative feature encoding, in which each node extracts and transmits a
compressed latent-space representation of its local data, addressing communication
overhead; and (2) downstream tasks centralized at the server, which are trained on
the encoded representations gathered from the nodes, addressing computing and
storage overhead. In addition, a disentanglement strategy is applied to privatize
the sensitive components of local data. Extensive experiments are conducted on
image and speech datasets. The results demonstrate that downstream tasks trained
on the compact latent representations achieve accuracy comparable to centralized
learning while keeping local data privatized.
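The two-stage split described in the abstract can be pictured with a minimal sketch. The encoder architecture, latent size, and toy data below are illustrative assumptions rather than the paper's exact design: each node compresses its local samples into latent codes, only the codes leave the node, and the server trains a downstream classifier on the pooled codes.

```python
# Minimal sketch of the two-stage split (illustrative, not OCTOPUS itself):
# (1) each node compresses its local data into latent codes with a local encoder,
# (2) only the codes (plus labels for supervised downstream tasks) travel to the
#     server, which trains a downstream classifier on the pooled codes.
import torch
import torch.nn as nn

LATENT_DIM = 32          # assumed latent size
INPUT_DIM = 784          # e.g. flattened 28x28 images
NUM_CLASSES = 10

class LocalEncoder(nn.Module):
    """Stand-in for the informative feature encoder trained at each node."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(INPUT_DIM, 256), nn.ReLU(),
            nn.Linear(256, LATENT_DIM),
        )

    def forward(self, x):
        return self.net(x)

def encode_node_data(encoder, x_local):
    """Stage 1: compress local samples to latent codes; only codes leave the node."""
    with torch.no_grad():
        return encoder(x_local)

def train_downstream(codes, labels, epochs=5):
    """Stage 2 (server side): fit a downstream classifier on pooled latent codes."""
    clf = nn.Linear(LATENT_DIM, NUM_CLASSES)
    opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(clf(codes), labels)
        loss.backward()
        opt.step()
    return clf

# Toy run with random data standing in for two nodes' local datasets.
encoder = LocalEncoder()
node_batches = [(torch.randn(64, INPUT_DIM), torch.randint(0, NUM_CLASSES, (64,)))
                for _ in range(2)]
codes = torch.cat([encode_node_data(encoder, x) for x, _ in node_batches])
labels = torch.cat([y for _, y in node_batches])
classifier = train_downstream(codes, labels)
```

In this sketch the raw samples never leave their node; only the 32-dimensional codes do, which is what keeps the communication cost low.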
Related papers
- Federated Impression for Learning with Distributed Heterogeneous Data [19.50235109938016]
Federated learning (FL) provides a paradigm that can learn from distributed datasets across clients without requiring them to share data.
In FL, sub-optimal convergence is common among data from different health centers due to the variety in data collection protocols and patient demographics across centers.
We propose FedImpres, which alleviates catastrophic forgetting by restoring synthetic data that represents the global information as a federated impression.
arXiv Detail & Related papers (2024-09-11T15:37:52Z)
- One-Shot Collaborative Data Distillation [9.428116807615407]
Large machine-learning training datasets can be distilled into small collections of informative synthetic data samples.
These synthetic sets support efficient model learning and reduce the communication cost of data sharing.
A naive way to construct a synthetic set in a distributed environment is to allow each client to perform local data distillation and to merge local distillations at a central server.
We introduce the first collaborative data distillation technique, called CollabDM, which captures the global distribution of the data and requires only a single round of communication between client and server.
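For intuition, the sketch below shows the naive baseline mentioned above, not CollabDM itself; per-class averaging stands in for real data distillation purely for illustration, and every name in the code is hypothetical.

```python
# Sketch of the naive baseline described above (not CollabDM): each client builds a
# tiny "synthetic" set locally and the server simply merges them in one round.
# Per-class averaging stands in for real data distillation, purely for illustration.
import numpy as np

def local_distill(x, y, num_classes):
    """Return one averaged sample per class present on this client."""
    synth_x, synth_y = [], []
    for c in range(num_classes):
        mask = (y == c)
        if mask.any():
            synth_x.append(x[mask].mean(axis=0))
            synth_y.append(c)
    return np.stack(synth_x), np.array(synth_y)

def merge_at_server(client_sets):
    """Single communication round: concatenate the clients' synthetic sets."""
    xs, ys = zip(*client_sets)
    return np.concatenate(xs), np.concatenate(ys)

rng = np.random.default_rng(0)
clients = [(rng.normal(size=(100, 16)), rng.integers(0, 3, size=100))
           for _ in range(4)]
synthetic = merge_at_server([local_distill(x, y, num_classes=3) for x, y in clients])
```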
arXiv Detail & Related papers (2024-08-05T06:47:32Z)
- Benchmarking FedAvg and FedCurv for Image Classification Tasks [1.376408511310322]
This paper focuses on the problem of statistical heterogeneity of the data in the same federated network.
Several federated learning algorithms, such as FedAvg, FedProx, and Federated Curvature (FedCurv), have already been proposed.
As a side product of this work, we release the non-IID versions of the datasets we used, to facilitate further comparisons from the FL community.
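As a reference point for the algorithms named above, here is a minimal sketch of the FedAvg aggregation step (size-weighted parameter averaging). FedProx and FedCurv add regularization terms on the client objective, which are omitted here, and the toy shapes are assumptions.

```python
# Minimal sketch of the FedAvg aggregation step: the server averages client model
# parameters, weighted by local dataset size. (FedProx and FedCurv add client-side
# regularization on top of this; those terms are omitted here.)
import numpy as np

def fedavg_aggregate(client_params, client_sizes):
    """client_params: list of dicts {name: ndarray}; client_sizes: samples per client."""
    total = float(sum(client_sizes))
    avg = {}
    for name in client_params[0]:
        avg[name] = sum(p[name] * (n / total)
                        for p, n in zip(client_params, client_sizes))
    return avg

# Toy example: two clients with a single weight matrix each.
clients = [{"w": np.ones((2, 2))}, {"w": np.zeros((2, 2))}]
global_params = fedavg_aggregate(clients, client_sizes=[30, 10])  # -> 0.75 everywhere
```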
arXiv Detail & Related papers (2023-03-31T10:13:01Z)
- Does Decentralized Learning with Non-IID Unlabeled Data Benefit from Self Supervision? [51.00034621304361]
We study decentralized learning with unlabeled data through the lens of self-supervised learning (SSL).
We study the effectiveness of contrastive learning algorithms under decentralized learning settings.
arXiv Detail & Related papers (2022-10-20T01:32:41Z)
- Personalization Improves Privacy-Accuracy Tradeoffs in Federated Optimization [57.98426940386627]
We show that coordinating local learning with private centralized learning yields a generically useful and improved tradeoff between accuracy and privacy.
We illustrate our theoretical results with experiments on synthetic and real-world datasets.
arXiv Detail & Related papers (2022-02-10T20:44:44Z)
- DQRE-SCnet: A novel hybrid approach for selecting users in Federated Learning with Deep-Q-Reinforcement Learning based on Spectral Clustering [1.174402845822043]
Machine learning models trained on sensitive real-world data promise advances in areas ranging from medical screening and disease-outbreak detection to agriculture, industry, and defense science.
In many applications, participants benefit from collecting their own private datasets, training detailed machine learning models on that real data, and sharing the benefits of using these models.
Due to privacy and security concerns, however, most people avoid sharing sensitive data for training. Federated learning allows multiple parties to jointly train a machine learning model without any user revealing their local data to a central server.
arXiv Detail & Related papers (2021-11-07T15:14:29Z)
- RelaySum for Decentralized Deep Learning on Heterogeneous Data [71.36228931225362]
In decentralized machine learning, workers compute model updates on their local data.
Because the workers only communicate with few neighbors without central coordination, these updates propagate progressively over the network.
This paradigm enables distributed training on networks without all-to-all connectivity, helping to protect data privacy as well as to reduce the communication cost of distributed training in data centers.
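For context, the sketch below shows plain decentralized neighbor averaging (gossip) on a ring topology, i.e., the generic setting described above in which workers mix updates only with a few neighbors; it does not reproduce RelaySum's relay mechanism, and the topology and parameters are assumptions.

```python
# Sketch of decentralized neighbor averaging (plain gossip on a ring): each worker
# only communicates with its neighbors, and repeated mixing spreads information
# across the network. This is the generic baseline, not RelaySum itself.
import numpy as np

def gossip_step(params, topology):
    """Each worker replaces its parameters with the mean over itself and its neighbors."""
    new_params = []
    for i, neighbors in enumerate(topology):
        group = [params[i]] + [params[j] for j in neighbors]
        new_params.append(np.mean(group, axis=0))
    return new_params

# Four workers on a ring: worker i is connected to i-1 and i+1.
ring = [[3, 1], [0, 2], [1, 3], [2, 0]]
params = [np.full(5, float(i)) for i in range(4)]
for _ in range(10):            # repeated mixing drives workers toward consensus
    params = gossip_step(params, ring)
```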
arXiv Detail & Related papers (2021-10-08T14:55:32Z)
- Federated Learning from Small Datasets [48.879172201462445]
Federated learning allows multiple parties to collaboratively train a joint model without sharing local data.
We propose a novel approach that intertwines model aggregations with permutations of local models.
The permutations expose each local model to a daisy chain of local datasets resulting in more efficient training in data-sparse domains.
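A minimal sketch of the daisy-chaining idea as summarized above: between aggregations, local models are permuted among clients so each model also trains on other clients' small datasets. The toy "training" step, the permutation schedule, and the plain averaging below are illustrative assumptions, not the authors' exact procedure.

```python
# Sketch of the daisy-chaining idea: between aggregations, the server permutes the
# local models among clients, so each model is trained on a chain of small local
# datasets. Training and aggregation details here are illustrative only.
import random
import numpy as np

def local_step(model, data, lr=0.1):
    """Toy 'training': nudge the model toward the local data mean."""
    return model + lr * (data.mean(axis=0) - model)

def daisy_chain_round(models, datasets):
    """Permute models across clients, then let each client train its received model."""
    order = list(range(len(models)))
    random.shuffle(order)
    return [local_step(models[order[i]], datasets[i]) for i in range(len(models))]

def aggregate(models):
    return np.mean(models, axis=0)       # plain averaging between daisy-chain rounds

rng = np.random.default_rng(1)
datasets = [rng.normal(loc=i, size=(8, 4)) for i in range(5)]   # small, skewed datasets
models = [np.zeros(4) for _ in range(5)]
for _ in range(3):
    models = daisy_chain_round(models, datasets)
global_model = aggregate(models)
```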
arXiv Detail & Related papers (2021-10-07T13:49:23Z)
- Exploiting Shared Representations for Personalized Federated Learning [54.65133770989836]
We propose a novel federated learning framework and algorithm for learning a shared data representation across clients and unique local heads for each client.
Our algorithm harnesses the distributed computational power across clients to perform many local-updates with respect to the low-dimensional local parameters for every update of the representation.
This result is of interest beyond federated learning to a broad class of problems in which we aim to learn a shared low-dimensional representation among data distributions.
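A minimal sketch of the structure described above, assuming tiny linear modules: a shared low-dimensional encoder is averaged across clients, while each client keeps its own head and performs many cheap local head updates. This illustrates the split only; it is not the authors' exact algorithm.

```python
# Sketch of the shared-representation / local-head split: the encoder is averaged
# across clients while each client keeps its own head and performs many cheap
# local head updates. Sizes and update counts are illustrative assumptions.
import copy
import torch
import torch.nn as nn

def make_encoder():
    return nn.Linear(20, 5)      # shared low-dimensional representation

def make_head():
    return nn.Linear(5, 2)       # personalized head, never leaves the client

def local_head_updates(encoder, head, x, y, steps=10):
    opt = torch.optim.SGD(head.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):       # many local updates on the low-dimensional head
        opt.zero_grad()
        loss_fn(head(encoder(x).detach()), y).backward()
        opt.step()

def average_encoders(encoders):
    """Server step: average the shared representation parameters across clients."""
    avg = copy.deepcopy(encoders[0])
    with torch.no_grad():
        for name, p in avg.named_parameters():
            stacked = torch.stack([dict(e.named_parameters())[name] for e in encoders])
            p.copy_(stacked.mean(dim=0))
    return avg

clients = [(torch.randn(32, 20), torch.randint(0, 2, (32,))) for _ in range(3)]
encoders = [make_encoder() for _ in clients]
heads = [make_head() for _ in clients]
for (x, y), enc, head in zip(clients, encoders, heads):
    local_head_updates(enc, head, x, y)
shared_encoder = average_encoders(encoders)
```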
arXiv Detail & Related papers (2021-02-14T05:36:25Z)
- FedOCR: Communication-Efficient Federated Learning for Scene Text Recognition [76.26472513160425]
We study how to make use of decentralized datasets for training a robust scene text recognizer.
To make FedOCR fairly suitable to be deployed on end devices, we make two improvements including using lightweight models and hashing techniques.
arXiv Detail & Related papers (2020-07-22T14:30:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences.