Outsourcing Training without Uploading Data via Efficient Collaborative
Open-Source Sampling
- URL: http://arxiv.org/abs/2210.12575v1
- Date: Sun, 23 Oct 2022 00:12:18 GMT
- Title: Outsourcing Training without Uploading Data via Efficient Collaborative
Open-Source Sampling
- Authors: Junyuan Hong, Lingjuan Lyu, Jiayu Zhou, Michael Spranger
- Abstract summary: Traditional outsourcing requires uploading device data to the cloud server.
We propose to leverage widely available open-source data, which is a massive dataset collected from public and heterogeneous sources.
We develop a novel strategy called Efficient Collaborative Open-source Sampling (ECOS) to construct a proximal proxy dataset from open-source data for cloud training.
- Score: 49.87637449243698
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As deep learning blooms with growing demand for computation and data
resources, outsourcing model training to a powerful cloud server becomes an
attractive alternative to training at a low-power and cost-effective end
device. Traditional outsourcing requires uploading device data to the cloud
server, which can be infeasible in many real-world applications due to the
often sensitive nature of the collected data and the limited communication
bandwidth. To tackle these challenges, we propose to leverage widely available
open-source data, which is a massive dataset collected from public and
heterogeneous sources (e.g., Internet images). We develop a novel strategy
called Efficient Collaborative Open-source Sampling (ECOS) to construct a
proximal proxy dataset from open-source data for cloud training, in lieu of
client data. ECOS probes open-source data on the cloud server to sense the
distribution of client data via a communication- and computation-efficient
sampling process, which only communicates a few compressed public features and
client scalar responses. Extensive empirical studies show that the proposed
ECOS improves the quality of automated client labeling, model compression, and
label outsourcing when applied in various learning scenarios.
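The probing loop the abstract describes (cloud sends a few compressed public features, client answers with scalars, cloud samples a proxy dataset) can be sketched as below. This is a minimal illustration, not the paper's exact algorithm: the random features, the centroid "compression," and the count-based weighting rule are all stand-in assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for real extracted features.
open_source_feats = rng.normal(size=(1000, 16))   # cloud-side public features
client_feats = rng.normal(size=(200, 16)) + 2.0   # device-side private features

# 1) Cloud compresses the open-source pool into K representatives (a random
#    subsample here stands in for clustering) and ships only those.
K = 8
centroids = open_source_feats[rng.choice(len(open_source_feats), K, replace=False)]

# 2) Client answers with K scalars: how many of its samples fall nearest to
#    each centroid. No raw data or per-sample features leave the device.
dists = np.linalg.norm(client_feats[:, None, :] - centroids[None, :, :], axis=-1)
responses = np.bincount(dists.argmin(axis=1), minlength=K).astype(float)

# 3) Cloud samples a proxy dataset from the open-source pool, weighting each
#    public sample by the client's response for its nearest centroid
#    (a small epsilon keeps the sampling well-defined for empty cells).
os_dists = np.linalg.norm(open_source_feats[:, None, :] - centroids[None, :, :], axis=-1)
weights = responses[os_dists.argmin(axis=1)] + 1e-3
weights /= weights.sum()
proxy_idx = rng.choice(len(open_source_feats), size=100, replace=False, p=weights)
proxy_dataset = open_source_feats[proxy_idx]
```

The communication cost is only the K centroid vectors downstream and K scalars upstream, which is the efficiency argument the abstract makes.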
Related papers
- CollaFuse: Navigating Limited Resources and Privacy in Collaborative

Generative AI [4.062316786853382]
CollaFuse is a novel framework inspired by split learning.
It enables shared server training and inference, alleviating client computational burdens.
It has the potential to impact various application areas, such as the design of edge computing solutions, healthcare research, or autonomous driving.
arXiv Detail & Related papers (2024-02-29T12:36:10Z)
- HePCo: Data-Free Heterogeneous Prompt Consolidation for Continual Federated Learning [21.639199127980508]
We focus on the important yet understudied problem of Continual Federated Learning (CFL), in which a server communicates with a set of clients to incrementally learn new concepts without sharing or storing any data.
We propose a novel and lightweight generation and distillation scheme to consolidate client models at the server.
arXiv Detail & Related papers (2023-06-16T17:02:12Z)
- STAR: Boosting Low-Resource Information Extraction by Structure-to-Text Data Generation with Large Language Models [56.27786433792638]
STAR is a data generation method that leverages Large Language Models (LLMs) to synthesize data instances.
We design fine-grained step-by-step instructions to obtain the initial data instances.
Our experiments show that the data generated by STAR significantly improves the performance of low-resource event extraction and relation extraction tasks.
arXiv Detail & Related papers (2023-05-24T12:15:19Z)
- Exploring One-shot Semi-supervised Federated Learning with A Pre-trained Diffusion Model [40.83058938096914]
We propose FedDISC, a Federated Diffusion-Inspired Semi-supervised Co-training method.
We first extract prototypes of the labeled server data and use these prototypes to predict pseudo-labels of the client data.
For each category, we compute the cluster centroids and domain-specific representations to signify the semantic and stylistic information of their distributions.
These representations are sent back to the server, which uses the pre-trained diffusion model to generate synthetic datasets complying with the client distributions and trains a global model on them.
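The prototype step described here can be sketched as follows. This is an illustrative approximation with random features standing in for a pre-trained encoder's outputs; the class count, dimensions, and cosine-similarity rule are assumptions, not the paper's exact design.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical encoder features; in FedDISC these come from a pre-trained model.
num_classes, dim = 3, 8
server_feats = rng.normal(size=(90, dim))
server_labels = np.repeat(np.arange(num_classes), 30)
client_feats = rng.normal(size=(40, dim))

# Class prototypes: mean labeled-server feature per class.
prototypes = np.stack([server_feats[server_labels == c].mean(axis=0)
                       for c in range(num_classes)])

def normalize(x):
    # Unit-normalize rows so the dot product is cosine similarity.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Pseudo-label each client sample by its most similar prototype.
sims = normalize(client_feats) @ normalize(prototypes).T
pseudo_labels = sims.argmax(axis=1)

# Per-class centroids of client features: the compact statistics that are
# sent back to the server instead of any raw client data.
centroids = np.stack([client_feats[pseudo_labels == c].mean(axis=0)
                      if (pseudo_labels == c).any() else prototypes[c]
                      for c in range(num_classes)])
```

Only the K per-class centroids (and similar summary representations) travel back to the server, which is what keeps the raw client images private.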
arXiv Detail & Related papers (2023-05-06T14:22:33Z)
- FedNet2Net: Saving Communication and Computations in Federated Learning with Model Growing [0.0]
Federated learning (FL) is a recently developed area of machine learning.
In this paper, a novel scheme based on the notion of "model growing" is proposed.
The proposed approach is tested extensively on three standard benchmarks and is shown to achieve substantial reduction in communication and client computation.
arXiv Detail & Related papers (2022-07-19T21:54:53Z)
- Scalable Neural Data Server: A Data Recommender for Transfer Learning [70.06289658553675]
Transfer learning is a popular strategy for leveraging additional data to improve the downstream performance.
Neural Data Server (NDS), a search engine that recommends relevant data for a given downstream task, has been previously proposed to address this problem.
NDS uses a mixture of experts trained on data sources to estimate similarity between each source and the downstream task.
The proposed Scalable Neural Data Server (SNDS) represents both data sources and downstream tasks by their proximity to intermediary datasets.
arXiv Detail & Related papers (2022-06-19T12:07:32Z)
- Data Selection for Efficient Model Update in Federated Learning [0.07614628596146598]
We propose to reduce the amount of local data that is needed to train a global model.
We do this by splitting the model into a lower part for generic feature extraction and an upper part that is more sensitive to the characteristics of the local data.
Our experiments show that less than 1% of the local data can transfer the characteristics of the client data to the global model.
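The lower/upper split can be sketched as below, with a fixed random projection standing in for the generic lower layers and a least-squares linear head as the data-sensitive upper part. All names, dimensions, and the 1% subset size are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical local dataset with a simple linearly separable signal.
X = rng.normal(size=(5000, 32))
y = (X[:, 0] + 0.1 * rng.normal(size=5000) > 0).astype(float)

# "Lower" part: frozen, generic feature extractor (fixed random projection).
W_lower = rng.normal(size=(32, 16))
features = np.tanh(X @ W_lower)

# "Upper" part: data-sensitive linear head, fit on only 1% of the local
# data (50 of 5000 samples) by least squares with a bias column.
idx = rng.choice(len(X), size=50, replace=False)
A = np.hstack([features[idx], np.ones((50, 1))])
w_upper, *_ = np.linalg.lstsq(A, y[idx], rcond=None)

# Evaluate the cheaply updated head on the full local set.
preds = (np.hstack([features, np.ones((len(X), 1))]) @ w_upper > 0.5)
acc = (preds == y.astype(bool)).mean()
print(f"accuracy after fitting on 1% of local data: {acc:.2f}")
```

Because only the small upper part is refit, both the data volume and the computation needed for a model update shrink, which mirrors the claim in the abstract.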
arXiv Detail & Related papers (2021-11-05T14:07:06Z)
- Deep Transfer Learning for Multi-source Entity Linkage via Domain Adaptation [63.24594955429465]
Multi-source entity linkage is critical in high-impact applications such as data cleaning and user stitching.
AdaMEL is a deep transfer learning framework that learns generic high-level knowledge to perform multi-source entity linkage.
Our framework achieves state-of-the-art results with 8.21% improvement on average over methods based on supervised learning.
arXiv Detail & Related papers (2021-10-27T15:20:41Z)
- Federated Multi-Target Domain Adaptation [99.93375364579484]
Federated learning methods enable us to train machine learning models on distributed user data while preserving its privacy.
We consider a more practical scenario where the distributed client data is unlabeled, and a centralized labeled dataset is available on the server.
We propose an effective DualAdapt method to address the new challenges.
arXiv Detail & Related papers (2021-08-17T17:53:05Z)
- Multi-modal AsynDGAN: Learn From Distributed Medical Image Data without Sharing Private Information [55.866673486753115]
We propose an extendable and elastic learning framework to preserve privacy and security.
The proposed framework is named Distributed Asynchronized Discriminator Generative Adversarial Networks (AsynDGAN).
arXiv Detail & Related papers (2020-12-15T20:41:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.