SoK: Training Machine Learning Models over Multiple Sources with Privacy
Preservation
- URL: http://arxiv.org/abs/2012.03386v1
- Date: Sun, 6 Dec 2020 22:24:28 GMT
- Title: SoK: Training Machine Learning Models over Multiple Sources with Privacy
Preservation
- Authors: Lushan Song, Haoqi Wu, Wenqiang Ruan, Weili Han
- Abstract summary: High-quality training data from multiple data controllers with privacy preservation is a key challenge to train high-quality machine learning models.
Both academia researchers and industrial vendors are strongly motivated to propose two main-stream folders of solutions: 1) Secure Multi-party Learning (MPL for short); and 2) Federated Learning (FL for short)
These two solutions have their advantages and limitations when we evaluate them from privacy preservation, ways of communication, communication overhead, format of data, the accuracy of trained models, and application scenarios.
- Score: 1.567576360103422
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nowadays, gathering high-quality training data from multiple data controllers
with privacy preservation is a key challenge to train high-quality machine
learning models. The potential solutions could dramatically break the barriers
among isolated data corpus, and consequently enlarge the range of data
available for processing. To this end, both academia researchers and industrial
vendors are recently strongly motivated to propose two main-stream folders of
solutions: 1) Secure Multi-party Learning (MPL for short); and 2) Federated
Learning (FL for short). These two solutions have their advantages and
limitations when we evaluate them from privacy preservation, ways of
communication, communication overhead, format of data, the accuracy of trained
models, and application scenarios.
Motivated to demonstrate the research progress and discuss the insights on
the future directions, we thoroughly investigate these protocols and frameworks
of both MPL and FL. At first, we define the problem of training machine
learning models over multiple data sources with privacy-preserving (TMMPP for
short). Then, we compare the recent studies of TMMPP from the aspects of the
technical routes, parties supported, data partitioning, threat model, and
supported machine learning models, to show the advantages and limitations.
Next, we introduce the state-of-the-art platforms which support online training
over multiple data sources. Finally, we discuss the potential directions to
resolve the problem of TMMPP.
Related papers
- Federated Large Language Models: Current Progress and Future Directions [63.68614548512534]
This paper surveys Federated learning for LLMs (FedLLM), highlighting recent advances and future directions.
We focus on two key aspects: fine-tuning and prompt learning in a federated setting, discussing existing work and associated research challenges.
arXiv Detail & Related papers (2024-09-24T04:14:33Z) - Federated Learning driven Large Language Models for Swarm Intelligence: A Survey [2.769238399659845]
Federated learning (FL) offers a compelling framework for training large language models (LLMs)
We focus on machine unlearning, a crucial aspect for complying with privacy regulations like the Right to be Forgotten.
We explore various strategies that enable effective unlearning, such as perturbation techniques, model decomposition, and incremental learning.
arXiv Detail & Related papers (2024-06-14T08:40:58Z) - The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z) - Rethinking Privacy in Machine Learning Pipelines from an Information
Flow Control Perspective [16.487545258246932]
Modern machine learning systems use models trained on ever-growing corpora.
metadata such as ownership, access control, or licensing information is ignored during training.
We take an information flow control perspective to describe machine learning systems.
arXiv Detail & Related papers (2023-11-27T13:14:39Z) - Zero-knowledge Proof Meets Machine Learning in Verifiability: A Survey [19.70499936572449]
High-quality models rely not only on efficient optimization algorithms but also on the training and learning processes built upon vast amounts of data and computational power.
Due to various challenges such as limited computational resources and data privacy concerns, users in need of models often cannot train machine learning models locally.
This paper presents a comprehensive survey of zero-knowledge proof-based verifiable machine learning (ZKP-VML) technology.
arXiv Detail & Related papers (2023-10-23T12:15:23Z) - Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly [62.473245910234304]
This paper takes a hardware-centric approach to explore how Large Language Models can be brought to modern edge computing systems.
We provide a micro-level hardware benchmark, compare the model FLOP utilization to a state-of-the-art data center GPU, and study the network utilization in realistic conditions.
arXiv Detail & Related papers (2023-10-04T20:27:20Z) - Privacy Side Channels in Machine Learning Systems [87.53240071195168]
We introduce privacy side channels: attacks that exploit system-level components to extract private information.
For example, we show that deduplicating training data before applying differentially-private training creates a side-channel that completely invalidates any provable privacy guarantees.
We further show that systems which block language models from regenerating training data can be exploited to exfiltrate private keys contained in the training set.
arXiv Detail & Related papers (2023-09-11T16:49:05Z) - Deep Reinforcement Learning Assisted Federated Learning Algorithm for
Data Management of IIoT [82.33080550378068]
The continuous expanded scale of the industrial Internet of Things (IIoT) leads to IIoT equipments generating massive amounts of user data every moment.
How to manage these time series data in an efficient and safe way in the field of IIoT is still an open issue.
This paper studies the FL technology applications to manage IIoT equipment data in wireless network environments.
arXiv Detail & Related papers (2022-02-03T07:12:36Z) - Decentralized Federated Learning Preserves Model and Data Privacy [77.454688257702]
We propose a fully decentralized approach, which allows to share knowledge between trained models.
Students are trained on the output of their teachers via synthetically generated input data.
The results show that an untrained student model, trained on the teachers output reaches comparable F1-scores as the teacher.
arXiv Detail & Related papers (2021-02-01T14:38:54Z) - Fusion of Federated Learning and Industrial Internet of Things: A Survey [4.810675235074399]
Industrial Internet of Things (IIoT) lays a new paradigm for the concept of Industry 4.0 and paves an insight for new industrial era.
Smart machines and smart factories use machine learning/deep learning based models for incurring intelligence.
In order to address this issue, federated learning (FL) technology is implemented in IIoT by the researchers nowadays to provide safe, accurate, robust and unbiased models.
arXiv Detail & Related papers (2021-01-04T06:28:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.