Related papers: SoK: Training Machine Learning Models over Multiple Sources with Privacy Preservation

SoK: Training Machine Learning Models over Multiple Sources with Privacy Preservation

URL: http://arxiv.org/abs/2012.03386v1
Date: Sun, 6 Dec 2020 22:24:28 GMT
Title: SoK: Training Machine Learning Models over Multiple Sources with Privacy Preservation
Authors: Lushan Song, Haoqi Wu, Wenqiang Ruan, Weili Han
Abstract summary: High-quality training data from multiple data controllers with privacy preservation is a key challenge to train high-quality machine learning models. Both academia researchers and industrial vendors are strongly motivated to propose two main-stream folders of solutions: 1) Secure Multi-party Learning (MPL for short); and 2) Federated Learning (FL for short) These two solutions have their advantages and limitations when we evaluate them from privacy preservation, ways of communication, communication overhead, format of data, the accuracy of trained models, and application scenarios.
Score: 1.567576360103422
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Nowadays, gathering high-quality training data from multiple data controllers with privacy preservation is a key challenge to train high-quality machine learning models. The potential solutions could dramatically break the barriers among isolated data corpus, and consequently enlarge the range of data available for processing. To this end, both academia researchers and industrial vendors are recently strongly motivated to propose two main-stream folders of solutions: 1) Secure Multi-party Learning (MPL for short); and 2) Federated Learning (FL for short). These two solutions have their advantages and limitations when we evaluate them from privacy preservation, ways of communication, communication overhead, format of data, the accuracy of trained models, and application scenarios. Motivated to demonstrate the research progress and discuss the insights on the future directions, we thoroughly investigate these protocols and frameworks of both MPL and FL. At first, we define the problem of training machine learning models over multiple data sources with privacy-preserving (TMMPP for short). Then, we compare the recent studies of TMMPP from the aspects of the technical routes, parties supported, data partitioning, threat model, and supported machine learning models, to show the advantages and limitations. Next, we introduce the state-of-the-art platforms which support online training over multiple data sources. Finally, we discuss the potential directions to resolve the problem of TMMPP.

Related papers

A Comprehensive Review on Understanding the Decentralized and Collaborative Approach in Machine Learning [0.0]
The arrival of Machine Learning (ML) completely changed how we can unlock valuable information from data. Traditional methods, where everything was stored in one place, had big problems with keeping information private, handling large amounts of data, and avoiding unfair advantages. We looked at decentralized Machine Learning and its benefits, like keeping data private, getting answers faster, and using a wider variety of data sources. Real-world examples from healthcare and finance were used to show how collaborative Machine Learning can solve important problems while still protecting information security.
arXiv Detail & Related papers (2025-03-12T20:54:22Z)
Federated Large Language Models: Current Progress and Future Directions [63.68614548512534]
This paper surveys Federated learning for LLMs (FedLLM), highlighting recent advances and future directions. We focus on two key aspects: fine-tuning and prompt learning in a federated setting, discussing existing work and associated research challenges.
arXiv Detail & Related papers (2024-09-24T04:14:33Z)
Federated Learning driven Large Language Models for Swarm Intelligence: A Survey [2.769238399659845]
Federated learning (FL) offers a compelling framework for training large language models (LLMs) We focus on machine unlearning, a crucial aspect for complying with privacy regulations like the Right to be Forgotten. We explore various strategies that enable effective unlearning, such as perturbation techniques, model decomposition, and incremental learning.
arXiv Detail & Related papers (2024-06-14T08:40:58Z)
The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements. LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information. Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z)
Rethinking Privacy in Machine Learning Pipelines from an Information Flow Control Perspective [16.487545258246932]
Modern machine learning systems use models trained on ever-growing corpora. metadata such as ownership, access control, or licensing information is ignored during training. We take an information flow control perspective to describe machine learning systems.
arXiv Detail & Related papers (2023-11-27T13:14:39Z)
Zero-knowledge Proof Meets Machine Learning in Verifiability: A Survey [19.70499936572449]
High-quality models rely not only on efficient optimization algorithms but also on the training and learning processes built upon vast amounts of data and computational power. Due to various challenges such as limited computational resources and data privacy concerns, users in need of models often cannot train machine learning models locally. This paper presents a comprehensive survey of zero-knowledge proof-based verifiable machine learning (ZKP-VML) technology.
arXiv Detail & Related papers (2023-10-23T12:15:23Z)
Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly [62.473245910234304]
This paper takes a hardware-centric approach to explore how Large Language Models can be brought to modern edge computing systems. We provide a micro-level hardware benchmark, compare the model FLOP utilization to a state-of-the-art data center GPU, and study the network utilization in realistic conditions.
arXiv Detail & Related papers (2023-10-04T20:27:20Z)
Privacy Side Channels in Machine Learning Systems [87.53240071195168]
We introduce privacy side channels: attacks that exploit system-level components to extract private information. For example, we show that deduplicating training data before applying differentially-private training creates a side-channel that completely invalidates any provable privacy guarantees. We further show that systems which block language models from regenerating training data can be exploited to exfiltrate private keys contained in the training set.
arXiv Detail & Related papers (2023-09-11T16:49:05Z)
Deep Reinforcement Learning Assisted Federated Learning Algorithm for Data Management of IIoT [82.33080550378068]
The continuous expanded scale of the industrial Internet of Things (IIoT) leads to IIoT equipments generating massive amounts of user data every moment. How to manage these time series data in an efficient and safe way in the field of IIoT is still an open issue. This paper studies the FL technology applications to manage IIoT equipment data in wireless network environments.
arXiv Detail & Related papers (2022-02-03T07:12:36Z)
Decentralized Federated Learning Preserves Model and Data Privacy [77.454688257702]
We propose a fully decentralized approach, which allows to share knowledge between trained models. Students are trained on the output of their teachers via synthetically generated input data. The results show that an untrained student model, trained on the teachers output reaches comparable F1-scores as the teacher.
arXiv Detail & Related papers (2021-02-01T14:38:54Z)
Fusion of Federated Learning and Industrial Internet of Things: A Survey [4.810675235074399]
Industrial Internet of Things (IIoT) lays a new paradigm for the concept of Industry 4.0 and paves an insight for new industrial era. Smart machines and smart factories use machine learning/deep learning based models for incurring intelligence. In order to address this issue, federated learning (FL) technology is implemented in IIoT by the researchers nowadays to provide safe, accurate, robust and unbiased models.
arXiv Detail & Related papers (2021-01-04T06:28:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.