Evaluating Privacy Leakage in Split Learning
- URL: http://arxiv.org/abs/2305.12997v3
- Date: Fri, 19 Jan 2024 20:35:54 GMT
- Title: Evaluating Privacy Leakage in Split Learning
- Authors: Xinchi Qiu, Ilias Leontiadis, Luca Melis, Alex Sablayrolles, Pierre Stock
- Abstract summary: On-device machine learning allows us to avoid sharing raw data with a third-party server during inference.
Split Learning (SL) is a promising approach that can overcome these limitations.
In SL, a large machine learning model is divided into two parts, with the bigger part residing on the server side and a smaller part executing on-device.
- Score: 8.841387955312669
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Privacy-Preserving machine learning (PPML) can help us train and deploy
models that utilize private information. In particular, on-device machine
learning allows us to avoid sharing raw data with a third-party server during
inference. On-device models are typically less accurate when compared to their
server counterparts due to the fact that (1) they typically only rely on a
small set of on-device features and (2) they need to be small enough to run
efficiently on end-user devices. Split Learning (SL) is a promising approach
that can overcome these limitations. In SL, a large machine learning model is
divided into two parts, with the bigger part residing on the server side and a
smaller part executing on-device, aiming to incorporate the private features.
However, end-to-end training of such models requires exchanging gradients at
the cut layer, which might encode private features or labels. In this paper, we
provide insights into potential privacy risks associated with SL. Furthermore,
we also investigate the effectiveness of various mitigation strategies. Our
results indicate that the gradients significantly improve the attackers'
effectiveness across all tested datasets, reaching almost perfect reconstruction
accuracy for some features. However, a small amount of differential privacy
(DP) can effectively mitigate this risk without causing significant training
degradation.
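To make the setup concrete, here is a minimal PyTorch sketch of the split-learning loop the abstract describes: a small on-device part, a larger server part, the exchange of cut-layer activations and gradients, and a DP-style clip-and-noise step on the returned gradient. The layer sizes, the noise scale, and the placement of labels on the server are illustrative assumptions, and a real DP guarantee would additionally require per-example clipping and privacy accounting.

```python
import torch
import torch.nn as nn

# Illustrative split: a small on-device part and a bigger server-side part.
client_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
server_model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
opt_client = torch.optim.SGD(client_model.parameters(), lr=0.1)
opt_server = torch.optim.SGD(server_model.parameters(), lr=0.1)

x = torch.randn(8, 16)         # private on-device features (toy data)
y = torch.randint(0, 2, (8,))  # labels assumed server-side in this sketch

# Client: forward up to the cut layer; only the activations leave the device.
cut_activations = client_model(x)
smashed = cut_activations.detach().requires_grad_(True)  # "sent" to the server

# Server: finish the forward pass and backpropagate down to the cut layer.
loss = nn.functional.cross_entropy(server_model(smashed), y)
opt_server.zero_grad()
loss.backward()
opt_server.step()

# The cut-layer gradient returned to the client is the leaky signal the paper studies.
cut_grad = smashed.grad

# Illustrative mitigation: clip and add Gaussian noise before sending it back.
# (A real DP guarantee needs per-example clipping and privacy accounting.)
clip_norm, sigma = 1.0, 0.1
cut_grad = cut_grad * (clip_norm / torch.clamp(cut_grad.norm(), min=clip_norm))
cut_grad = cut_grad + sigma * torch.randn_like(cut_grad)

# Client: continue backpropagation from the cut layer with the received gradient.
opt_client.zero_grad()
cut_activations.backward(cut_grad)
opt_client.step()
```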
Related papers
- Love or Hate? Share or Split? Privacy-Preserving Training Using Split Learning and Homomorphic Encryption [47.86010265348072]
Split learning (SL) is a new collaborative learning technique that allows participants to train machine learning models without clients sharing their raw data.
Previous works demonstrated that reconstructing activation maps could result in privacy leakage of client data.
In this paper, we improve upon previous works by constructing a protocol based on U-shaped SL that can operate on homomorphically encrypted data.
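As a rough illustration of the U-shaped split this summary mentions, the sketch below keeps the first and last layers (and the labels) on the client and runs only the middle on the server; the homomorphic encryption of the server-side computation, which is the paper's actual contribution, is omitted here, and all layer sizes are made up.

```python
import torch
import torch.nn as nn

# U-shaped split: head and tail (with the labels) stay on the client,
# only the middle segment runs on the server.
client_head = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
server_body = nn.Sequential(nn.Linear(32, 32), nn.ReLU())
client_tail = nn.Linear(32, 2)

x = torch.randn(8, 16)
y = torch.randint(0, 2, (8,))  # labels never leave the client

h = client_head(x)   # client -> server
h = server_body(h)   # server -> client (computed on ciphertexts in the paper)
loss = nn.functional.cross_entropy(client_tail(h), y)  # loss computed locally
loss.backward()      # gradients flow back through all three segments
```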
arXiv Detail & Related papers (2023-09-19T10:56:08Z)
- Can Public Large Language Models Help Private Cross-device Federated Learning? [58.05449579773249]
We study (differentially) private federated learning (FL) of language models.
Public data has been used to improve privacy-utility trade-offs for both large and small language models.
We propose a novel distribution matching algorithm with theoretical grounding to sample public data close to private data distribution.
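A hedged sketch of the distribution-matching idea: rank public examples by how close their embeddings lie to a noised summary of the private data. The mean-embedding statistic, the noise scale sigma, and the function name select_public_batch are illustrative stand-ins, not the paper's actual algorithm.

```python
import torch

def select_public_batch(private_emb, public_emb, k, sigma=0.1):
    """Pick the k public examples whose embeddings lie closest to a noised
    mean of the private embeddings; an illustrative stand-in for the
    paper's distribution-matching procedure."""
    # Only a noise-perturbed summary statistic of the private data is released.
    noisy_mean = private_emb.mean(dim=0) + sigma * torch.randn(private_emb.size(1))
    dists = (public_emb - noisy_mean).norm(dim=1)
    return dists.topk(k, largest=False).indices

private_emb = torch.randn(100, 64)   # embeddings of private client data (toy)
public_emb = torch.randn(1000, 64)   # embeddings of candidate public data (toy)
chosen = select_public_batch(private_emb, public_emb, k=32)
```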
arXiv Detail & Related papers (2023-05-20T07:55:58Z)
- Split Ways: Privacy-Preserving Training of Encrypted Data Using Split Learning [6.916134299626706]
Split Learning (SL) is a new collaborative learning technique that allows participants to train machine learning models without clients sharing their raw data.
Previous works demonstrated that reconstructing activation maps could result in privacy leakage of client data.
In this paper, we improve upon previous works by constructing a protocol based on U-shaped SL that can operate on homomorphically encrypted data.
arXiv Detail & Related papers (2023-01-20T19:26:51Z)
- Dual Learning for Large Vocabulary On-Device ASR [64.10124092250128]
Dual learning is a paradigm for semi-supervised machine learning that seeks to leverage unsupervised data by solving two opposite tasks at once.
We provide an analysis of an on-device-sized streaming Conformer trained on the entirety of LibriSpeech, showing relative WER improvements of 10.7%/5.2% without an LM and 11.7%/16.4% with an LM.
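One toy way to realize "two opposite tasks at once" is a cycle-consistency loss between a primal model and its dual on unlabeled data; the linear stand-ins below are assumptions for illustration only and bear no relation to the paper's Conformer ASR setup.

```python
import torch
import torch.nn as nn

f = nn.Linear(32, 16)  # stand-in for the primal (speech-to-text) model
g = nn.Linear(16, 32)  # stand-in for the dual (text-to-speech) model
opt = torch.optim.Adam(list(f.parameters()) + list(g.parameters()), lr=1e-3)

unlabeled_audio = torch.randn(8, 32)  # unpaired audio with no transcripts

# Cycle loss: mapping through f and back through g should reconstruct the
# input, letting both models learn from unsupervised data simultaneously.
loss = nn.functional.mse_loss(g(f(unlabeled_audio)), unlabeled_audio)
opt.zero_grad()
loss.backward()
opt.step()
```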
arXiv Detail & Related papers (2023-01-11T06:32:28Z)
- Exploring the Limits of Differentially Private Deep Learning with Group-wise Clipping [91.60608388479645]
We show that per-layer clipping allows clipping to be performed in conjunction with backpropagation in differentially private optimization.
This results in private learning that is as memory-efficient and almost as fast per training update as non-private learning for many workflows of interest.
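The sketch below illustrates per-layer clipping interleaved with backpropagation via gradient hooks: each layer's gradient is clipped and noised the moment it is produced, so no separate pass over the full gradient vector is needed. For simplicity it clips the batch-aggregated gradient; actual DP-SGD (and the paper) clips per-example gradients, and the bound and noise scale here are arbitrary.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
clip_bound, sigma = 1.0, 0.5  # arbitrary per-layer bound and noise scale

def make_hook(bound, noise):
    def hook(grad):
        # Clip and noise this layer's gradient the moment backprop emits it.
        grad = grad * (bound / torch.clamp(grad.norm(), min=bound))
        return grad + noise * torch.randn_like(grad)
    return hook

for p in model.parameters():
    p.register_hook(make_hook(clip_bound, sigma))

x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()  # hooks fire layer by layer during this call
```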
arXiv Detail & Related papers (2022-12-03T05:20:15Z)
- Scalable Collaborative Learning via Representation Sharing [53.047460465980144]
Federated learning (FL) and Split Learning (SL) are two frameworks that enable collaborative learning while keeping the data private (on device).
In FL, each data holder trains a model locally and releases it to a central server for aggregation.
In SL, the clients must release individual cut-layer activations (smashed data) to the server and wait for its response (during both inference and backpropagation).
In this work, we present a novel approach for privacy-preserving machine learning, where the clients collaborate via online knowledge distillation using a contrastive loss.
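A generic InfoNCE-style stand-in for the contrastive distillation loss this summary mentions: two clients encode the same batch and are trained to agree on representations without exchanging raw data. The encoder shapes and the temperature are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def contrastive_loss(z1, z2, temperature=0.5):
    """InfoNCE-style loss pulling matching rows of z1 and z2 together;
    a generic stand-in for the paper's contrastive distillation loss."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature   # pairwise similarities
    targets = torch.arange(z1.size(0))   # row i should match column i
    return F.cross_entropy(logits, targets)

client_a = nn.Linear(16, 8)  # toy encoders for two collaborating clients
client_b = nn.Linear(16, 8)
batch = torch.randn(8, 16)   # a batch both parties can encode

# Each client releases only representations, never raw data or model weights.
loss = contrastive_loss(client_a(batch), client_b(batch))
loss.backward()
```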
arXiv Detail & Related papers (2022-11-20T10:49:22Z)
- Federated Split GANs [12.007429155505767]
We propose an alternative approach that trains ML models on users' devices themselves.
We focus on GANs (generative adversarial networks) and leverage their inherent privacy-preserving attribute.
Our system preserves data privacy, keeps training time short, and yields the same model accuracy as training on unconstrained devices.
arXiv Detail & Related papers (2022-07-04T23:53:47Z)
- Binarizing Split Learning for Data Privacy Enhancement and Computation Reduction [8.40552206158625]
Split learning (SL) enables data privacy preservation by allowing clients to collaboratively train a deep learning model with the server without sharing raw data.
In this study, we propose to binarize the SL local layers for faster computation and reduced memory usage.
Our results demonstrate that B-SL models are promising for lightweight IoT/mobile applications with high privacy-preservation requirements.
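A minimal sketch of what binarizing the client-side layers could look like: 1-bit weights via a sign function with a straight-through estimator so training still works. This is a generic binarization recipe, not necessarily the paper's B-SL construction, and activations are left full-precision here.

```python
import torch
import torch.nn as nn

class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a straight-through estimator so the
    binarized client layers remain trainable."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * (x.abs() <= 1).float()  # pass gradients only near zero

class BinaryLinear(nn.Linear):
    def forward(self, x):
        w = BinarizeSTE.apply(self.weight)  # 1-bit weights cut memory and compute
        return nn.functional.linear(x, w, self.bias)

# Binarized client-side part of the split; the server part stays full-precision.
client_part = nn.Sequential(BinaryLinear(16, 32), nn.ReLU())
smashed = client_part(torch.randn(8, 16))  # would be sent to the server
```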
arXiv Detail & Related papers (2022-06-10T04:07:02Z)
- Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets [53.866927712193416]
We show that an adversary who can poison a training dataset can cause models trained on this dataset to leak private details belonging to other parties.
Our attacks are effective across membership inference, attribute inference, and data extraction.
Our results cast doubt on the relevance of cryptographic privacy guarantees in multiparty protocols for machine learning.
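Loss thresholding is the generic membership-inference baseline such attacks build on: members of the training set tend to have lower loss. The sketch below shows only this baseline; the paper's contribution, poisoning the training data to amplify the leakage, is not reproduced here.

```python
import torch
import torch.nn as nn

def membership_score(model, x, y, loss_fn):
    """Score a candidate record by its negated loss: training members tend
    to have lower loss, so higher scores suggest membership. The paper's
    poisoning attack amplifies exactly this kind of signal."""
    with torch.no_grad():
        return -loss_fn(model(x), y)

target_model = nn.Linear(16, 2)               # toy stand-in for the victim model
x, y = torch.randn(1, 16), torch.tensor([1])  # candidate record to test
score = membership_score(target_model, x, y, nn.CrossEntropyLoss())
```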
arXiv Detail & Related papers (2022-03-31T18:06:28Z)
- Constrained Differentially Private Federated Learning for Low-bandwidth Devices [1.1470070927586016]
This paper presents a novel privacy-preserving federated learning scheme.
It provides theoretical privacy guarantees, as it is based on Differential Privacy.
It reduces the upstream and downstream bandwidth by up to 99.9% compared to standard federated learning.
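As a rough illustration of combining DP noise with aggressive update compression, the sketch below noises a client update and keeps only the largest 0.1% of coordinates. Top-k sparsification is an assumed stand-in; the paper's actual compression scheme and its clipping/accounting details are not shown.

```python
import torch

def compress_update(update, keep_fraction=0.001, sigma=0.01):
    """Add Gaussian noise to a client update and keep only the largest 0.1%
    of coordinates, cutting upstream bandwidth by roughly 99.9%. Top-k
    sparsification is an assumed stand-in for the paper's scheme."""
    noisy = update + sigma * torch.randn_like(update)
    k = max(1, int(keep_fraction * noisy.numel()))
    _, indices = noisy.abs().topk(k)
    return indices, noisy[indices]  # only (index, value) pairs are transmitted

update = torch.randn(1_000_000)  # toy flattened model update
idx, vals = compress_update(update)
print(f"sent {idx.numel()} of {update.numel()} coordinates")
```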
arXiv Detail & Related papers (2021-02-27T22:25:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.