Evaluating Privacy Leakage in Split Learning
- URL: http://arxiv.org/abs/2305.12997v3
- Date: Fri, 19 Jan 2024 20:35:54 GMT
- Title: Evaluating Privacy Leakage in Split Learning
- Authors: Xinchi Qiu, Ilias Leontiadis, Luca Melis, Alex Sablayrolles, Pierre Stock
- Abstract summary: On-device machine learning allows us to avoid sharing raw data with a third-party server during inference.
Split Learning (SL) is a promising approach that can overcome these limitations.
In SL, a large machine learning model is divided into two parts, with the bigger part residing on the server side and a smaller part executing on-device.
- Score: 8.841387955312669
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Privacy-Preserving machine learning (PPML) can help us train and deploy
models that utilize private information. In particular, on-device machine
learning allows us to avoid sharing raw data with a third-party server during
inference. On-device models are typically less accurate when compared to their
server counterparts due to the fact that (1) they typically only rely on a
small set of on-device features and (2) they need to be small enough to run
efficiently on end-user devices. Split Learning (SL) is a promising approach
that can overcome these limitations. In SL, a large machine learning model is
divided into two parts, with the bigger part residing on the server side and a
smaller part executing on-device, aiming to incorporate the private features.
However, end-to-end training of such models requires exchanging gradients at
the cut layer, which might encode private features or labels. In this paper, we
provide insights into potential privacy risks associated with SL. Furthermore,
we also investigate the effectiveness of various mitigation strategies. Our
results indicate that the gradients significantly improve the attackers'
effectiveness across all tested datasets, reaching almost perfect reconstruction
accuracy for some features. However, a small amount of differential privacy
(DP) can effectively mitigate this risk without causing significant training
degradation.
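To make the setup concrete, here is a minimal PyTorch sketch of the split-learning loop the abstract describes: a small on-device part, a larger server part, the exchange of cut-layer activations and gradients, and a DP-style clip-and-noise step on the returned gradient. The layer sizes, the noise scale, and the placement of labels on the server are illustrative assumptions, and a real DP guarantee would additionally require per-example clipping and privacy accounting.

```python
import torch
import torch.nn as nn

# Illustrative split: a small on-device part and a bigger server-side part.
client_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
server_model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
opt_client = torch.optim.SGD(client_model.parameters(), lr=0.1)
opt_server = torch.optim.SGD(server_model.parameters(), lr=0.1)

x = torch.randn(8, 16)         # private on-device features (toy data)
y = torch.randint(0, 2, (8,))  # labels assumed server-side in this sketch

# Client: forward up to the cut layer; only the activations leave the device.
cut_activations = client_model(x)
smashed = cut_activations.detach().requires_grad_(True)  # "sent" to the server

# Server: finish the forward pass and backpropagate down to the cut layer.
loss = nn.functional.cross_entropy(server_model(smashed), y)
opt_server.zero_grad()
loss.backward()
opt_server.step()

# The cut-layer gradient returned to the client is the leaky signal the paper studies.
cut_grad = smashed.grad

# Illustrative mitigation: clip and add Gaussian noise before sending it back.
# (A real DP guarantee needs per-example clipping and privacy accounting.)
clip_norm, sigma = 1.0, 0.1
cut_grad = cut_grad * (clip_norm / torch.clamp(cut_grad.norm(), min=clip_norm))
cut_grad = cut_grad + sigma * torch.randn_like(cut_grad)

# Client: continue backpropagation from the cut layer with the received gradient.
opt_client.zero_grad()
cut_activations.backward(cut_grad)
opt_client.step()
```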
Related papers
- Love or Hate? Share or Split? Privacy-Preserving Training Using Split Learning and Homomorphic Encryption [47.86010265348072]
Split learning (SL) is a new collaborative learning technique that allows participants to train machine learning models without clients sharing their raw data.
Previous works demonstrated that reconstructing activation maps could result in privacy leakage of client data.
In this paper, we improve upon previous works by constructing a protocol based on U-shaped SL that can operate on homomorphically encrypted data.
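As a rough illustration of the U-shaped split this summary mentions, the sketch below keeps the first and last layers (and the labels) on the client and runs only the middle on the server; the homomorphic encryption of the server-side computation, which is the paper's actual contribution, is omitted here, and all layer sizes are made up.

```python
import torch
import torch.nn as nn

# U-shaped split: head and tail (with the labels) stay on the client,
# only the middle segment runs on the server.
client_head = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
server_body = nn.Sequential(nn.Linear(32, 32), nn.ReLU())
client_tail = nn.Linear(32, 2)

x = torch.randn(8, 16)
y = torch.randint(0, 2, (8,))  # labels never leave the client

h = client_head(x)   # client -> server
h = server_body(h)   # server -> client (computed on ciphertexts in the paper)
loss = nn.functional.cross_entropy(client_tail(h), y)  # loss computed locally
loss.backward()      # gradients flow back through all three segments
```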
arXiv Detail & Related papers (2023-09-19T10:56:08Z)
- Can Public Large Language Models Help Private Cross-device Federated Learning? [58.05449579773249]
We study (differentially) private federated learning (FL) of language models.
Public data has been used to improve privacy-utility trade-offs for both large and small language models.
We propose a novel distribution matching algorithm with theoretical grounding to sample public data close to private data distribution.
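A hedged sketch of the distribution-matching idea: rank public examples by how close their embeddings lie to a noised summary of the private data. The mean-embedding statistic, the noise scale sigma, and the function name select_public_batch are illustrative stand-ins, not the paper's actual algorithm.

```python
import torch

def select_public_batch(private_emb, public_emb, k, sigma=0.1):
    """Pick the k public examples whose embeddings lie closest to a noised
    mean of the private embeddings; an illustrative stand-in for the
    paper's distribution-matching procedure."""
    # Only a noise-perturbed summary statistic of the private data is released.
    noisy_mean = private_emb.mean(dim=0) + sigma * torch.randn(private_emb.size(1))
    dists = (public_emb - noisy_mean).norm(dim=1)
    return dists.topk(k, largest=False).indices

private_emb = torch.randn(100, 64)   # embeddings of private client data (toy)
public_emb = torch.randn(1000, 64)   # embeddings of candidate public data (toy)
chosen = select_public_batch(private_emb, public_emb, k=32)
```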
arXiv Detail & Related papers (2023-05-20T07:55:58Z)
- Split Ways: Privacy-Preserving Training of Encrypted Data Using Split Learning [6.916134299626706]
Split Learning (SL) is a new collaborative learning technique that allows participants to train machine learning models without clients sharing their raw data.
Previous works demonstrated that reconstructing activation maps could result in privacy leakage of client data.
In this paper, we improve upon previous works by constructing a protocol based on U-shaped SL that can operate on homomorphically encrypted data.
arXiv Detail & Related papers (2023-01-20T19:26:51Z)
- Dual Learning for Large Vocabulary On-Device ASR [64.10124092250128]
Dual learning is a paradigm for semi-supervised machine learning that seeks to leverage unsupervised data by solving two opposite tasks at once.
We provide an analysis of an on-device-sized streaming Conformer trained on the entirety of LibriSpeech, showing relative WER improvements of 10.7%/5.2% without an LM and 11.7%/16.4% with an LM.
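One toy way to realize "two opposite tasks at once" is a cycle-consistency loss between a primal model and its dual on unlabeled data; the linear stand-ins below are assumptions for illustration only and bear no relation to the paper's Conformer ASR setup.

```python
import torch
import torch.nn as nn

f = nn.Linear(32, 16)  # stand-in for the primal (speech-to-text) model
g = nn.Linear(16, 32)  # stand-in for the dual (text-to-speech) model
opt = torch.optim.Adam(list(f.parameters()) + list(g.parameters()), lr=1e-3)

unlabeled_audio = torch.randn(8, 32)  # unpaired audio with no transcripts

# Cycle loss: mapping through f and back through g should reconstruct the
# input, letting both models learn from unsupervised data simultaneously.
loss = nn.functional.mse_loss(g(f(unlabeled_audio)), unlabeled_audio)
opt.zero_grad()
loss.backward()
opt.step()
```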
arXiv Detail & Related papers (2023-01-11T06:32:28Z)
- Exploring the Limits of Differentially Private Deep Learning with Group-wise Clipping [91.60608388479645]
We show that per-layer clipping allows clipping to be performed in conjunction with backpropagation in differentially private optimization.
This results in private learning that is as memory-efficient and almost as fast per training update as non-private learning for many workflows of interest.
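The sketch below illustrates per-layer clipping interleaved with backpropagation via gradient hooks: each layer's gradient is clipped and noised the moment it is produced, so no separate pass over the full gradient vector is needed. For simplicity it clips the batch-aggregated gradient; actual DP-SGD (and the paper) clips per-example gradients, and the bound and noise scale here are arbitrary.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
clip_bound, sigma = 1.0, 0.5  # arbitrary per-layer bound and noise scale

def make_hook(bound, noise):
    def hook(grad):
        # Clip and noise this layer's gradient the moment backprop emits it.
        grad = grad * (bound / torch.clamp(grad.norm(), min=bound))
        return grad + noise * torch.randn_like(grad)
    return hook

for p in model.parameters():
    p.register_hook(make_hook(clip_bound, sigma))

x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()  # hooks fire layer by layer during this call
```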
arXiv Detail & Related papers (2022-12-03T05:20:15Z)
- Scalable Collaborative Learning via Representation Sharing [53.047460465980144]
Federated learning (FL) and Split Learning (SL) are two frameworks that enable collaborative learning while keeping the data private (on device).
In FL, each data holder trains a model locally and releases it to a central server for aggregation.
In SL, the clients must release individual cut-layer activations (smashed data) to the server and wait for its response (during both inference and backpropagation).
In this work, we present a novel approach for privacy-preserving machine learning, where the clients collaborate via online knowledge distillation using a contrastive loss.
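A generic InfoNCE-style stand-in for the contrastive distillation loss this summary mentions: two clients encode the same batch and are trained to agree on representations without exchanging raw data. The encoder shapes and the temperature are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def contrastive_loss(z1, z2, temperature=0.5):
    """InfoNCE-style loss pulling matching rows of z1 and z2 together;
    a generic stand-in for the paper's contrastive distillation loss."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature   # pairwise similarities
    targets = torch.arange(z1.size(0))   # row i should match column i
    return F.cross_entropy(logits, targets)

client_a = nn.Linear(16, 8)  # toy encoders for two collaborating clients
client_b = nn.Linear(16, 8)
batch = torch.randn(8, 16)   # a batch both parties can encode

# Each client releases only representations, never raw data or model weights.
loss = contrastive_loss(client_a(batch), client_b(batch))
loss.backward()
```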
arXiv Detail & Related papers (2022-11-20T10:49:22Z)
- Federated Split GANs [12.007429155505767]
We propose an alternative approach that trains ML models on users' devices themselves.
We focus on GANs (generative adversarial networks) and leverage their inherent privacy-preserving attribute.
Our system preserves data privacy, keeps training time short, and yields the same model accuracy as training on unconstrained devices.
arXiv Detail & Related papers (2022-07-04T23:53:47Z)
- Binarizing Split Learning for Data Privacy Enhancement and Computation Reduction [8.40552206158625]
Split learning (SL) enables data privacy preservation by allowing clients to collaboratively train a deep learning model with the server without sharing raw data.
In this study, we propose to binarize the SL local layers for faster computation and reduced memory usage.
Our results demonstrate that B-SL models are promising for lightweight IoT/mobile applications with high privacy-preservation requirements.
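A minimal sketch of what binarizing the client-side layers could look like: 1-bit weights via a sign function with a straight-through estimator so training still works. This is a generic binarization recipe, not necessarily the paper's B-SL construction, and activations are left full-precision here.

```python
import torch
import torch.nn as nn

class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a straight-through estimator so the
    binarized client layers remain trainable."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * (x.abs() <= 1).float()  # pass gradients only near zero

class BinaryLinear(nn.Linear):
    def forward(self, x):
        w = BinarizeSTE.apply(self.weight)  # 1-bit weights cut memory and compute
        return nn.functional.linear(x, w, self.bias)

# Binarized client-side part of the split; the server part stays full-precision.
client_part = nn.Sequential(BinaryLinear(16, 32), nn.ReLU())
smashed = client_part(torch.randn(8, 16))  # would be sent to the server
```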
arXiv Detail & Related papers (2022-06-10T04:07:02Z)
- Truth Serum: Poisoning Machine Learning Models to Reveal Their Secrets [53.866927712193416]
We show that an adversary who can poison a training dataset can cause models trained on this dataset to leak private details belonging to other parties.
Our attacks are effective across membership inference, attribute inference, and data extraction.
Our results cast doubt on the relevance of cryptographic privacy guarantees in multiparty protocols for machine learning.
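Loss thresholding is the generic membership-inference baseline such attacks build on: members of the training set tend to have lower loss. The sketch below shows only this baseline; the paper's contribution, poisoning the training data to amplify the leakage, is not reproduced here.

```python
import torch
import torch.nn as nn

def membership_score(model, x, y, loss_fn):
    """Score a candidate record by its negated loss: training members tend
    to have lower loss, so higher scores suggest membership. The paper's
    poisoning attack amplifies exactly this kind of signal."""
    with torch.no_grad():
        return -loss_fn(model(x), y)

target_model = nn.Linear(16, 2)               # toy stand-in for the victim model
x, y = torch.randn(1, 16), torch.tensor([1])  # candidate record to test
score = membership_score(target_model, x, y, nn.CrossEntropyLoss())
```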
arXiv Detail & Related papers (2022-03-31T18:06:28Z)
- Constrained Differentially Private Federated Learning for Low-bandwidth Devices [1.1470070927586016]
This paper presents a novel privacy-preserving federated learning scheme.
It provides theoretical privacy guarantees, as it is based on Differential Privacy.
It reduces the upstream and downstream bandwidth by up to 99.9% compared to standard federated learning.
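As a rough illustration of combining DP noise with aggressive update compression, the sketch below noises a client update and keeps only the largest 0.1% of coordinates. Top-k sparsification is an assumed stand-in; the paper's actual compression scheme and its clipping/accounting details are not shown.

```python
import torch

def compress_update(update, keep_fraction=0.001, sigma=0.01):
    """Add Gaussian noise to a client update and keep only the largest 0.1%
    of coordinates, cutting upstream bandwidth by roughly 99.9%. Top-k
    sparsification is an assumed stand-in for the paper's scheme."""
    noisy = update + sigma * torch.randn_like(update)
    k = max(1, int(keep_fraction * noisy.numel()))
    _, indices = noisy.abs().topk(k)
    return indices, noisy[indices]  # only (index, value) pairs are transmitted

update = torch.randn(1_000_000)  # toy flattened model update
idx, vals = compress_update(update)
print(f"sent {idx.numel()} of {update.numel()} coordinates")
```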
arXiv Detail & Related papers (2021-02-27T22:25:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.