Citadel: Protecting Data Privacy and Model Confidentiality for
Collaborative Learning with SGX
- URL: http://arxiv.org/abs/2105.01281v1
- Date: Tue, 4 May 2021 04:17:29 GMT
- Title: Citadel: Protecting Data Privacy and Model Confidentiality for
Collaborative Learning with SGX
- Authors: Chengliang Zhang, Junzhe Xia, Baichen Yang, Huancheng Puyang, Wei
Wang, Ruichuan Chen, Istemi Ekin Akkus, Paarijaat Aditya, Feng Yan
- Abstract summary: This paper presents Citadel, a scalable collaborative ML system that protects the privacy of both data owner and model owner in untrusted infrastructures.
Citadel performs distributed training across multiple training enclaves running on behalf of data owners and an aggregator enclave on behalf of the model owner.
Compared with the existing SGX-protected training systems, Citadel enables better scalability and stronger privacy guarantees for collaborative ML.
- Score: 5.148111464782033
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the advancement of machine learning (ML) and its growing awareness, many
organizations who own data but not ML expertise (data owner) would like to pool
their data and collaborate with those who have expertise but need data from
diverse sources to train truly generalizable models (model owner). In such
collaborative ML, the data owner wants to protect the privacy of its training
data, while the model owner desires the confidentiality of the model and the
training method which may contain intellectual properties. However, existing
private ML solutions, such as federated learning and split learning, cannot
meet the privacy requirements of both data and model owners at the same time.
This paper presents Citadel, a scalable collaborative ML system that protects
the privacy of both data owner and model owner in untrusted infrastructures
with the help of Intel SGX. Citadel performs distributed training across
multiple training enclaves running on behalf of data owners and an aggregator
enclave on behalf of the model owner. Citadel further establishes a strong
information barrier between these enclaves by means of zero-sum masking and
hierarchical aggregation to prevent data/model leakage during collaborative
training. Compared with the existing SGX-protected training systems, Citadel
enables better scalability and stronger privacy guarantees for collaborative
ML. Cloud deployment with various ML models shows that Citadel scales to a
large number of enclaves with less than 1.73X slowdown caused by SGX.
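
To make the masking idea concrete, below is a minimal sketch of zero-sum masking over per-enclave model updates: each training enclave adds a random mask to its update, and the masks are constructed to cancel exactly when the aggregator sums the masked updates. The helper names and the use of plain NumPy vectors are illustrative assumptions, not Citadel's actual implementation (which additionally applies hierarchical aggregation across SGX enclaves).

```python
import numpy as np

def zero_sum_masks(num_parties, dim, rng):
    """Draw one random mask per party such that all masks sum to zero.
    Each training enclave would add its mask to its model update before
    releasing it, so only the sum over all parties reveals anything."""
    masks = rng.standard_normal((num_parties - 1, dim))
    last = -masks.sum(axis=0)            # final mask forces the total to zero
    return np.vstack([masks, last])

rng = np.random.default_rng(0)
num_parties, dim = 4, 8
updates = rng.standard_normal((num_parties, dim))  # stand-in per-enclave updates
masks = zero_sum_masks(num_parties, dim, rng)

masked = updates + masks                  # what each enclave releases
aggregate = masked.sum(axis=0)            # masks cancel; true aggregate recovered

assert np.allclose(aggregate, updates.sum(axis=0))
```

Any single masked update is statistically uninformative on its own; only the complete sum, computed inside the aggregator enclave, equals the true aggregate.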
Related papers
- FT-PrivacyScore: Personalized Privacy Scoring Service for Machine Learning Participation [4.772368796656325]
In practice, controlled data access remains a mainstream method for protecting data privacy in many industrial and research environments.
We developed the demo prototype FT-PrivacyScore to show that it's possible to efficiently and quantitatively estimate the privacy risk of participating in a model fine-tuning task.
arXiv Detail & Related papers (2024-10-30T02:41:26Z)
- KnowledgeSG: Privacy-Preserving Synthetic Text Generation with Knowledge Distillation from Server [48.04903443425111]
The success of large language models (LLMs) encourages many parties to fine-tune LLMs on their own private data.
Existing solutions, such as utilizing synthetic data for substitution, struggle to simultaneously improve performance and preserve privacy.
We propose KnowledgeSG, a novel client-server framework which enhances synthetic data quality and improves model performance while ensuring privacy.
arXiv Detail & Related papers (2024-10-08T06:42:28Z)
- Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z)
- FewFedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning [54.26614091429253]
Federated instruction tuning (FedIT) is a promising solution that consolidates collaborative training across multiple data owners.
However, FedIT faces limitations such as the scarcity of instruction data and the risk of exposure to training data extraction attacks.
We propose FewFedPIT, designed to simultaneously enhance privacy protection and model performance of federated few-shot learning.
arXiv Detail & Related papers (2024-03-10T08:41:22Z)
- Decentralised, Collaborative, and Privacy-preserving Machine Learning for Multi-Hospital Data [31.106733834322394]
We propose Decentralised, Collaborative, and Privacy-preserving ML for Multi-Hospital Data (DeCaPH).
We demonstrate the generalizability and power of DeCaPH on three distinct tasks using real-world distributed medical datasets.
arXiv Detail & Related papers (2024-01-31T22:06:10Z)
- Personalized Federated Learning with Attention-based Client Selection [57.71009302168411]
We propose FedACS, a new PFL algorithm with an Attention-based Client Selection mechanism.
FedACS integrates an attention mechanism to enhance collaboration among clients with similar data distributions.
Experiments on CIFAR10 and FMNIST validate FedACS's superiority.
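
A minimal sketch of what attention-based client weighting could look like: a softmax over similarities between client updates, so clients with similar data distributions receive higher weight. This is a generic illustration of the idea, not FedACS's exact selection rule.

```python
import numpy as np

def attention_weights(target_update, client_updates, temperature=1.0):
    """Softmax attention over cosine similarity between one client's update
    and every client's update; clients with similar data (hence similar
    updates) receive higher collaboration weight."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    sims = np.array([cos(target_update, u) for u in client_updates])
    exp = np.exp(sims / temperature)
    return exp / exp.sum()

rng = np.random.default_rng(3)
updates = [rng.standard_normal(6) for _ in range(5)]
print(attention_weights(updates[0], updates))   # weights sum to 1
```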
arXiv Detail & Related papers (2023-12-23T03:31:46Z)
- Scalable Collaborative Learning via Representation Sharing [53.047460465980144]
Federated Learning (FL) and Split Learning (SL) are two frameworks that enable collaborative learning while keeping the data private (on device).
In FL, each data holder trains a model locally and releases it to a central server for aggregation.
In SL, the clients must release individual cut-layer activations (smashed data) to the server and wait for its response (during both inference and backpropagation).
In this work, we present a novel approach for privacy-preserving machine learning, where the clients collaborate via online knowledge distillation using a contrastive loss.
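
For reference, here is a minimal weighted-averaging (FedAvg-style) sketch of the FL aggregation step described above. The paper's actual contribution is the contrastive-loss knowledge-distillation scheme, which this snippet does not implement; the function name and toy data are illustrative.

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Weighted average of locally trained client parameters, in proportion
    to each client's dataset size (the FL baseline described above)."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# toy example: three clients, each holding a flat parameter vector
rng = np.random.default_rng(1)
clients = [rng.standard_normal(5) for _ in range(3)]
sizes = [100, 300, 600]
global_model = fed_avg(clients, sizes)
print(global_model)
```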
arXiv Detail & Related papers (2022-11-20T10:49:22Z)
- Privacy-Preserving Machine Learning for Collaborative Data Sharing via
Auto-encoder Latent Space Embeddings [57.45332961252628]
Privacy-preserving machine learning in data-sharing processes is an ever-critical task.
This paper presents an innovative framework that uses Representation Learning via autoencoders to generate privacy-preserving embedded data.
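
A minimal sketch of the general idea of sharing autoencoder latent embeddings instead of raw records: each party trains an autoencoder locally and releases only the encoder's latent codes. The architecture, dimensions, and training loop below are illustrative assumptions, not the framework proposed in the paper.

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Tiny autoencoder: only the encoder's latent codes would be shared,
    never the raw records themselves."""
    def __init__(self, in_dim=20, latent_dim=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 16), nn.ReLU(),
                                     nn.Linear(16, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 16), nn.ReLU(),
                                     nn.Linear(16, in_dim))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

model = AutoEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(64, 20)            # stand-in for one party's private records

for _ in range(200):               # fit the autoencoder locally
    recon, _ = model(x)
    loss = nn.functional.mse_loss(recon, x)
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    _, embeddings = model(x)       # what would be shared for downstream training
print(embeddings.shape)            # torch.Size([64, 4])
```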
arXiv Detail & Related papers (2022-11-10T17:36:58Z)
- CaPC Learning: Confidential and Private Collaborative Learning [30.403853588224987]
We introduce Confidential and Private Collaborative (CaPC) learning, the first method provably achieving both confidentiality and privacy in a collaborative setting.
We demonstrate how CaPC allows participants to collaborate without having to explicitly join their training sets or train a central model.
arXiv Detail & Related papers (2021-02-09T23:50:24Z)
- Differentially Private Secure Multi-Party Computation for Federated
Learning in Financial Applications [5.50791468454604]
Federated learning enables a population of clients, working with a trusted server, to collaboratively learn a shared machine learning model.
This reduces the risk of exposing sensitive data, but it is still possible to reverse engineer information about a client's private data set from communicated model parameters.
We present a privacy-preserving federated learning protocol to a non-specialist audience, demonstrate it using logistic regression on a real-world credit card fraud data set, and evaluate it using an open-source simulation platform.
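
For intuition, a rough sketch of the differential-privacy step in such a protocol: each client clips its logistic-regression gradient and adds Gaussian noise before the update is released for aggregation. The clip norm, noise multiplier, and toy data are illustrative assumptions, not the paper's protocol or parameters.

```python
import numpy as np

def dp_noisy_update(gradient, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Clip a client's gradient to a maximum L2 norm, then add Gaussian
    noise scaled to that norm before sending the update for aggregation."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(gradient)
    clipped = gradient * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=gradient.shape)
    return clipped + noise

rng = np.random.default_rng(42)
# toy logistic-regression gradient for one client: features X, binary labels y
X = rng.standard_normal((32, 5))
y = rng.integers(0, 2, size=32)
w = np.zeros(5)
preds = 1.0 / (1.0 + np.exp(-X @ w))
grad = X.T @ (preds - y) / len(y)

print(dp_noisy_update(grad, rng=rng))
```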
arXiv Detail & Related papers (2020-10-12T17:16:27Z)
- When Machine Unlearning Jeopardizes Privacy [25.167214892258567]
We investigate the unintended information leakage caused by machine unlearning.
We propose a novel membership inference attack that achieves strong performance.
Our results can help improve privacy protection in practical implementations of machine unlearning.
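
A minimal sketch of the attack surface described here: compare a query sample's predicted probability under the original model and under the unlearned model, where a large shift suggests the sample was part of the deleted data. The scoring rule and threshold are illustrative, not the paper's attack.

```python
import numpy as np

def unlearning_mi_score(prob_before, prob_after):
    """Membership-inference signal for unlearning: the absolute change in a
    sample's predicted probability between the original model and the model
    produced after unlearning."""
    return np.abs(prob_before - prob_after)

# toy posteriors from the two model versions for three query samples
before = np.array([0.91, 0.55, 0.48])
after = np.array([0.52, 0.54, 0.47])
scores = unlearning_mi_score(before, after)
print(scores > 0.2)    # flag samples whose confidence shifted sharply
```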
arXiv Detail & Related papers (2020-05-05T14:11:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all content) and is not responsible for any consequences of its use.