PSA: Private Set Alignment for Secure and Collaborative Analytics on Large-Scale Data
- URL: http://arxiv.org/abs/2410.04746v1
- Date: Mon, 7 Oct 2024 04:39:14 GMT
- Title: PSA: Private Set Alignment for Secure and Collaborative Analytics on Large-Scale Data
- Authors: Jiabo Wang, Elmo Xuyun Huang, Pu Duan, Huaxiong Wang, Kwok-Yan Lam,
- Abstract summary: Two companies expect to securely join their datasets with respect to their common customers to maximize data insights.
We proposed a solution, dubbed PSA, for this scenario, which is effectively applicable to real-world use cases.
We implemented and benchmarked the proposed protocols in different network conditions by joining two datasets, each at the scale of one million records, in 35.5 sec on a single thread with a network bandwidth of 500 Mbps.
- Score: 17.23761289654492
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Enforcement of privacy regulation is essential for collaborative data analytics. In this work, we address a scenario in which two companies expect to securely join their datasets with respect to their common customers to maximize data insights. Apart from the necessary protection of raw data, it becomes more challenging to protect the identities and attributes of common customers, as it requires participants to align their records associated with common customers without knowing who they are. We proposed a solution, dubbed PSA, for this scenario, which is effectively applicable to real-world use cases, such as evaluating advertising conversion using data from both publishers and merchants. The contributions of this work are threefold: 1. We defined the notion of PSA with two levels of privacy protection and proposed novel PSA protocols based on the modified oblivious switching network, which leverages efficient symmetric key operations and offline precomputation to save online run time. 2. We implemented and benchmarked the proposed protocols in different network conditions by joining two datasets, each at the scale of one million records, in 35.5 sec on a single thread with a network bandwidth of 500 Mbps, resulting in an X100 improvement over the existing Homomorphic based protocols. 3. We give new proof for an algorithm of quasi-linear complexity that constructs an oblivious switching network to achieve a target permutation distinct from the existing one in the literature.
Related papers
- Collaborative Inference over Wireless Channels with Feature Differential Privacy [57.68286389879283]
Collaborative inference among multiple wireless edge devices has the potential to significantly enhance Artificial Intelligence (AI) applications.
transmitting extracted features poses a significant privacy risk, as sensitive personal data can be exposed during the process.
We propose a novel privacy-preserving collaborative inference mechanism, wherein each edge device in the network secures the privacy of extracted features before transmitting them to a central server for inference.
arXiv Detail & Related papers (2024-10-25T18:11:02Z) - Using Synthetic Data to Mitigate Unfairness and Preserve Privacy in Collaborative Machine Learning [6.516872951510096]
Collaborative machine learning enables multiple clients to train a global model collaboratively.
To preserve privacy in such settings, a common technique is to utilize frequent updates and transmissions of model parameters.
We propose a two-stage strategy that promotes fair predictions, prevents client-data leakage, and reduces communication costs.
arXiv Detail & Related papers (2024-09-14T21:04:11Z) - CURE: Privacy-Preserving Split Learning Done Right [1.388112207221632]
Homomorphic encryption (HE)-based solutions exist for this scenario but often impose prohibitive computational burdens.
CURE is a novel system that encrypts only the server side of the model and the data.
We demonstrate CURE can achieve similar accuracy to plaintext SL while being 16x more efficient in terms of the runtime.
arXiv Detail & Related papers (2024-07-12T04:10:19Z) - PriRoAgg: Achieving Robust Model Aggregation with Minimum Privacy Leakage for Federated Learning [49.916365792036636]
Federated learning (FL) has recently gained significant momentum due to its potential to leverage large-scale distributed user data.
The transmitted model updates can potentially leak sensitive user information, and the lack of central control of the local training process leaves the global model susceptible to malicious manipulations on model updates.
We develop a general framework PriRoAgg, utilizing Lagrange coded computing and distributed zero-knowledge proof, to execute a wide range of robust aggregation algorithms while satisfying aggregated privacy.
arXiv Detail & Related papers (2024-07-12T03:18:08Z) - Privacy-Preserving, Dropout-Resilient Aggregation in Decentralized Learning [3.9166000694570076]
Decentralized learning (DL) offers a novel paradigm in machine learning by distributing training across clients without central aggregation.
DL's peer-to-peer model raises challenges in protecting against inference attacks and privacy leaks.
This work proposes three secret sharing-based dropout resilience approaches for privacy-preserving DL.
arXiv Detail & Related papers (2024-04-27T19:17:02Z) - FedP3: Federated Personalized and Privacy-friendly Network Pruning under Model Heterogeneity [82.5448598805968]
We present an effective and adaptable federated framework FedP3, representing Federated Personalized and Privacy-friendly network Pruning.
We offer a theoretical interpretation of FedP3 and its locally differential-private variant, DP-FedP3, and theoretically validate their efficiencies.
arXiv Detail & Related papers (2024-04-15T14:14:05Z) - FewFedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning [54.26614091429253]
Federated instruction tuning (FedIT) is a promising solution, by consolidating collaborative training across multiple data owners.
FedIT encounters limitations such as scarcity of instructional data and risk of exposure to training data extraction attacks.
We propose FewFedPIT, designed to simultaneously enhance privacy protection and model performance of federated few-shot learning.
arXiv Detail & Related papers (2024-03-10T08:41:22Z) - An Efficient and Multi-private Key Secure Aggregation for Federated Learning [41.29971745967693]
We propose an efficient and multi-private key secure aggregation scheme for federated learning.
Specifically, we skillfully modify the variant ElGamal encryption technique to achieve homomorphic addition operation.
For the high dimensional deep model parameter, we introduce a super-increasing sequence to compress multi-dimensional data into 1-D.
arXiv Detail & Related papers (2023-06-15T09:05:36Z) - Cross-Network Social User Embedding with Hybrid Differential Privacy
Guarantees [81.6471440778355]
We propose a Cross-network Social User Embedding framework, namely DP-CroSUE, to learn the comprehensive representations of users in a privacy-preserving way.
In particular, for each heterogeneous social network, we first introduce a hybrid differential privacy notion to capture the variation of privacy expectations for heterogeneous data types.
To further enhance user embeddings, a novel cross-network GCN embedding model is designed to transfer knowledge across networks through those aligned users.
arXiv Detail & Related papers (2022-09-04T06:22:37Z) - Collusion Resistant Federated Learning with Oblivious Distributed
Differential Privacy [4.951247283741297]
Privacy-preserving federated learning enables a population of distributed clients to jointly learn a shared model.
We present an efficient mechanism based on oblivious distributed differential privacy that is the first to protect against such client collusion.
We conclude with empirical analysis of the protocol's execution speed, learning accuracy, and privacy performance on two data sets.
arXiv Detail & Related papers (2022-02-20T19:52:53Z) - Federated Unsupervised Representation Learning [56.715917111878106]
We formulate a new problem in federated learning called Federated Unsupervised Representation Learning (FURL) to learn a common representation model without supervision.
FedCA is composed of two key modules: dictionary module to aggregate the representations of samples from each client and share with all clients for consistency of representation space and alignment module to align the representation of each client on a base model trained on a public data.
arXiv Detail & Related papers (2020-10-18T13:28:30Z) - Efficient Sparse Secure Aggregation for Federated Learning [0.20052993723676896]
We adapt compression-based federated techniques to additive secret sharing, leading to an efficient secure aggregation protocol.
We prove its privacy against malicious adversaries and its correctness in the semi-honest setting.
Compared to prior works on secure aggregation, our protocol has a lower communication and adaptable costs for a similar accuracy.
arXiv Detail & Related papers (2020-07-29T14:28:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.