IDCloak: A Practical Secure Multi-party Dataset Join Framework for Vertical Privacy-preserving Machine Learning
- URL: http://arxiv.org/abs/2506.01072v1
- Date: Sun, 01 Jun 2025 16:20:39 GMT
- Title: IDCloak: A Practical Secure Multi-party Dataset Join Framework for Vertical Privacy-preserving Machine Learning
- Authors: Shuyu Chen, Guopeng Lin, Haoyu Niu, Lushan Song, Chengxun Hong, Weili Han,
- Abstract summary: This paper proposes IDCloak, the first practical secure multi-party dataset join framework for vPPML.<n>We show that IDCloak demonstrates higher efficiency in the two-party setting and comparable performance when the party number increases.<n>Our proposed cmPSI protocol provides a stronger security guarantee (dishonest majority) while improving efficiency by up to $7.78times$ in time and $8.73times$ in communication sizes.
- Score: 7.765213701941396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vertical privacy-preserving machine learning (vPPML) enables multiple parties to train models on their vertically distributed datasets while keeping datasets private. In vPPML, it is critical to perform the secure dataset join, which aligns features corresponding to intersection IDs across datasets and forms a secret-shared and joint training dataset. However, existing methods for this step could be impractical due to: (1) they are insecure when they expose intersection IDs; or (2) they rely on a strong trust assumption requiring a non-colluding auxiliary server; or (3) they are limited to the two-party setting. This paper proposes IDCloak, the first practical secure multi-party dataset join framework for vPPML that keeps IDs private without a non-colluding auxiliary server. IDCloak consists of two protocols: (1) a circuit-based multi-party private set intersection protocol (cmPSI), which obtains secret-shared flags indicating intersection IDs via an optimized communication structure combining OKVS and OPRF; (2) a secure multi-party feature alignment protocol, which obtains the secret-shared and joint dataset using secret-shared flags, via our proposed efficient secure shuffle protocol. Experiments show that: (1) compared to the state-of-the-art secure two-party dataset join framework (iPrivjoin), IDCloak demonstrates higher efficiency in the two-party setting and comparable performance when the party number increases; (2) compared to the state-of-the-art cmPSI protocol under honest majority, our proposed cmPSI protocol provides a stronger security guarantee (dishonest majority) while improving efficiency by up to $7.78\times$ in time and $8.73\times$ in communication sizes; (3) our proposed secure shuffle protocol outperforms the state-of-the-art shuffle protocol by up to $138.34\times$ in time and $132.13\times$ in communication sizes.
Related papers
- Information-Theoretic Decentralized Secure Aggregation with Collusion Resilience [98.31540557973179]
We study the problem of decentralized secure aggregation (DSA) from an information-theoretic perspective.<n>We characterize the optimal rate region, which specifies the minimum achievable communication and secret key rates for DSA.<n>Our results establish the fundamental performance limits of DSA, providing insights for the design of provably secure and communication-efficient protocols.
arXiv Detail & Related papers (2025-08-01T12:51:37Z) - Multi-Party Private Set Intersection: A Circuit-Based Protocol with Jaccard Similarity for Secure and Efficient Anomaly Detection in Network Traffic [10.775721991076793]
We present a new circuit-based protocol for multi-party private set intersection (PSI)
With 7 parties, each possessing a set size of 212, our protocol completes in just 19 seconds.
arXiv Detail & Related papers (2024-01-23T07:59:04Z) - HasTEE+ : Confidential Cloud Computing and Analytics with Haskell [50.994023665559496]
Confidential computing enables the protection of confidential code and data in a co-tenanted cloud deployment using specialized hardware isolation units called Trusted Execution Environments (TEEs)
TEEs offer low-level C/C++-based toolchains that are susceptible to inherent memory safety vulnerabilities and lack language constructs to monitor explicit and implicit information-flow leaks.
We address the above with HasTEE+, a domain-specific language (cla) embedded in Haskell that enables programming TEEs in a high-level language with strong type-safety.
arXiv Detail & Related papers (2024-01-17T00:56:23Z) - Advancement on Security Applications of Private Intersection Sum Protocol [1.0485739694839666]
Secure computation protocols combine inputs from involved parties to generate an output while keeping their inputs private.
Private Set Intersection (PSI) is a secure computation protocol that allows two parties to learn the intersection of their sets without revealing anything else.
Private Intersection Sum (PIS) extends PSI when the two parties want to learn the cardinality of the intersection.
Private Join and Compute (PJC) is a scalable extension of PIS protocol to help organizations work together with confidential data sets.
arXiv Detail & Related papers (2023-08-28T17:42:53Z) - An Efficient and Multi-private Key Secure Aggregation for Federated Learning [41.29971745967693]
We propose an efficient and multi-private key secure aggregation scheme for federated learning.
Specifically, we skillfully modify the variant ElGamal encryption technique to achieve homomorphic addition operation.
For the high dimensional deep model parameter, we introduce a super-increasing sequence to compress multi-dimensional data into 1-D.
arXiv Detail & Related papers (2023-06-15T09:05:36Z) - ByzSecAgg: A Byzantine-Resistant Secure Aggregation Scheme for Federated
Learning Based on Coded Computing and Vector Commitment [90.60126724503662]
ByzSecAgg is an efficient secure aggregation scheme for federated learning.
ByzSecAgg is protected against Byzantine attacks and privacy leakages.
arXiv Detail & Related papers (2023-02-20T11:15:18Z) - Differentially Private Federated Clustering over Non-IID Data [59.611244450530315]
clustering clusters (FedC) problem aims to accurately partition unlabeled data samples distributed over massive clients into finite clients under the orchestration of a server.
We propose a novel FedC algorithm using differential privacy convergence technique, referred to as DP-Fed, in which partial participation and multiple clients are also considered.
Various attributes of the proposed DP-Fed are obtained through theoretical analyses of privacy protection, especially for the case of non-identically and independently distributed (non-i.i.d.) data.
arXiv Detail & Related papers (2023-01-03T05:38:43Z) - Is Vertical Logistic Regression Privacy-Preserving? A Comprehensive
Privacy Analysis and Beyond [57.10914865054868]
We consider vertical logistic regression (VLR) trained with mini-batch descent gradient.
We provide a comprehensive and rigorous privacy analysis of VLR in a class of open-source Federated Learning frameworks.
arXiv Detail & Related papers (2022-07-19T05:47:30Z) - Byzantine-Robust Federated Learning with Optimal Statistical Rates and
Privacy Guarantees [123.0401978870009]
We propose Byzantine-robust federated learning protocols with nearly optimal statistical rates.
We benchmark against competing protocols and show the empirical superiority of the proposed protocols.
Our protocols with bucketing can be naturally combined with privacy-guaranteeing procedures to introduce security against a semi-honest server.
arXiv Detail & Related papers (2022-05-24T04:03:07Z) - Beyond the Prototype: Divide-and-conquer Proxies for Few-shot
Segmentation [63.910211095033596]
Few-shot segmentation aims to segment unseen-class objects given only a handful of densely labeled samples.
We propose a simple yet versatile framework in the spirit of divide-and-conquer.
Our proposed approach, named divide-and-conquer proxies (DCP), allows for the development of appropriate and reliable information.
arXiv Detail & Related papers (2022-04-21T06:21:14Z) - Scotch: An Efficient Secure Computation Framework for Secure Aggregation [0.0]
Federated learning enables multiple data owners to jointly train a machine learning model without revealing their private datasets.
A malicious aggregation server might use the model parameters to derive sensitive information about the training dataset used.
We propose textscScotch, a decentralized textitm-party secure-computation framework for federated aggregation.
arXiv Detail & Related papers (2022-01-19T17:16:35Z) - Efficient Sparse Secure Aggregation for Federated Learning [0.20052993723676896]
We adapt compression-based federated techniques to additive secret sharing, leading to an efficient secure aggregation protocol.
We prove its privacy against malicious adversaries and its correctness in the semi-honest setting.
Compared to prior works on secure aggregation, our protocol has a lower communication and adaptable costs for a similar accuracy.
arXiv Detail & Related papers (2020-07-29T14:28:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.