Bifrost: A Much Simpler Secure Two-Party Data Join Protocol for Secure Data Analytics
- URL: http://arxiv.org/abs/2602.01225v2
- Date: Thu, 05 Feb 2026 08:00:08 GMT
- Title: Bifrost: A Much Simpler Secure Two-Party Data Join Protocol for Secure Data Analytics
- Authors: Shuyu Chen, Mingxun Zhou, Haoyu Niu, Guopeng Lin, Weili Han,
- Abstract summary: We propose a simpler secure data join protocol, Bifrost, which outputs (the secret shares of) a redundancy-free joined table.<n> Experiments on datasets of up to 100 GB show that Bifrost achieves $2.54 sim 22.32times$ speedup and reduces the communication by $84.15% sim 88.97%$ compared to the SOTA redundancy-free secure data join protocol iPrivJoin.
- Score: 17.836320893068166
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Secure data join enables two parties with vertically distributed data to securely compute the joined table, allowing the parties to perform downstream Secure multi-party computation-based Data Analytics (SDA), such as training machine learning models, based on the joined table. While Circuit-based Private Set Intersection (CPSI) can be used for secure data join, it introduces redundant dummy rows in the joined table, which results in high overhead in the downstream SDA tasks. iPrivJoin addresses this issue but introduces significant communication overhead in the redundancy removal process, as it relies on the cryptographic primitive OPPRF for data encoding and multiple rounds of oblivious shuffles. In this paper, we propose a much simpler secure data join protocol, Bifrost, which outputs (the secret shares of) a redundancy-free joined table. The highlight of Bifrost lies in its simplicity: it builds upon two conceptually simple building blocks, an ECDH-PSI protocol and a two-party oblivious shuffle protocol. The lightweight protocol design allows Bifrost to avoid the need for OPPRF. We also proposed a simple optimization named \textit{dual mapping} that reduces the rounds of oblivious shuffle needed from two to one. Experiments on datasets of up to 100 GB show that Bifrost achieves $2.54 \sim 22.32\times$ speedup and reduces the communication by $84.15\% \sim 88.97\%$ compared to the SOTA redundancy-free secure data join protocol iPrivJoin. Notably, the communication size of Bifrost is nearly equal to the size of the input data. In the two-step SDA pipeline evaluation (secure join and SDA), the redundancy-free property of Bifrost not only avoids the catastrophic error rate blowup in the downstream tasks caused by the dummy rows in the joined table (as introduced in CPSI), but also shows up to $2.80\times$ speed-up in the SDA process with up to $73.15\%$ communication reduction.
Related papers
- Augmented Shuffle Differential Privacy Protocols for Large-Domain Categorical and Key-Value Data [6.69087470775851]
Shuffle DP protocols provide high accuracy and privacy by introducing a shuffler who randomly shuffles data in a distributed system.<n>Most shuffle DP protocols are vulnerable to two attacks: collusion attacks by the data collector and users and data poisoning attacks.<n>We introduce a novel augmented shuffle DP protocol called the FME (Filtering-with-Multiple-Encryption) protocol.
arXiv Detail & Related papers (2025-09-02T06:40:45Z) - Breaking the Layer Barrier: Remodeling Private Transformer Inference with Hybrid CKKS and MPC [16.452180247201948]
This paper presents an efficient framework for private Transformer inference that combines Homomorphic Encryption (HE) and Secure Multi-party Computation (MPC) to protect data privacy.<n>The proposed framework, dubbed BLB, overcomes this by breaking down layers into fine-grained operators and further fusing adjacent linear operators, reducing the need for HE/MPC conversions.<n>BLB achieves a $21times$ reduction in communication overhead compared to BOLT (S&P'24) and a $2times$ reduction compared to Bumblebee (NDSS'25), along with latency reductions of $13times$ and $1.8
arXiv Detail & Related papers (2025-08-27T02:40:50Z) - Fair Data Exchange at Near-Plaintext Efficiency [1.187519459637148]
We introduce an FDE implementation that achieves near-plaintext speeds and sizes, making fair exchange practical even for gigabyte-scale files.<n>This can reduce transaction fees from roughly $10 to under $0.01 and shorten transaction latency from tens of seconds on down to about a second or less.
arXiv Detail & Related papers (2025-06-17T19:50:25Z) - IDCloak: A Practical Secure Multi-party Dataset Join Framework for Vertical Privacy-preserving Machine Learning [7.765213701941396]
This paper proposes IDCloak, the first practical secure multi-party dataset join framework for vPPML.<n>We show that IDCloak demonstrates higher efficiency in the two-party setting and comparable performance when the party number increases.<n>Our proposed cmPSI protocol provides a stronger security guarantee (dishonest majority) while improving efficiency by up to $7.78times$ in time and $8.73times$ in communication sizes.
arXiv Detail & Related papers (2025-06-01T16:20:39Z) - Augmented Shuffle Protocols for Accurate and Robust Frequency Estimation under Differential Privacy [4.527947047128005]
We propose three concrete protocols providing DP and robustness against the two attacks.<n>Our first protocol generates the number of dummy values for each item from a binomial distribution.<n>Our second protocol significantly improves the utility of our first protocol by introducing a novel dummy-count distribution.
arXiv Detail & Related papers (2025-04-10T01:06:05Z) - BiMaCoSR: Binary One-Step Diffusion Model Leveraging Flexible Matrix Compression for Real Super-Resolution [63.777210548110425]
We propose BiMaCoSR, which combines binarization and one-step distillation to obtain extreme compression and acceleration.<n>BiMaCoSR achieves a 23.8x compression ratio and a 27.4x speedup ratio compared to FP counterpart.
arXiv Detail & Related papers (2025-02-01T06:34:55Z) - TablePuppet: A Generic Framework for Relational Federated Learning [27.274856376963356]
Current federated learning (FL) approaches view decentralized training data as a single table, divided among participants either horizontally (by rows) or vertically (by columns)
This scenario requires intricate operations like joins and unions to obtain the training data, which is either costly or restricted by privacy concerns.
We propose TablePuppet, a generic framework for RFL that decomposes the learning process into two steps: (1) learning over join (LoJ) followed by (2) learning over union (LoU)
arXiv Detail & Related papers (2024-03-23T13:28:37Z) - HiRE: High Recall Approximate Top-$k$ Estimation for Efficient LLM
Inference [68.59839755875252]
HiRE comprises of two novel components: (i) a compression scheme to cheaply predict top-$k$ rows/columns with high recall, followed by full computation restricted to the predicted subset, and (ii) DA-TOP-$k$: an efficient multi-device approximate top-$k$ operator.
We demonstrate that on a one billion parameter model, HiRE applied to both the softmax as well as feedforward layers, achieves almost matching pretraining and downstream accuracy, and speeds up inference latency by $1.47times$ on a single TPUv5e device.
arXiv Detail & Related papers (2024-02-14T18:04:36Z) - Estimating the Decoding Failure Rate of Binary Regular Codes Using Iterative Decoding [84.0257274213152]
We propose a new technique to provide accurate estimates of the DFR of a two-iterations (parallel) bit flipping decoder.<n>We validate our results, providing comparisons of the modeled and simulated weight of the syndrome, incorrectly-guessed error bit distribution at the end of the first iteration, and two-itcrypteration Decoding Failure Rates (DFR)
arXiv Detail & Related papers (2024-01-30T11:40:24Z) - ByzSecAgg: A Byzantine-Resistant Secure Aggregation Scheme for Federated Learning Based on Coded Computing and Vector Commitment [61.540831911168226]
ByzSecAgg is an efficient secure aggregation scheme for federated learning.<n>ByzSecAgg is resistant to Byzantine attacks and privacy leakages.
arXiv Detail & Related papers (2023-02-20T11:15:18Z) - Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (emphi.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(frac1sqrtnT + frac1(k/d)2 T)$ with the communication cost of $O(k log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z) - CodedPaddedFL and CodedSecAgg: Straggler Mitigation and Secure
Aggregation in Federated Learning [86.98177890676077]
We present two novel coded federated learning (FL) schemes for linear regression that mitigate the effect of straggling devices.
The first scheme, CodedPaddedFL, mitigates the effect of straggling devices while retaining the privacy level of conventional FL.
The second scheme, CodedSecAgg, provides straggler resiliency and robustness against model inversion attacks.
arXiv Detail & Related papers (2021-12-16T14:26:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.