Augmented Shuffle Differential Privacy Protocols for Large-Domain Categorical and Key-Value Data
- URL: http://arxiv.org/abs/2509.02004v1
- Date: Tue, 02 Sep 2025 06:40:45 GMT
- Title: Augmented Shuffle Differential Privacy Protocols for Large-Domain Categorical and Key-Value Data
- Authors: Takao Murakami, Yuichi Sei, Reo Eriguchi
- Abstract summary: Shuffle DP protocols provide high accuracy and privacy by introducing a shuffler who randomly shuffles data in a distributed system. Most shuffle DP protocols are vulnerable to two attacks: collusion attacks by the data collector and users and data poisoning attacks. We introduce a novel augmented shuffle DP protocol called the FME (Filtering-with-Multiple-Encryption) protocol.
- Score: 6.69087470775851
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Shuffle DP (Differential Privacy) protocols provide high accuracy and privacy by introducing a shuffler who randomly shuffles data in a distributed system. However, most shuffle DP protocols are vulnerable to two attacks: collusion attacks by the data collector and users and data poisoning attacks. A recent study addresses this issue by introducing an augmented shuffle DP protocol, where users do not add noise and the shuffler performs random sampling and dummy data addition. However, it focuses on frequency estimation over categorical data with a small domain and cannot be applied to a large domain due to prohibitively high communication and computational costs. In this paper, we fill this gap by introducing a novel augmented shuffle DP protocol called the FME (Filtering-with-Multiple-Encryption) protocol. Our FME protocol uses a hash function to filter out unpopular items and then accurately calculates frequencies for popular items. To perform this within one round of interaction between users and the shuffler, our protocol carefully communicates within a system using multiple encryption. We also apply our FME protocol to more advanced KV (Key-Value) statistics estimation with an additional technique to reduce bias. For both categorical and KV data, we prove that our protocol provides computational DP, high robustness to the above two attacks, accuracy, and efficiency. We show the effectiveness of our proposals through comparisons with twelve existing protocols.
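The filter-then-count idea in the abstract (hash items, drop unpopular ones, then count the popular ones accurately) can be illustrated with a minimal, non-private sketch. This is not the FME protocol: it has no shuffler, no encryption, no sampling or dummy data, and therefore no DP guarantee. The function names, the SHA-256 bucketing, and the `num_buckets` and `threshold` parameters are assumptions made for illustration only.

```python
import hashlib
from collections import Counter


def bucket(item: str, num_buckets: int) -> int:
    """Map an item to a hash bucket (hypothetical helper, SHA-256 based)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def filter_then_count(items, num_buckets: int, threshold: int) -> Counter:
    """Two-stage frequency estimation over a large domain (no privacy).

    Stage 1: count hashed buckets and keep those with at least
             `threshold` reports.
    Stage 2: compute exact frequencies only for items whose bucket
             survived stage 1.
    """
    items = list(items)
    bucket_counts = Counter(bucket(x, num_buckets) for x in items)
    popular = {b for b, c in bucket_counts.items() if c >= threshold}
    return Counter(x for x in items if bucket(x, num_buckets) in popular)


if __name__ == "__main__":
    data = ["a"] * 5 + ["b"] * 3 + ["c"]
    print(filter_then_count(data, num_buckets=2**32, threshold=3))
```

Working on hashed buckets in the first stage keeps the cost tied to the number of buckets rather than the full item domain, which is the motivation the abstract gives for filtering. An item that collides with a popular bucket passes the filter, so stage 2 may still count a few unpopular items.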
Related papers
- Robust Single-message Shuffle Differential Privacy Protocol for Accurate Distribution Estimation [29.22457447003792]
We study the distribution estimation under pure shuffle model, which is a prevalent shuffle-DP framework without strong security assumptions. We propose a novel single-message adaptive shuffler-based piecewise (ASP) protocol with high utility and robustness.
arXiv Detail & Related papers (2026-03-05T11:40:26Z) - Perfectly-Private Analog Secure Aggregation in Federated Learning [51.61616734974475]
In federated learning, multiple parties train models locally and share their parameters with a central server, which aggregates them to update a global model. In this paper, a novel secure parameter aggregation method is proposed that employs the torus rather than a finite field.
arXiv Detail & Related papers (2025-09-10T15:22:40Z) - Benchmarking Fraud Detectors on Private Graph Data [70.4654745317714]
Currently, many types of fraud are managed in part by automated detection algorithms that operate over graphs. We consider the scenario where a data holder wishes to outsource development of fraud detectors to third parties. Third parties submit their fraud detectors to the data holder, who evaluates these algorithms on a private dataset and then publicly communicates the results. We propose a realistic privacy attack on this system that allows an adversary to de-anonymize individuals' data based only on the evaluation results.
arXiv Detail & Related papers (2025-07-30T03:20:15Z) - Augmented Shuffle Protocols for Accurate and Robust Frequency Estimation under Differential Privacy [4.527947047128005]
We propose three concrete protocols providing DP and robustness against the two attacks. Our first protocol generates the number of dummy values for each item from a binomial distribution. Our second protocol significantly improves the utility of our first protocol by introducing a novel dummy-count distribution.
arXiv Detail & Related papers (2025-04-10T01:06:05Z) - When Focus Enhances Utility: Target Range LDP Frequency Estimation and Unknown Item Discovery [7.746385592375338]
Local Differential Privacy protocols have been successfully deployed in real-world scenarios by tech companies like Google, Apple, and Microsoft. We propose a Generalized Count Mean Sketch protocol that captures many existing frequency estimation protocols. We present a novel protocol for collecting data within unknown domain, as our frequency estimation protocols only work effectively with known data domain.
arXiv Detail & Related papers (2024-12-23T05:50:11Z) - Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose Pseudo-Probability Unlearning (PPU), a novel method that enables models to forget data in a privacy-preserving manner.
Our method achieves over 20% improvements in forgetting error compared to the state-of-the-art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z) - DMM: Distributed Matrix Mechanism for Differentially-Private Federated Learning Based on Constant-Overhead Linear Secret Resharing [51.336015600778396]
We introduce the distributed matrix mechanism to achieve the best-of-both-worlds; better privacy of distributed DP and better utility from the matrix mechanism. We accomplish this using a novel cryptographic protocol that securely transfers sensitive values across client committees of different training iterations with constant communication overhead.
arXiv Detail & Related papers (2024-10-21T16:25:14Z) - Benchmarking Secure Sampling Protocols for Differential Privacy [3.0325535716232404]
Two well-known models of Differential Privacy (DP) are the central model and the local model.
Recently, many studies have proposed to achieve DP with Secure Multi-party Computation (MPC) in distributed settings.
arXiv Detail & Related papers (2024-09-16T19:04:47Z) - Noise Variance Optimization in Differential Privacy: A Game-Theoretic Approach Through Per-Instance Differential Privacy [7.264378254137811]
Differential privacy (DP) can measure privacy loss by observing the changes in the distribution caused by the inclusion of individuals in the target dataset.
DP has been prominent in safeguarding datasets in machine learning in industry giants like Apple and Google.
We propose per-instance DP (pDP) as a constraint, measuring privacy loss for each data instance and optimizing noise tailored to individual instances.
arXiv Detail & Related papers (2024-04-24T06:51:16Z) - Data post-processing for the one-way heterodyne protocol under composable finite-size security [62.997667081978825]
We study the performance of a practical continuous-variable (CV) quantum key distribution protocol.
We focus on the Gaussian-modulated coherent-state protocol with heterodyne detection in a high signal-to-noise ratio regime.
This allows us to study the performance for practical implementations of the protocol and optimize the parameters connected to the steps above.
arXiv Detail & Related papers (2022-05-20T12:37:09Z) - Composably secure data processing for Gaussian-modulated continuous variable quantum key distribution [58.720142291102135]
Continuous-variable quantum key distribution (QKD) employs the quadratures of a bosonic mode to establish a secret key between two remote parties.
We consider a protocol with homodyne detection in the general setting of composable finite-size security.
In particular, we analyze the high signal-to-noise regime which requires the use of high-rate (non-binary) low-density parity check codes.
arXiv Detail & Related papers (2021-03-30T18:02:55Z) - On the Practicality of Differential Privacy in Federated Learning by Tuning Iteration Times [51.61278695776151]
Federated Learning (FL) is well known for its privacy protection when training machine learning models among distributed clients collaboratively.
Recent studies have pointed out that the naive FL is susceptible to gradient leakage attacks.
Differential Privacy (DP) emerges as a promising countermeasure to defend against gradient leakage attacks.
arXiv Detail & Related papers (2021-01-11T19:43:12Z) - A One-Pass Private Sketch for Most Machine Learning Tasks [48.17461258268463]
Differential privacy (DP) is a compelling privacy definition that explains the privacy-utility tradeoff via formal, provable guarantees.
We propose a private sketch that supports a multitude of machine learning tasks including regression, classification, density estimation, and more.
Our sketch consists of randomized contingency tables that are indexed with locality-sensitive hashing and constructed with an efficient one-pass algorithm.
arXiv Detail & Related papers (2020-06-16T17:47:48Z)