Bayesian Advantage of Re-Identification Attack in the Shuffle Model
- URL: http://arxiv.org/abs/2511.03213v1
- Date: Wed, 05 Nov 2025 06:03:56 GMT
- Title: Bayesian Advantage of Re-Identification Attack in the Shuffle Model
- Authors: Pengcheng Su, Haibo Cheng, Ping Wang,
- Abstract summary: The shuffle model, which anonymizes data by randomly permuting messages, has been widely adopted in cryptography and differential privacy.<n>We present the first systematic study of the Bayesian advantage in re-identifying a user's message under the shuffle model.
- Score: 7.289625261239043
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The shuffle model, which anonymizes data by randomly permuting user messages, has been widely adopted in both cryptography and differential privacy. In this work, we present the first systematic study of the Bayesian advantage in re-identifying a user's message under the shuffle model. We begin with a basic setting: one sample is drawn from a distribution $P$, and $n - 1$ samples are drawn from a distribution $Q$, after which all $n$ samples are randomly shuffled. We define $\beta_n(P, Q)$ as the success probability of a Bayes-optimal adversary in identifying the sample from $P$, and define the additive and multiplicative Bayesian advantages as $\mathsf{Adv}_n^{+}(P, Q) = \beta_n(P,Q) - \frac{1}{n}$ and $\mathsf{Adv}_n^{\times}(P, Q) = n \cdot \beta_n(P,Q)$, respectively. We derive exact analytical expressions and asymptotic characterizations of $\beta_n(P, Q)$, along with evaluations in several representative scenarios. Furthermore, we establish (nearly) tight mutual bounds between the additive Bayesian advantage and the total variation distance. Finally, we extend our analysis beyond the basic setting and present, for the first time, an upper bound on the success probability of Bayesian attacks in shuffle differential privacy. Specifically, when the outputs of $n$ users--each processed through an $\varepsilon$-differentially private local randomizer--are shuffled, the probability that an attacker successfully re-identifies any target user's message is at most $e^{\varepsilon}/n$.
Related papers
- Mutual Information Bounds in the Shuffle Model [7.289625261239043]
This paper presents the first systematic study of the single-message shuffle model from an information-theoretic perspective.<n>We analyze two regimes: the shuffle-only setting and the shuffle-DP setting.<n>For the shuffle-DP setting, we establish information-theoretic upper bounds on total information leakage.
arXiv Detail & Related papers (2025-11-19T02:44:59Z) - Query-Efficient Locally Private Hypothesis Selection via the Scheffe Graph [45.975805184376036]
We describe an algorithm that satisfies local differential privacy, performs $tildeO(k3/2)$ non-adaptive queries to individuals.<n>We also introduce a new object we dub the Scheff'e graph, which captures structure of the differences between distributions in $Q$.
arXiv Detail & Related papers (2025-09-19T17:41:15Z) - Near-optimal algorithms for private estimation and sequential testing of collision probability [1.62060928868899]
We describe an algorithm that satisfies $(alpha, beta)$-local differential privacy and estimates collision probability with error at most $epsilon$.<n>We also present a sequential testing algorithm for collision probability, which can distinguish between collision probability values that are separated by $epsilon$.
arXiv Detail & Related papers (2025-04-18T17:12:15Z) - Almost Minimax Optimal Best Arm Identification in Piecewise Stationary Linear Bandits [55.957560311008926]
We propose a piecewise stationary linear bandit (PSLB) model where the quality of an arm is measured by its return averaged over all contexts.
PS$varepsilon$BAI$+$ is guaranteed to identify an $varepsilon$-optimal arm with probability $ge 1-delta$ and with a minimal number of samples.
arXiv Detail & Related papers (2024-10-10T06:15:42Z) - Discrete Distribution Estimation under User-level Local Differential
Privacy [37.65849910114053]
We study discrete distribution estimation under user-level local differential privacy (LDP)
In user-level $varepsilon$-LDP, each user has $mge1$ samples and the privacy of all $m$ samples must be preserved simultaneously.
arXiv Detail & Related papers (2022-11-07T18:29:32Z) - Bayesian Learning via Q-Exponential Process [10.551294837978363]
Regularization is one of the most fundamental topics in optimization, statistics and machine learning.
In this work, we generalize the $q$-exponential distribution (with density proportional to) $exp( frac12|u|q)$ to a process named $Q$-exponential (Q-EP) process that corresponds to the $L_q$ regularization of functions.
arXiv Detail & Related papers (2022-10-14T17:37:14Z) - Approximate Function Evaluation via Multi-Armed Bandits [51.146684847667125]
We study the problem of estimating the value of a known smooth function $f$ at an unknown point $boldsymbolmu in mathbbRn$, where each component $mu_i$ can be sampled via a noisy oracle.
We design an instance-adaptive algorithm that learns to sample according to the importance of each coordinate, and with probability at least $1-delta$ returns an $epsilon$ accurate estimate of $f(boldsymbolmu)$.
arXiv Detail & Related papers (2022-03-18T18:50:52Z) - User-Level Private Learning via Correlated Sampling [49.453751858361265]
We consider the setting where each user holds $m$ samples and the privacy protection is enforced at the level of each user's data.
We show that, in this setting, we may learn with a much fewer number of users.
arXiv Detail & Related papers (2021-10-21T15:33:53Z) - Optimal Testing of Discrete Distributions with High Probability [49.19942805582874]
We study the problem of testing discrete distributions with a focus on the high probability regime.
We provide the first algorithms for closeness and independence testing that are sample-optimal, within constant factors.
arXiv Detail & Related papers (2020-09-14T16:09:17Z) - Locally Private Hypothesis Selection [96.06118559817057]
We output a distribution from $mathcalQ$ whose total variation distance to $p$ is comparable to the best such distribution.
We show that the constraint of local differential privacy incurs an exponential increase in cost.
Our algorithms result in exponential improvements on the round complexity of previous methods.
arXiv Detail & Related papers (2020-02-21T18:30:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.