Mutual Information Bounds in the Shuffle Model
- URL: http://arxiv.org/abs/2511.15051v1
- Date: Wed, 19 Nov 2025 02:44:59 GMT
- Title: Mutual Information Bounds in the Shuffle Model
- Authors: Pengcheng Su, Haibo Cheng, Ping Wang,
- Abstract summary: This paper presents the first systematic study of the single-message shuffle model from an information-theoretic perspective.<n>We analyze two regimes: the shuffle-only setting and the shuffle-DP setting.<n>For the shuffle-DP setting, we establish information-theoretic upper bounds on total information leakage.
- Score: 7.289625261239043
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The shuffle model enhances privacy by anonymizing users' reports through random permutation. This paper presents the first systematic study of the single-message shuffle model from an information-theoretic perspective. We analyze two regimes: the shuffle-only setting, where each user directly submits its message ($Y_i=X_i$), and the shuffle-DP setting, where each user first applies a local $\varepsilon_0$-differentially private mechanism before shuffling ($Y_i=\mathcal{R}(X_i)$). Let $\boldsymbol{Z} = (Y_{σ(i)})_i$ denote the shuffled sequence produced by a uniformly random permutation $σ$, and let $K = σ^{-1}(1)$ represent the position of user 1's message after shuffling. For the shuffle-only setting, we focus on a tractable yet expressive \emph{basic configuration}, where the target user's message follows $Y_1 \sim P$ and the remaining users' messages are i.i.d.\ samples from $Q$, i.e., $Y_2,\dots,Y_n \sim Q$. We derive asymptotic expressions for the mutual information quantities $I(Y_1;\boldsymbol{Z})$ and $I(K;\boldsymbol{Z})$ as $n \to \infty$, and demonstrate how this analytical framework naturally extends to settings with heterogeneous user distributions. For the shuffle-DP setting, we establish information-theoretic upper bounds on total information leakage. When each user applies an $\varepsilon_0$-DP mechanism, the overall leakage satisfies $I(K; \boldsymbol{Z}) \le 2\varepsilon_0$ and $I(X_1; \boldsymbol{Z}\mid (X_i)_{i=2}^n) \le (e^{\varepsilon_0}-1)/(2n) + O(n^{-3/2})$. These results bridge shuffle differential privacy and mutual-information-based privacy.
Related papers
- Bayesian Advantage of Re-Identification Attack in the Shuffle Model [7.289625261239043]
The shuffle model, which anonymizes data by randomly permuting messages, has been widely adopted in cryptography and differential privacy.<n>We present the first systematic study of the Bayesian advantage in re-identifying a user's message under the shuffle model.
arXiv Detail & Related papers (2025-11-05T06:03:56Z) - Profile Reconstruction from Private Sketches [13.929335175122265]
Given a multiset of $n$ items from $mathcalD$, the emphprofile reconstruction problem is to estimate, for $t = 0, 1, dots, n$, the fraction $vecf[t]$ of items in $mathcalD$ that appear exactly $tfty times.
We consider differentially private profile estimation in a distributed, space-constrained setting where we wish to maintain an updatable, private sketch of the multiset.
We show how to speed up their LP-based technique
arXiv Detail & Related papers (2024-06-03T09:51:28Z) - Private Vector Mean Estimation in the Shuffle Model: Optimal Rates Require Many Messages [63.366380571397]
We study the problem of private vector mean estimation in the shuffle model of privacy where $n$ users each have a unit vector $v(i) inmathbbRd$.
We propose a new multi-message protocol that achieves the optimal error using $tildemathcalOleft(min(nvarepsilon2,d)right)$ messages per user.
arXiv Detail & Related papers (2024-04-16T00:56:36Z) - Distribution-Independent Regression for Generalized Linear Models with
Oblivious Corruptions [49.69852011882769]
We show the first algorithms for the problem of regression for generalized linear models (GLMs) in the presence of additive oblivious noise.
We present an algorithm that tackles newthis problem in its most general distribution-independent setting.
This is the first newalgorithmic result for GLM regression newwith oblivious noise which can handle more than half the samples being arbitrarily corrupted.
arXiv Detail & Related papers (2023-09-20T21:41:59Z) - Concurrent Shuffle Differential Privacy Under Continual Observation [60.127179363119936]
We study the private continual summation problem (a.k.a. the counter problem)
We show that the concurrent shuffle model allows for significantly improved error compared to a standard (single) shuffle model.
arXiv Detail & Related papers (2023-01-29T20:42:54Z) - User-Level Private Learning via Correlated Sampling [49.453751858361265]
We consider the setting where each user holds $m$ samples and the privacy protection is enforced at the level of each user's data.
We show that, in this setting, we may learn with a much fewer number of users.
arXiv Detail & Related papers (2021-10-21T15:33:53Z) - Locally differentially private estimation of nonlinear functionals of
discrete distributions [9.028773906859541]
We study the problem of estimating non-linear functionals of discrete distributions in the context of local differential privacy.
Only $alpha$-locally differentially private (LDP) samples are publicly available, where the term 'local' means that each $z_i$ is produced using one individual $x_i$.
We describe the behavior of the quadratic risk for estimating the power sum functional $F_gamma = sum_k=1K p_kgamma$, $gamma >0$ as a function of $K,, n
arXiv Detail & Related papers (2021-07-08T16:11:10Z) - On Distributed Differential Privacy and Counting Distinct Elements [52.701425652208734]
We study the setup where each of $n$ users holds an element from a discrete set.
The goal is to count the number of distinct elements across all users.
arXiv Detail & Related papers (2020-09-21T04:13:34Z) - The Sparse Hausdorff Moment Problem, with Application to Topic Models [5.151973524974052]
We give an algorithm for identifying a $k$-mixture using samples of $m=2k$ iid binary random variables.
It suffices to know the moments to additive accuracy $w_mincdotzetaO(k)$.
arXiv Detail & Related papers (2020-07-16T04:23:57Z) - Locally Private Hypothesis Selection [96.06118559817057]
We output a distribution from $mathcalQ$ whose total variation distance to $p$ is comparable to the best such distribution.
We show that the constraint of local differential privacy incurs an exponential increase in cost.
Our algorithms result in exponential improvements on the round complexity of previous methods.
arXiv Detail & Related papers (2020-02-21T18:30:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.