Counting Distinct Elements in the Turnstile Model with Differential Privacy under Continual Observation
- URL: http://arxiv.org/abs/2306.06723v3
- Date: Wed, 10 Jul 2024 19:18:36 GMT
- Title: Counting Distinct Elements in the Turnstile Model with Differential Privacy under Continual Observation
- Authors: Palak Jain, Iden Kalemaj, Sofya Raskhodnikova, Satchit Sivakumar, Adam Smith,
- Abstract summary: We show that every differentially private mechanism that handles insertions and deletions has worst-case additive error at least $T1/4$ even under a relatively weak, event-level privacy definition.
We present an item-level differentially private mechanism that, for all turnstile streams with maximum flippancy $w$, continually outputs the number of distinct elements with an $O(sqrtw cdot polylog T)$ additive error.
- Score: 3.9476868147424162
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Privacy is a central challenge for systems that learn from sensitive data sets, especially when a system's outputs must be continuously updated to reflect changing data. We consider the achievable error for differentially private continual release of a basic statistic - the number of distinct items - in a stream where items may be both inserted and deleted (the turnstile model). With only insertions, existing algorithms have additive error just polylogarithmic in the length of the stream $T$. We uncover a much richer landscape in the turnstile model, even without considering memory restrictions. We show that every differentially private mechanism that handles insertions and deletions has worst-case additive error at least $T^{1/4}$ even under a relatively weak, event-level privacy definition. Then, we identify a parameter of the input stream, its maximum flippancy, that is low for natural data streams and for which we give tight parameterized error guarantees. Specifically, the maximum flippancy is the largest number of times that the contribution of a single item to the distinct elements count changes over the course of the stream. We present an item-level differentially private mechanism that, for all turnstile streams with maximum flippancy $w$, continually outputs the number of distinct elements with an $O(\sqrt{w} \cdot poly\log T)$ additive error, without requiring prior knowledge of $w$. We prove that this is the best achievable error bound that depends only on $w$, for a large range of values of $w$. When $w$ is small, the error of our mechanism is similar to the polylogarithmic in $T$ error in the insertion-only setting, bypassing the hardness in the turnstile model.
Related papers
- Data subsampling for Poisson regression with pth-root-link [53.63838219437508]
We develop and analyze data subsampling techniques for Poisson regression.
In particular, we consider the Poisson generalized linear model with ID- and square root-link functions.
arXiv Detail & Related papers (2024-10-30T10:09:05Z) - Privately Counting Partially Ordered Data [50.98561191019676]
We consider differentially private counting when each data point consists of $d$ bits satisfying a partial order.
Our main technical contribution is a problem-specific $K$-norm mechanism that runs in time $O(d2)$.
arXiv Detail & Related papers (2024-10-09T13:43:35Z) - Private Counting of Distinct Elements in the Turnstile Model and Extensions [2.5200018924833327]
We show that a very simple algorithm based on the sparse vector technique achieves a tight additive error for item-level $(epsilon,delta)$-differential privacy.
Our second result is a bound which shows that for a large class of algorithms, the lower bound from item-level differential privacy extends to event-level differential privacy.
arXiv Detail & Related papers (2024-08-21T14:06:22Z) - On Differentially Private U Statistics [25.683071759227293]
We propose a new thresholding-based approach using emphlocal H'ajek projections to reweight different subsets of the data.
This leads to nearly optimal private error for non-degenerate U-statistics and a strong indication of near-optimality for degenerate U-statistics.
arXiv Detail & Related papers (2024-07-06T03:27:14Z) - Almost Instance-optimal Clipping for Summation Problems in the Shuffle Model of Differential Privacy [36.04370583654189]
We show how the clipping mechanism can achieve an instance-optimal error of $O(max_i x_i cdot loglog U /varepsilon)$.
We also extend our technique to the high-dimensional sum estimation problem and sparse vector aggregation.
arXiv Detail & Related papers (2024-03-15T09:04:00Z) - Provably Efficient High-Dimensional Bandit Learning with Batched
Feedbacks [93.00280593719513]
We study high-dimensional multi-armed contextual bandits with batched feedback where the $T$ steps of online interactions are divided into $L$ batches.
In specific, each batch collects data according to a policy that depends on previous batches and the rewards are revealed only at the end of the batch.
Our algorithm achieves regret bounds comparable to those in fully sequential setting with only $mathcalO( log T)$ batches.
arXiv Detail & Related papers (2023-11-22T06:06:54Z) - Differentially Private Clustering in Data Streams [65.78882209673885]
We present a differentially private streaming clustering framework which only requires an offline DP coreset or clustering algorithm as a blackbox.
Our framework is also differentially private under the continual release setting, i.e., the union of outputs of our algorithms at every timestamp is always differentially private.
arXiv Detail & Related papers (2023-07-14T16:11:22Z) - Differential Privacy for Clustering Under Continual Observation [5.220940151628734]
We consider the problem of clustering privately a dataset in $mathbbRd$ that undergoes both insertion and deletion of points.
We give an $varepsilon$-differentially private clustering mechanism for the $k$-means objective under continual observation.
arXiv Detail & Related papers (2023-07-07T07:23:56Z) - Concurrent Shuffle Differential Privacy Under Continual Observation [60.127179363119936]
We study the private continual summation problem (a.k.a. the counter problem)
We show that the concurrent shuffle model allows for significantly improved error compared to a standard (single) shuffle model.
arXiv Detail & Related papers (2023-01-29T20:42:54Z) - On Distributed Differential Privacy and Counting Distinct Elements [52.701425652208734]
We study the setup where each of $n$ users holds an element from a discrete set.
The goal is to count the number of distinct elements across all users.
arXiv Detail & Related papers (2020-09-21T04:13:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.