Related papers: Counting Distinct Elements in the Turnstile Model with Differential Privacy under Continual Observation

Counting Distinct Elements in the Turnstile Model with Differential Privacy under Continual Observation

URL: http://arxiv.org/abs/2306.06723v3
Date: Wed, 10 Jul 2024 19:18:36 GMT
Title: Counting Distinct Elements in the Turnstile Model with Differential Privacy under Continual Observation
Authors: Palak Jain, Iden Kalemaj, Sofya Raskhodnikova, Satchit Sivakumar, Adam Smith,
Abstract summary: We show that every differentially private mechanism that handles insertions and deletions has worst-case additive error at least $T1/4$ even under a relatively weak, event-level privacy definition. We present an item-level differentially private mechanism that, for all turnstile streams with maximum flippancy $w$, continually outputs the number of distinct elements with an $O(sqrtw cdot polylog T)$ additive error.
Score: 3.9476868147424162
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Privacy is a central challenge for systems that learn from sensitive data sets, especially when a system's outputs must be continuously updated to reflect changing data. We consider the achievable error for differentially private continual release of a basic statistic - the number of distinct items - in a stream where items may be both inserted and deleted (the turnstile model). With only insertions, existing algorithms have additive error just polylogarithmic in the length of the stream $T$. We uncover a much richer landscape in the turnstile model, even without considering memory restrictions. We show that every differentially private mechanism that handles insertions and deletions has worst-case additive error at least $T^{1/4}$ even under a relatively weak, event-level privacy definition. Then, we identify a parameter of the input stream, its maximum flippancy, that is low for natural data streams and for which we give tight parameterized error guarantees. Specifically, the maximum flippancy is the largest number of times that the contribution of a single item to the distinct elements count changes over the course of the stream. We present an item-level differentially private mechanism that, for all turnstile streams with maximum flippancy $w$, continually outputs the number of distinct elements with an $O(\sqrt{w} \cdot poly\log T)$ additive error, without requiring prior knowledge of $w$. We prove that this is the best achievable error bound that depends only on $w$, for a large range of values of $w$. When $w$ is small, the error of our mechanism is similar to the polylogarithmic in $T$ error in the insertion-only setting, bypassing the hardness in the turnstile model.

Related papers

Differentially Private Space-Efficient Algorithms for Counting Distinct Elements in the Turnstile Model [61.40326886123332]
We give the first sublinear space differentially private algorithms for the fundamental problem of counting distinct elements in the turnstile streaming model.<n>Our result significantly improves upon the space requirements of the state-of-the-art algorithms for this problem, which is linear.<n>When a bound $W$ on the number of times an item appears in the stream is known, our algorithm provides $tildeO_eta(sqrtW)$ additive error using $tildeO_eta(sqrtW)$ space.
arXiv Detail & Related papers (2025-05-29T17:21:20Z)
Rate-reliability functions for deterministic identification [49.126395046088014]
We find that for positive exponents linear scaling is restored, now with a rate that is a function of the reliability exponents. We extend our results to classical-quantum channels and quantum channels with product input restriction.
arXiv Detail & Related papers (2025-02-04T15:09:14Z)
Data subsampling for Poisson regression with pth-root-link [53.63838219437508]
We develop and analyze data subsampling techniques for Poisson regression. In particular, we consider the Poisson generalized linear model with ID- and square root-link functions.
arXiv Detail & Related papers (2024-10-30T10:09:05Z)
Privately Counting Partially Ordered Data [50.98561191019676]
We consider differentially private counting when each data point consists of $d$ bits satisfying a partial order. Our main technical contribution is a problem-specific $K$-norm mechanism that runs in time $O(d2)$.
arXiv Detail & Related papers (2024-10-09T13:43:35Z)
Private Counting of Distinct Elements in the Turnstile Model and Extensions [2.5200018924833327]
We show that a very simple algorithm based on the sparse vector technique achieves a tight additive error for item-level $(epsilon,delta)$-differential privacy. Our second result is a bound which shows that for a large class of algorithms, the lower bound from item-level differential privacy extends to event-level differential privacy.
arXiv Detail & Related papers (2024-08-21T14:06:22Z)
On Differentially Private U Statistics [25.683071759227293]
We propose a new thresholding-based approach using emphlocal H'ajek projections to reweight different subsets of the data. This leads to nearly optimal private error for non-degenerate U-statistics and a strong indication of near-optimality for degenerate U-statistics.
arXiv Detail & Related papers (2024-07-06T03:27:14Z)
Almost Instance-optimal Clipping for Summation Problems in the Shuffle Model of Differential Privacy [36.04370583654189]
We show how the clipping mechanism can achieve an instance-optimal error of $O(max_i x_i cdot loglog U /varepsilon)$. We also extend our technique to the high-dimensional sum estimation problem and sparse vector aggregation.
arXiv Detail & Related papers (2024-03-15T09:04:00Z)
Provably Efficient High-Dimensional Bandit Learning with Batched Feedbacks [93.00280593719513]
We study high-dimensional multi-armed contextual bandits with batched feedback where the $T$ steps of online interactions are divided into $L$ batches. In specific, each batch collects data according to a policy that depends on previous batches and the rewards are revealed only at the end of the batch. Our algorithm achieves regret bounds comparable to those in fully sequential setting with only $mathcalO( log T)$ batches.
arXiv Detail & Related papers (2023-11-22T06:06:54Z)
Differentially Private Clustering in Data Streams [65.78882209673885]
We present a differentially private streaming clustering framework which only requires an offline DP coreset or clustering algorithm as a blackbox. Our framework is also differentially private under the continual release setting, i.e., the union of outputs of our algorithms at every timestamp is always differentially private.
arXiv Detail & Related papers (2023-07-14T16:11:22Z)
Differential Privacy for Clustering Under Continual Observation [5.220940151628734]
We consider the problem of clustering privately a dataset in $mathbbRd$ that undergoes both insertion and deletion of points. We give an $varepsilon$-differentially private clustering mechanism for the $k$-means objective under continual observation.
arXiv Detail & Related papers (2023-07-07T07:23:56Z)
Concurrent Shuffle Differential Privacy Under Continual Observation [60.127179363119936]
We study the private continual summation problem (a.k.a. the counter problem) We show that the concurrent shuffle model allows for significantly improved error compared to a standard (single) shuffle model.
arXiv Detail & Related papers (2023-01-29T20:42:54Z)
On Distributed Differential Privacy and Counting Distinct Elements [52.701425652208734]
We study the setup where each of $n$ users holds an element from a discrete set. The goal is to count the number of distinct elements across all users.
arXiv Detail & Related papers (2020-09-21T04:13:34Z)
Differentially Private Set Union [23.001440922454407]
We study the basic operation of set union in the global model of differential privacy. In this problem, we are given a universe $U$ of items, possibly of infinite size, and a database $D$ of users. We want an ($epsilon$,$delta$)-differentially private algorithm which outputs a subset $S subset cup_i W_i$ such that the size of $S$ is as large as possible.
arXiv Detail & Related papers (2020-02-22T18:33:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.