Differentially Private Space-Efficient Algorithms for Counting Distinct Elements in the Turnstile Model
- URL: http://arxiv.org/abs/2505.23682v1
- Date: Thu, 29 May 2025 17:21:20 GMT
- Title: Differentially Private Space-Efficient Algorithms for Counting Distinct Elements in the Turnstile Model
- Authors: Rachel Cummings, Alessandro Epasto, Jieming Mao, Tamalika Mukherjee, Tingting Ou, Peilin Zhong,
- Abstract summary: We give the first sublinear space differentially private algorithms for the fundamental problem of counting distinct elements in the turnstile streaming model.<n>Our result significantly improves upon the space requirements of the state-of-the-art algorithms for this problem, which is linear.<n>When a bound $W$ on the number of times an item appears in the stream is known, our algorithm provides $tildeO_eta(sqrtW)$ additive error using $tildeO_eta(sqrtW)$ space.
- Score: 61.40326886123332
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The turnstile continual release model of differential privacy captures scenarios where a privacy-preserving real-time analysis is sought for a dataset evolving through additions and deletions. In typical applications of real-time data analysis, both the length of the stream $T$ and the size of the universe $|U|$ from which data come can be extremely large. This motivates the study of private algorithms in the turnstile setting using space sublinear in both $T$ and $|U|$. In this paper, we give the first sublinear space differentially private algorithms for the fundamental problem of counting distinct elements in the turnstile streaming model. Our algorithm achieves, on arbitrary streams, $\tilde{O}_{\eta}(T^{1/3})$ space and additive error, and a $(1+\eta)$-relative approximation for all $\eta \in (0,1)$. Our result significantly improves upon the space requirements of the state-of-the-art algorithms for this problem, which is linear, approaching the known $\Omega(T^{1/4})$ additive error lower bound for arbitrary streams. Moreover, when a bound $W$ on the number of times an item appears in the stream is known, our algorithm provides $\tilde{O}_{\eta}(\sqrt{W})$ additive error, using $\tilde{O}_{\eta}(\sqrt{W})$ space. This additive error asymptotically matches that of prior work which required instead linear space. Our results address an open question posed by [Jain, Kalemaj, Raskhodnikova, Sivakumar, Smith, Neurips23] about designing low-memory mechanisms for this problem. We complement these results with a space lower bound for this problem, which shows that any algorithm that uses similar techniques must use space $\tilde{\Omega}(T^{1/3})$ on arbitrary streams.
Related papers
- Private Continual Counting of Unbounded Streams [11.941250828872189]
We study the problem of differentially private continual counting in the unbounded setting where the input size $n$ is not known in advance.<n>Using the common doubling trick' avoids knowledge of $n$ but leads to suboptimal and non-smooth error.<n>We introduce novel matrix factorizations based on logarithmic perturbations of the function $frac1sqrt1-z$ studied in prior works.
arXiv Detail & Related papers (2025-06-17T23:09:53Z) - Distribution-Independent Regression for Generalized Linear Models with
Oblivious Corruptions [49.69852011882769]
We show the first algorithms for the problem of regression for generalized linear models (GLMs) in the presence of additive oblivious noise.
We present an algorithm that tackles newthis problem in its most general distribution-independent setting.
This is the first newalgorithmic result for GLM regression newwith oblivious noise which can handle more than half the samples being arbitrarily corrupted.
arXiv Detail & Related papers (2023-09-20T21:41:59Z) - Differentially Private Clustering in Data Streams [65.78882209673885]
We present a differentially private streaming clustering framework which only requires an offline DP coreset or clustering algorithm as a blackbox.
Our framework is also differentially private under the continual release setting, i.e., the union of outputs of our algorithms at every timestamp is always differentially private.
arXiv Detail & Related papers (2023-07-14T16:11:22Z) - Sketching Algorithms and Lower Bounds for Ridge Regression [65.0720777731368]
We give a sketching-based iterative algorithm that computes $1+varepsilon$ approximate solutions for the ridge regression problem.
We also show that this algorithm can be used to give faster algorithms for kernel ridge regression.
arXiv Detail & Related papers (2022-04-13T22:18:47Z) - Sublinear Time and Space Algorithms for Correlation Clustering via
Sparse-Dense Decompositions [9.29659220237395]
We present a new approach for solving (minimum disagreement) correlation clustering.
We obtain sublinear algorithms with highly efficient time and space complexity.
The main ingredient of our approach is a novel connection to sparse-dense graph decompositions.
arXiv Detail & Related papers (2021-09-29T16:25:02Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of $frac1lambdaepsilon$, one can obtain streaming algorithms with spacely smaller than $frac1lambdaepsilon$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z) - Locally Private Hypothesis Selection [96.06118559817057]
We output a distribution from $mathcalQ$ whose total variation distance to $p$ is comparable to the best such distribution.
We show that the constraint of local differential privacy incurs an exponential increase in cost.
Our algorithms result in exponential improvements on the round complexity of previous methods.
arXiv Detail & Related papers (2020-02-21T18:30:48Z) - A Deterministic Streaming Sketch for Ridge Regression [15.256452294422294]
We provide a deterministic space-efficient algorithm for estimating ridge regression.
This is the first $o(d2)$ space deterministic streaming algorithm with guaranteed solution error.
In comparisons to randomized sketching algorithms on synthetic and real-world datasets, our algorithm has less empirical error using less space and similar time.
arXiv Detail & Related papers (2020-02-05T22:08:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.