An Iconic Heavy Hitter Algorithm Made Private
- URL: http://arxiv.org/abs/2512.17295v1
- Date: Fri, 19 Dec 2025 07:20:37 GMT
- Title: An Iconic Heavy Hitter Algorithm Made Private
- Authors: Rayne Holland,
- Abstract summary: Identifying heavy hitters in data streams is a fundamental problem with widespread applications in modern analytics systems.<n>We present the first differentially private variant of the SpaceSaving algorithm, which is regarded as the state-of-the-art in practice.<n>We introduce a generic method for extracting heavy hitters from any differentially private frequency oracle in the data stream model.
- Score: 0.45357961994031076
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Identifying heavy hitters in data streams is a fundamental problem with widespread applications in modern analytics systems. These streams are often derived from sensitive user activity, making update-level privacy guarantees necessary. While recent work has adapted the classical heavy hitter algorithm Misra-Gries to satisfy differential privacy in the streaming model, the privatization of other heavy hitter algorithms with better empirical utility is absent. Under this observation, we present the first differentially private variant of the SpaceSaving algorithm, which, in the non-private setting, is regarded as the state-of-the-art in practice. Our construction post-processes a non-private SpaceSaving summary by injecting asymptotically optimal noise and applying a carefully calibrated selection rule that suppresses unstable labels. This yields strong privacy guarantees while preserving the empirical advantages of SpaceSaving. Second, we introduce a generic method for extracting heavy hitters from any differentially private frequency oracle in the data stream model. The method requires only O(k) additional memory, where k is the number of heavy items, and provides a mechanism for safely releasing item identities from noisy frequency estimates. This yields an efficient, plug-and-play approach for private heavy hitter recovery from linear sketches. Finally, we conduct an experimental evaluation on synthetic and real-world datasets. Across a wide range of privacy parameters and space budgets, our method provides superior utility to the existing differentially private Misra-Gries algorithm. Our results demonstrate that the empirical superiority of SpaceSaving survives privatization and that efficient, practical heavy hitter identification is achievable under strong differential privacy guarantees.
Related papers
- Keeping a Secret Requires a Good Memory: Space Lower-Bounds for Private Algorithms [67.94856074923571]
This paper introduces a novel proof technique based on a multi-player communication game.<n>We show that winning this communication game requires transmitting information proportional to the number of over-active users.<n>We show that this communication-theoretic technique generalizes to broad classes of problems, yielding lower bounds for private medians, quantiles, and max-select.
arXiv Detail & Related papers (2026-02-12T17:49:07Z) - RPWithPrior: Label Differential Privacy in Regression [0.0]
In this paper, we focus on regression tasks under $$-label differential privacy guarantees.<n>We model both original and randomized responses as continuous random variables, avoiding discretization entirely.<n>We prove that our algorithm, RPWithPrior, guarantees $$-label differential privacy.
arXiv Detail & Related papers (2026-01-30T06:27:13Z) - Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression [22.733602577854825]
We study instrumental variable regression (IVaR) under differential privacy constraints.<n>We propose a noisy two-state gradient descent algorithm that ensures differential privacy by injecting carefully calibrated noise into the gradient updates.
arXiv Detail & Related papers (2025-09-26T18:02:58Z) - Linear-Time User-Level DP-SCO via Robust Statistics [55.350093142673316]
User-level differentially private convex optimization (DP-SCO) has garnered significant attention due to the importance of safeguarding user privacy in machine learning applications.<n>Current methods, such as those based on differentially private gradient descent (DP-SGD), often struggle with high noise accumulation and suboptimal utility.<n>We introduce a novel linear-time algorithm that leverages robust statistics, specifically the median and trimmed mean, to overcome these challenges.
arXiv Detail & Related papers (2025-02-13T02:05:45Z) - Differentially Private Random Feature Model [47.35176457481132]
We produce a differentially private random feature model for privacy-preserving kernel machines.<n>We show that our method preserves privacy and derive a generalization error bound for the method.
arXiv Detail & Related papers (2024-12-06T05:31:08Z) - Differentially private and decentralized randomized power method [15.955127242261808]
This paper proposes enhanced privacy-preserving variants of the randomized power method.<n>First, we propose a variant that reduces the amount of the noise required in current techniques to achieve Differential Privacy.<n>Second, we adapt our method to a decentralized framework in which data is distributed among multiple users.
arXiv Detail & Related papers (2024-11-04T09:53:03Z) - On Differential Privacy and Adaptive Data Analysis with Bounded Space [76.10334958368618]
We study the space complexity of the two related fields of differential privacy and adaptive data analysis.
We show that there exists a problem P that requires exponentially more space to be solved efficiently with differential privacy.
The line of work on adaptive data analysis focuses on understanding the number of samples needed for answering a sequence of adaptive queries.
arXiv Detail & Related papers (2023-02-11T14:45:31Z) - Differentially Private Stochastic Gradient Descent with Low-Noise [49.981789906200035]
Modern machine learning algorithms aim to extract fine-grained information from data to provide accurate predictions, which often conflicts with the goal of privacy protection.
This paper addresses the practical and theoretical importance of developing privacy-preserving machine learning algorithms that ensure good performance while preserving privacy.
arXiv Detail & Related papers (2022-09-09T08:54:13Z) - Smooth Anonymity for Sparse Graphs [69.1048938123063]
differential privacy has emerged as the gold standard of privacy, however, when it comes to sharing sparse datasets.
In this work, we consider a variation of $k$-anonymity, which we call smooth-$k$-anonymity, and design simple large-scale algorithms that efficiently provide smooth-$k$-anonymity.
arXiv Detail & Related papers (2022-07-13T17:09:25Z) - Private measures, random walks, and synthetic data [7.5764890276775665]
Differential privacy is a mathematical concept that provides an information-theoretic security guarantee.
We develop a private measure from a data set that allows us to efficiently construct private synthetic data.
A key ingredient in our construction is a new superregular random walk, whose joint distribution of steps is as regular as that of independent random variables.
arXiv Detail & Related papers (2022-04-20T00:06:52Z) - Robust and Differentially Private Mean Estimation [40.323756738056616]
Differential privacy has emerged as a standard requirement in a variety of applications ranging from the U.S. Census to data collected in commercial devices.
An increasing number of such databases consist of data from multiple sources, not all of which can be trusted.
This leaves existing private analyses vulnerable to attacks by an adversary who injects corrupted data.
arXiv Detail & Related papers (2021-02-18T05:02:49Z) - Graph-Homomorphic Perturbations for Private Decentralized Learning [64.26238893241322]
Local exchange of estimates allows inference of data based on private data.
perturbations chosen independently at every agent, resulting in a significant performance loss.
We propose an alternative scheme, which constructs perturbations according to a particular nullspace condition, allowing them to be invisible.
arXiv Detail & Related papers (2020-10-23T10:35:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.