Local Differentially Private Heavy Hitter Detection in Data Streams with Bounded Memory
- URL: http://arxiv.org/abs/2311.16062v1
- Date: Mon, 27 Nov 2023 18:28:15 GMT
- Title: Local Differentially Private Heavy Hitter Detection in Data Streams with Bounded Memory
- Authors: Xiaochen Li, Weiran Liu, Jian Lou, Yuan Hong, Lei Zhang, Zhan Qin, Kui Ren,
- Abstract summary: We present a novel framework HG-LDP to achieve accurate Top-$k$ item detection at bounded memory expense, while providing rigorous local differential privacy (LDP) protection.
We conduct comprehensive experiments on both synthetic and real-world datasets to show that the proposed advanced schemes achieve a superior accuracy-privacy-memory efficiency'' tradeoff.
- Score: 31.652076018162507
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Top-$k$ frequent items detection is a fundamental task in data stream mining. Many promising solutions are proposed to improve memory efficiency while still maintaining high accuracy for detecting the Top-$k$ items. Despite the memory efficiency concern, the users could suffer from privacy loss if participating in the task without proper protection, since their contributed local data streams may continually leak sensitive individual information. However, most existing works solely focus on addressing either the memory-efficiency problem or the privacy concerns but seldom jointly, which cannot achieve a satisfactory tradeoff between memory efficiency, privacy protection, and detection accuracy. In this paper, we present a novel framework HG-LDP to achieve accurate Top-$k$ item detection at bounded memory expense, while providing rigorous local differential privacy (LDP) protection. Specifically, we identify two key challenges naturally arising in the task, which reveal that directly applying existing LDP techniques will lead to an inferior ``accuracy-privacy-memory efficiency'' tradeoff. Therefore, we instantiate three advanced schemes under the framework by designing novel LDP randomization methods, which address the hurdles caused by the large size of the item domain and by the limited space of the memory. We conduct comprehensive experiments on both synthetic and real-world datasets to show that the proposed advanced schemes achieve a superior ``accuracy-privacy-memory efficiency'' tradeoff, saving $2300\times$ memory over baseline methods when the item domain size is $41,270$. Our code is open-sourced via the link.
Related papers
- Keeping a Secret Requires a Good Memory: Space Lower-Bounds for Private Algorithms [67.94856074923571]
This paper introduces a novel proof technique based on a multi-player communication game.<n>We show that winning this communication game requires transmitting information proportional to the number of over-active users.<n>We show that this communication-theoretic technique generalizes to broad classes of problems, yielding lower bounds for private medians, quantiles, and max-select.
arXiv Detail & Related papers (2026-02-12T17:49:07Z) - Personalized 3D Spatiotemporal Trajectory Privacy Protection with Differential and Distortion Geo-Perturbation [64.60694805725727]
This paper proposes a personalized 3Dtemporal trajectory privacy protection mechanism named 3DSTPM.<n>We analyze the characteristics of attackers that exploit correlations between locations in a trajectory and present the attack model.<n>Results demonstrate that the proposed 3DSTPM effectively reduces loss while meeting the user's personalized privacy protection needs.
arXiv Detail & Related papers (2025-11-27T07:41:14Z) - A General Framework for Per-record Differential Privacy [10.959311645622632]
Per-record Differential Privacy (PrDP) addresses this by defining the privacy budget as a function of each record.<n>Existing solutions either handle specific privacy functions or adopt relaxed PrDP definitions.<n>We propose a general and practical framework that enables any standard DP mechanism to support PrDP.
arXiv Detail & Related papers (2025-11-24T11:44:10Z) - Privacy-Aware Decoding: Mitigating Privacy Leakage of Large Language Models in Retrieval-Augmented Generation [26.573578326262307]
Privacy-Aware Decoding (PAD) is a lightweight, inference-time defense that adaptively injects calibrated Gaussian noise into token logits during generation.<n>PAD integrates confidence-based screening to selectively protect high-risk tokens, efficient sensitivity estimation to minimize unnecessary noise, and context-aware noise calibration to balance privacy with generation quality.<n>Our work takes an important step toward mitigating privacy risks in RAG via decoding strategies, paving the way for universal and scalable privacy solutions in sensitive domains.
arXiv Detail & Related papers (2025-08-05T05:22:13Z) - Revisiting Privacy, Utility, and Efficiency Trade-offs when Fine-Tuning Large Language Models [12.635018411121413]
We study the inherent trade-offs in minimizing privacy risks and maximizing utility, while maintaining high computational efficiency.
We find that efficient fine-tuning methods like LoRA mitigate privacy risks similar to private fine-tuning methods like DP.
arXiv Detail & Related papers (2025-02-18T22:16:03Z) - Efficiently Achieving Secure Model Training and Secure Aggregation to Ensure Bidirectional Privacy-Preservation in Federated Learning [36.94596192980534]
Bidirectional privacy-preservation federated learning is crucial as both local gradients and the global model may leak privacy.
We design an efficient and high-accuracy bidirectional privacy-preserving scheme for federated learning to complete secure model training and secure aggregation.
Our scheme significantly outperforms state-of-the-art bidirectional privacy-preservation baselines in terms of computational cost, model accuracy, and defense ability.
arXiv Detail & Related papers (2024-12-16T12:58:21Z) - Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose PseudoProbability Unlearning (PPU), a novel method that enables models to forget data to adhere to privacy-preserving manner.
Our method achieves over 20% improvements in forgetting error compared to the state-of-the-art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z) - Accelerating Privacy-Preserving Medical Record Linkage: A Three-Party MPC Approach [1.7999333451993955]
This paper presents a novel and efficient PPRL based on a secure 3-party computation framework.
We demonstrate that our method preserves the linkage quality of the state-of-the-art PPRL method while achieving up to 14 times faster performance.
arXiv Detail & Related papers (2024-10-28T23:13:01Z) - Collaborative Inference over Wireless Channels with Feature Differential Privacy [57.68286389879283]
Collaborative inference among multiple wireless edge devices has the potential to significantly enhance Artificial Intelligence (AI) applications.
transmitting extracted features poses a significant privacy risk, as sensitive personal data can be exposed during the process.
We propose a novel privacy-preserving collaborative inference mechanism, wherein each edge device in the network secures the privacy of extracted features before transmitting them to a central server for inference.
arXiv Detail & Related papers (2024-10-25T18:11:02Z) - Enhancing Feature-Specific Data Protection via Bayesian Coordinate Differential Privacy [55.357715095623554]
Local Differential Privacy (LDP) offers strong privacy guarantees without requiring users to trust external parties.
We propose a Bayesian framework, Bayesian Coordinate Differential Privacy (BCDP), that enables feature-specific privacy quantification.
arXiv Detail & Related papers (2024-10-24T03:39:55Z) - Noisy Data Meets Privacy: Training Local Models with Post-Processed Remote Queries [7.993286956508782]
LDPKiT generates a privacy-preserving inference dataset aligned with private data distribution.
Experiments on Fashion-MNIST, SVHN and PathMNIST medical datasets demonstrate that LDPKiT effectively improves utility while preserving privacy.
arXiv Detail & Related papers (2024-05-25T21:53:58Z) - Private Optimal Inventory Policy Learning for Feature-based Newsvendor with Unknown Demand [13.594765018457904]
This paper introduces a novel approach to estimate a privacy-preserving optimal inventory policy within the f-differential privacy framework.
We develop a clipped noisy gradient descent algorithm based on convolution smoothing for optimal inventory estimation.
Our numerical experiments demonstrate that the proposed new method can achieve desirable privacy protection with a marginal increase in cost.
arXiv Detail & Related papers (2024-04-23T19:15:43Z) - A Randomized Approach for Tight Privacy Accounting [63.67296945525791]
We propose a new differential privacy paradigm called estimate-verify-release (EVR)
EVR paradigm first estimates the privacy parameter of a mechanism, then verifies whether it meets this guarantee, and finally releases the query output.
Our empirical evaluation shows the newly proposed EVR paradigm improves the utility-privacy tradeoff for privacy-preserving machine learning.
arXiv Detail & Related papers (2023-04-17T00:38:01Z) - DP-Fast MH: Private, Fast, and Accurate Metropolis-Hastings for
Large-Scale Bayesian Inference [16.280801141284872]
We study the Metropolis-Hastings (MH) algorithm for large-scale Bayesian inference under differential privacy.
We reveal, for the first time, a three-way trade-off among privacy, scalability, and efficiency.
We empirically demonstrate the effectiveness and efficiency of our algorithm in various experiments.
arXiv Detail & Related papers (2023-03-10T19:14:20Z) - Production of Categorical Data Verifying Differential Privacy:
Conception and Applications to Machine Learning [0.0]
Differential privacy is a formal definition that allows quantifying the privacy-utility trade-off.
With the local DP (LDP) model, users can sanitize their data locally before transmitting it to the server.
In all cases, we concluded that differentially private ML models achieve nearly the same utility metrics as non-private ones.
arXiv Detail & Related papers (2022-04-02T12:50:14Z) - Task-aware Privacy Preservation for Multi-dimensional Data [4.138783926370621]
Local differential privacy (LDP) is a state-of-the-art technique for privacy preservation.
In the future, LDP can be adopted to anonymize richer user data attributes.
We show how to significantly improve the ultimate task performance for multi-dimensional user data by considering a task-aware privacy preservation problem.
arXiv Detail & Related papers (2021-10-05T20:03:53Z) - Learning with User-Level Privacy [61.62978104304273]
We analyze algorithms to solve a range of learning tasks under user-level differential privacy constraints.
Rather than guaranteeing only the privacy of individual samples, user-level DP protects a user's entire contribution.
We derive an algorithm that privately answers a sequence of $K$ adaptively chosen queries with privacy cost proportional to $tau$, and apply it to solve the learning tasks we consider.
arXiv Detail & Related papers (2021-02-23T18:25:13Z) - A One-Pass Private Sketch for Most Machine Learning Tasks [48.17461258268463]
Differential privacy (DP) is a compelling privacy definition that explains the privacy-utility tradeoff via formal, provable guarantees.
We propose a private sketch that supports a multitude of machine learning tasks including regression, classification, density estimation, and more.
Our sketch consists of randomized contingency tables that are indexed with locality-sensitive hashing and constructed with an efficient one-pass algorithm.
arXiv Detail & Related papers (2020-06-16T17:47:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.