Information-theoretic Estimation of the Risk of Privacy Leaks
- URL: http://arxiv.org/abs/2506.12328v1
- Date: Sat, 14 Jun 2025 03:39:11 GMT
- Title: Information-theoretic Estimation of the Risk of Privacy Leaks
- Authors: Kenneth Odoh
- Abstract summary: Dependencies between items in a dataset can lead to privacy leaks. We measure the correlation between the original data and their noisy responses from a randomizer as an indicator of potential privacy breaches.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work~\cite{Liu2016} has shown that dependencies between items in a dataset can lead to privacy leaks. We extend this concept to privacy-preserving transformations, considering a broader set of dependencies captured by correlation metrics. Specifically, we measure the correlation between the original data and their noisy responses from a randomizer as an indicator of potential privacy breaches. This paper aims to leverage information-theoretic measures, such as the Maximal Information Coefficient (MIC), to estimate privacy leaks and derive novel, computationally efficient privacy leak estimators. We extend the $\rho_1$-to-$\rho_2$ formulation~\cite{Evfimievski2003} to incorporate entropy, mutual information, and the degree of anonymity for a more comprehensive measure of privacy risk. Our proposed hybrid metric can identify correlation dependencies between attributes in the dataset, serving as a proxy for privacy leak vulnerabilities. This metric provides a computationally efficient worst-case measure of privacy loss, utilizing the inherent characteristics of the data to prevent privacy breaches.
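As a rough illustration of the core idea, the sketch below is not from the paper: the function names, the toy binary randomizer, and the use of scikit-learn's `mutual_info_score` as a stand-in for MIC are all assumptions. It measures the dependence between original records and their randomized responses and reports an entropy-based degree of anonymity.

```python
import numpy as np
from sklearn.metrics import mutual_info_score  # stand-in for MIC on discrete data

def randomized_response(values, p_keep=0.75, rng=None):
    """Toy binary randomizer: report the true bit with probability p_keep, otherwise flip it."""
    rng = rng or np.random.default_rng(0)
    flip = rng.random(len(values)) > p_keep
    return np.where(flip, 1 - values, values)

def leak_indicator(original, noisy):
    """Mutual information (bits) between original data and noisy responses.
    Higher values mean stronger dependence, i.e. a larger potential privacy leak."""
    return mutual_info_score(original, noisy) / np.log(2)  # convert nats -> bits

def degree_of_anonymity(noisy):
    """Crude degree of anonymity: Shannon entropy of the released responses,
    normalized by the maximum log2(N) (0 = fully revealing, 1 = maximally uncertain)."""
    _, counts = np.unique(noisy, return_counts=True)
    p = counts / counts.sum()
    h = -(p * np.log2(p)).sum()
    return h / np.log2(len(p)) if len(p) > 1 else 0.0

if __name__ == "__main__":
    x = np.random.default_rng(1).integers(0, 2, size=10_000)  # original binary attribute
    y = randomized_response(x)                                 # noisy responses from the randomizer
    print(f"leak indicator (MI, bits): {leak_indicator(x, y):.3f}")
    print(f"degree of anonymity:       {degree_of_anonymity(y):.3f}")
```

Lowering `p_keep` toward 0.5 drives the mutual-information indicator toward zero while the degree of anonymity stays near one, matching the intuition that weaker correlation between inputs and responses means a smaller worst-case leak.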
Related papers
- A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage [77.83757117924995]
We propose a new framework that evaluates re-identification attacks to quantify individual privacy risks upon data release. Our approach shows that seemingly innocuous auxiliary information can be used to infer sensitive attributes like age or substance use history from sanitized data.
arXiv Detail & Related papers (2025-04-28T01:16:27Z) - Defining 'Good': Evaluation Framework for Synthetic Smart Meter Data [14.779917834583577]
We show that standard privacy attack methods are inadequate for assessing privacy risks of smart meter datasets.
We propose an improved method by injecting training data with implausible outliers, then launching privacy attacks directly on these outliers.
arXiv Detail & Related papers (2024-07-16T14:41:27Z) - RASE: Efficient Privacy-preserving Data Aggregation against Disclosure Attacks for IoTs [2.1765174838950494]
We study a new paradigm for collecting and protecting the data produced by an ever-increasing number of sensor devices.
Most previous studies on co-design of data aggregation and privacy preservation assume that a trusted fusion center adheres to privacy regimes.
We propose a novel paradigm (called RASE), which can be generalized into a 3-step sequential procedure: noise addition, followed by random permutation, and then parameter estimation.
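A minimal sketch of that three-step sequence (the names, the Gaussian noise model, and mean estimation as the target parameter are illustrative assumptions, not details from the paper):

```python
import numpy as np

def rase_round(readings, noise_scale=1.0, rng=None):
    """Illustrative RASE-style round: noise addition -> random permutation -> parameter estimation."""
    rng = rng or np.random.default_rng(0)
    noisy = readings + rng.normal(0.0, noise_scale, size=len(readings))  # 1. noise addition at the devices
    shuffled = rng.permutation(noisy)                                    # 2. random permutation unlinks values from senders
    return shuffled.mean()                                               # 3. parameter estimation (here: population mean)

readings = np.random.default_rng(1).normal(10.0, 2.0, size=1_000)
print(f"true mean {readings.mean():.2f}, estimate {rase_round(readings):.2f}")
```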
arXiv Detail & Related papers (2024-05-31T15:21:38Z) - A Summary of Privacy-Preserving Data Publishing in the Local Setting [0.6749750044497732]
Statistical Disclosure Control aims to minimize the risk of exposing confidential information by de-identifying it.
We outline the current privacy-preserving techniques employed in microdata de-identification, delve into privacy measures tailored for various disclosure scenarios, and assess metrics for information loss and predictive performance.
arXiv Detail & Related papers (2023-12-19T04:23:23Z) - Initialization Matters: Privacy-Utility Analysis of Overparameterized
Neural Networks [72.51255282371805]
We prove a privacy bound for the KL divergence between model distributions on worst-case neighboring datasets.
We find that this KL privacy bound is largely determined by the expected squared gradient norm relative to model parameters during training.
arXiv Detail & Related papers (2023-10-31T16:13:22Z) - $\alpha$-Mutual Information: A Tunable Privacy Measure for Privacy
Protection in Data Sharing [4.475091558538915]
This paper adopts Arimoto's $\alpha$-Mutual Information as a tunable privacy measure.
We formulate a general distortion-based mechanism that manipulates the original data to offer privacy protection.
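For context, Arimoto's $\alpha$-mutual information is commonly written in terms of the Rényi entropy and the Arimoto conditional entropy (standard textbook definition, not the paper's notation):

```latex
I_\alpha(X;Y) = H_\alpha(X) - H_\alpha^{\mathrm{A}}(X \mid Y),
\qquad
H_\alpha(X) = \frac{1}{1-\alpha}\log\sum_x p(x)^\alpha,
\qquad
H_\alpha^{\mathrm{A}}(X \mid Y) = \frac{\alpha}{1-\alpha}\log\sum_y \Bigl(\sum_x p(x,y)^\alpha\Bigr)^{1/\alpha}.
```

As $\alpha \to 1$ this recovers Shannon mutual information, which is what makes $\alpha$ a useful tuning knob for the privacy measure.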
arXiv Detail & Related papers (2023-10-27T16:26:14Z) - On the Query Complexity of Training Data Reconstruction in Private
Learning [0.0]
We analyze the number of queries that a whitebox adversary needs to make to a private learner in order to reconstruct its training data.
For $(\epsilon, \delta)$-DP learners with training data drawn from any arbitrary compact metric space, we provide the first known lower bounds on the adversary's query complexity.
arXiv Detail & Related papers (2023-03-29T00:49:38Z) - Breaking the Communication-Privacy-Accuracy Tradeoff with
$f$-Differential Privacy [51.11280118806893]
We consider a federated data analytics problem in which a server coordinates the collaborative data analysis of multiple users with privacy concerns and limited communication capability.
We study the local differential privacy guarantees of discrete-valued mechanisms with finite output space through the lens of $f$-differential privacy ($f$-DP).
More specifically, we advance the existing literature by deriving tight $f$-DP guarantees for a variety of discrete-valued mechanisms.
arXiv Detail & Related papers (2023-02-19T16:58:53Z) - How Do Input Attributes Impact the Privacy Loss in Differential Privacy? [55.492422758737575]
We study the connection between the per-subject norm in DP neural networks and individual privacy loss.
We introduce a novel metric termed the Privacy Loss-Input Susceptibility (PLIS), which allows one to apportion the subject's privacy loss to their input attributes.
arXiv Detail & Related papers (2022-11-18T11:39:03Z) - Robustness Threats of Differential Privacy [70.818129585404]
We experimentally demonstrate that networks trained with differential privacy can, in some settings, be even more vulnerable than their non-private counterparts.
We study how the main ingredients of differentially private neural networks training, such as gradient clipping and noise addition, affect the robustness of the model.
arXiv Detail & Related papers (2020-12-14T18:59:24Z) - Deep Directed Information-Based Learning for Privacy-Preserving Smart
Meter Data Release [30.409342804445306]
We study the problem in the context of time series data and smart meters (SMs) power consumption measurements.
We introduce the Directed Information (DI) as a more meaningful measure of privacy in the considered setting.
Our empirical studies on real-world data sets of SM measurements show the existing trade-offs between privacy and utility in the worst-case scenario.
arXiv Detail & Related papers (2020-11-20T13:41:11Z) - Graph-Homomorphic Perturbations for Private Decentralized Learning [64.26238893241322]
Local exchange of estimates allows inference of the underlying private data.
Perturbations chosen independently at every agent result in a significant performance loss.
We propose an alternative scheme, which constructs perturbations according to a particular nullspace condition, allowing them to be invisible.
arXiv Detail & Related papers (2020-10-23T10:35:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.