Attribute Privacy: Framework and Mechanisms
- URL: http://arxiv.org/abs/2009.04013v2
- Date: Tue, 11 May 2021 23:23:04 GMT
- Title: Attribute Privacy: Framework and Mechanisms
- Authors: Wanrong Zhang, Olga Ohrimenko, Rachel Cummings
- Abstract summary: We initiate the study of attribute privacy, where a data owner is concerned about revealing sensitive properties of a whole dataset during analysis.
We propose definitions to capture attribute privacy in two relevant cases where global attributes may need to be protected.
We provide two efficient mechanisms and one inefficient mechanism that satisfy attribute privacy for these settings.
- Score: 26.233612860653025
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ensuring the privacy of training data is a growing concern since many machine
learning models are trained on confidential and potentially sensitive data.
Much attention has been devoted to methods for protecting individual privacy
during analyses of large datasets. However, in many settings, global properties
of the dataset may also be sensitive (e.g., the mortality rate in a hospital rather
than the presence of a particular patient in the dataset). In this work, we depart
from individual privacy to initiate the study of attribute privacy, where a
data owner is concerned about revealing sensitive properties of a whole dataset
during analysis. We propose definitions to capture \emph{attribute privacy} in
two relevant cases where global attributes may need to be protected: (1)
properties of a specific dataset and (2) parameters of the underlying
distribution from which the dataset is sampled. We also provide two efficient
mechanisms and one inefficient mechanism that satisfy attribute privacy for
these settings. We base our results on a novel use of the Pufferfish framework
to account for correlations across attributes in the data, thus addressing "the
challenging problem of developing Pufferfish instantiations and algorithms for
general aggregate secrets" that was left open by \cite{kifer2014pufferfish}.
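The hospital example above can be made concrete with a minimal additive-noise sketch. This is purely illustrative and is not the paper's mechanisms: the actual mechanisms calibrate noise via the Pufferfish framework to account for correlations across attributes, which the function below does not attempt.

```python
import numpy as np

def noisy_release(statistic, noise_scale, rng=None):
    """Release a dataset-level statistic with additive Gaussian noise so
    that a sensitive global attribute is obscured. Illustrative only:
    noise_scale here is a free parameter, not a calibrated Pufferfish
    privacy guarantee."""
    rng = rng if rng is not None else np.random.default_rng(0)
    return statistic + rng.normal(0.0, noise_scale)

# Hypothetical hospital records: 1 = deceased, 0 = survived.
records = np.array([0, 0, 1, 0, 1, 0, 0, 0, 1, 0])
mortality_rate = records.mean()  # sensitive global attribute of the dataset
released = noisy_release(mortality_rate, noise_scale=0.1)
```

The analyst receives `released`, a perturbed mortality rate, rather than the exact value.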
Related papers
- MaSS: Multi-attribute Selective Suppression for Utility-preserving Data Transformation from an Information-theoretic Perspective [10.009178591853058]
We propose a formal information-theoretic definition for this utility-preserving privacy protection problem.
We design a data-driven learnable data transformation framework that is capable of suppressing sensitive attributes from target datasets.
Results demonstrate the effectiveness and generalizability of our method under various configurations.
arXiv Detail & Related papers (2024-05-23T18:35:46Z) - Synergizing Privacy and Utility in Data Analytics Through Advanced Information Theorization [2.28438857884398]
We introduce three sophisticated algorithms: a Noise-Infusion Technique tailored for high-dimensional image data, a Variational Autoencoder (VAE) for robust feature extraction and an Expectation Maximization (EM) approach optimized for structured data privacy.
Our methods significantly reduce mutual information between sensitive attributes and transformed data, thereby enhancing privacy.
The research contributes to the field by providing a flexible and effective strategy for deploying privacy-preserving algorithms across various data types.
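The stated goal of reducing mutual information between sensitive attributes and transformed data can be made concrete with a plug-in estimator. This is an illustrative sketch; none of the paper's three algorithms is reproduced here.

```python
import math
import random
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of mutual information (in nats) between two
    discrete sequences, computed from empirical joint and marginal
    frequencies."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

# A transformation that randomizes a sensitive bit drives MI toward zero.
random.seed(0)
sensitive = [random.randint(0, 1) for _ in range(1000)]
leaked = sensitive[:]                                  # identity: leaks everything
noised = [s if random.random() < 0.6 else 1 - s for s in sensitive]
```

Comparing `mutual_information(sensitive, leaked)` against `mutual_information(sensitive, noised)` shows the noised channel carries far less information about the sensitive attribute.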
arXiv Detail & Related papers (2024-04-24T22:58:42Z) - Privacy-Optimized Randomized Response for Sharing Multi-Attribute Data [1.1510009152620668]
We propose a privacy-optimized randomized response that guarantees the strongest privacy in sharing multi-attribute data.
We also present an efficient algorithm for constructing a near-optimal attribute mechanism.
Our methods provide significantly stronger privacy guarantees for the entire dataset than the existing method.
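For context, the textbook k-ary randomized response baseline can be sketched as follows. The paper optimizes the reporting probabilities per attribute; this sketch only shows the standard construction that such a method would improve on.

```python
import math
import random

def k_ary_randomized_response(value, domain, epsilon, rng=None):
    """Standard k-ary randomized response: report the true value with
    probability e^eps / (e^eps + k - 1), otherwise report a uniformly
    random other value from the domain."""
    rng = rng if rng is not None else random.Random(0)
    k = len(domain)
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_keep:
        return value
    return rng.choice([v for v in domain if v != value])

report = k_ary_randomized_response("A", ["A", "B", "C"], epsilon=1.0)
```

Each attribute of a multi-attribute record can be perturbed independently this way, at the cost of a privacy budget that the paper's optimized mechanism allocates more carefully.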
arXiv Detail & Related papers (2024-02-12T11:34:42Z) - How Do Input Attributes Impact the Privacy Loss in Differential Privacy? [55.492422758737575]
We study the connection between the per-subject norm in DP neural networks and individual privacy loss.
We introduce a novel metric termed the Privacy Loss-Input Susceptibility (PLIS) which allows one to apportion the subject's privacy loss to their input attributes.
arXiv Detail & Related papers (2022-11-18T11:39:03Z) - Algorithms with More Granular Differential Privacy Guarantees [65.3684804101664]
We consider partial differential privacy (DP), which allows quantifying the privacy guarantee on a per-attribute basis.
In this work, we study several basic data analysis and learning tasks, and design algorithms whose per-attribute privacy parameter is smaller than the best possible privacy parameter for the entire record of a person.
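A per-attribute privacy budget can be sketched with attribute-specific Laplace noise. This is a hypothetical illustration of the general idea, not the paper's task-specific algorithms.

```python
import numpy as np

def per_attribute_laplace(record, epsilons, sensitivities, seed=0):
    """Add Laplace noise to each attribute with its own privacy parameter:
    attributes given a smaller epsilon receive more noise and hence
    stronger protection, while less sensitive attributes can be released
    more accurately. Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    return [v + rng.laplace(0.0, s / e)
            for v, e, s in zip(record, epsilons, sensitivities)]

# Hypothetical record: (age, HIV status flag) with a much tighter budget
# on the second, more sensitive attribute.
noisy = per_attribute_laplace([37.0, 1.0], epsilons=[2.0, 0.1],
                              sensitivities=[1.0, 1.0])
```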
arXiv Detail & Related papers (2022-09-08T22:43:50Z) - DP2-Pub: Differentially Private High-Dimensional Data Publication with
Invariant Post Randomization [58.155151571362914]
We propose a differentially private high-dimensional data publication mechanism (DP2-Pub) that runs in two phases.
Splitting attributes into several low-dimensional clusters with high intra-cluster cohesion and low inter-cluster coupling helps obtain a reasonable privacy budget.
We also extend our DP2-Pub mechanism to the scenario with a semi-honest server which satisfies local differential privacy.
arXiv Detail & Related papers (2022-08-24T17:52:43Z) - Protecting Global Properties of Datasets with Distribution Privacy
Mechanisms [8.19841678851784]
We show how a distribution privacy framework can be applied to formalize such data confidentiality.
We then empirically evaluate the privacy-utility tradeoffs of these mechanisms and apply them against a practical property inference attack.
arXiv Detail & Related papers (2022-07-18T03:54:38Z) - Selecting the suitable resampling strategy for imbalanced data
classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class.
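Random oversampling, the simplest of the strategies mentioned above, can be sketched as follows. This is an illustrative baseline; the paper studies how dataset properties should guide the choice among many such resampling strategies.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class examples until every
    class reaches the majority-class count (minimal random-oversampling
    sketch)."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(items) for items in by_class.values())
    X_out, y_out = [], []
    for label, items in by_class.items():
        resampled = items + [rng.choice(items) for _ in range(target - len(items))]
        X_out.extend(resampled)
        y_out.extend([label] * target)
    return X_out, y_out

X_bal, y_bal = random_oversample([1, 2, 3, 4, 5], [0, 0, 0, 0, 1])
```

After balancing, both classes contribute the same number of examples to model induction.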
arXiv Detail & Related papers (2021-12-15T18:56:39Z) - Partial sensitivity analysis in differential privacy [58.730520380312676]
We investigate the impact of each input feature on the individual's privacy loss.
We experimentally evaluate our approach on queries over private databases.
We also explore our findings in the context of neural network training on synthetic data.
arXiv Detail & Related papers (2021-09-22T08:29:16Z) - Graph-Homomorphic Perturbations for Private Decentralized Learning [64.26238893241322]
Local exchange of estimates allows adversaries to infer private data. Perturbations chosen independently at every agent result in a significant performance loss.
We propose an alternative scheme, which constructs perturbations according to a particular nullspace condition, allowing them to be invisible.
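The nullspace idea, noise that cancels when aggregated across the network, can be sketched with a simple construction in the nullspace of the averaging operator. This is an assumption-laden illustration, not the paper's exact nullspace condition.

```python
import numpy as np

def nullspace_perturbations(num_agents, dim, scale=1.0, seed=0):
    """Draw per-agent noise vectors and project out their cross-agent
    mean, so the vectors sum to zero over the network and leave the
    aggregate estimate unperturbed. Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, scale, size=(num_agents, dim))
    return noise - noise.mean(axis=0, keepdims=True)

P = nullspace_perturbations(num_agents=5, dim=3)
```

Each row of `P` masks one agent's shared estimate, yet the column sums are exactly zero, so the network-wide average is unaffected.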
arXiv Detail & Related papers (2020-10-23T10:35:35Z) - Privacy-Preserving Public Release of Datasets for Support Vector Machine
Classification [14.095523601311374]
We consider the problem of publicly releasing a dataset for support vector machine classification while not infringing on the privacy of data subjects.
The dataset is systematically obfuscated using an additive noise for privacy protection.
Conditions are established for ensuring that the classifier extracted from the original dataset and the obfuscated one are close to each other.
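A minimal sketch of the additive-noise obfuscation step is given below. The paper's conditions relating noise magnitude to closeness of the resulting classifiers are not reproduced here; `noise_scale` is a free illustrative parameter.

```python
import numpy as np

def obfuscate(X, noise_scale, seed=0):
    """Perturb every feature with i.i.d. Laplace noise before public
    release. A larger noise_scale gives stronger obfuscation, but a
    classifier trained on the released data drifts further from one
    trained on the original data."""
    rng = np.random.default_rng(seed)
    return X + rng.laplace(0.0, noise_scale, size=X.shape)

X = np.zeros((200, 2))                 # placeholder dataset
X_public = obfuscate(X, noise_scale=0.1)
```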
arXiv Detail & Related papers (2019-12-29T03:32:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.