Towards a Data Privacy-Predictive Performance Trade-off
- URL: http://arxiv.org/abs/2201.05226v1
- Date: Thu, 13 Jan 2022 21:48:51 GMT
- Title: Towards a Data Privacy-Predictive Performance Trade-off
- Authors: Tânia Carvalho, Nuno Moniz, Pedro Faria and Luís Antunes
- Abstract summary: We evaluate the existence of a trade-off between data privacy and predictive performance in classification tasks.
Unlike previous literature, we confirm that the higher the level of privacy, the higher the impact on predictive performance.
- Score: 2.580765958706854
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Machine learning is increasingly used in the most diverse applications and domains, whether in healthcare to predict pathologies or in the financial sector to detect fraud. One of the linchpins of efficiency and accuracy in machine learning is data utility. However, when data contains personal information, full access may be restricted by laws and regulations aiming to protect individuals' privacy. Therefore, data owners must ensure that any data they share guarantees such privacy. Removing or transforming private information (de-identification) is among the most common techniques. Intuitively, one can anticipate that reducing detail or distorting information would result in losses in model predictive performance. However, previous work on classification tasks using de-identified data generally demonstrates that predictive performance can be preserved in specific applications. In this paper, we evaluate the existence of a trade-off between data privacy and predictive performance in classification tasks. We leverage a large set of privacy-preserving techniques and learning algorithms to assess re-identification ability and the impact of transformed variants on predictive performance. Unlike previous literature, we confirm that the higher the level of privacy (the lower the re-identification risk), the higher the impact on predictive performance, pointing towards clear evidence of a trade-off.
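To make the experiment concrete, below is a minimal, hypothetical sketch of the kind of assessment the abstract describes: coarsen quasi-identifiers at increasing levels of generalization, score re-identification risk from equivalence-class sizes, and measure classifier accuracy on each transformed variant. The synthetic data, the generalization scheme, and the risk score are illustrative assumptions, not the paper's actual pipeline.

```python
# Minimal sketch (not the paper's pipeline): generalize quasi-identifiers,
# estimate re-identification risk from equivalence-class sizes, and compare
# classifier accuracy on the original vs. de-identified variants.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(18, 90, 1000),           # quasi-identifier
    "zip": rng.integers(10000, 10100, 1000),     # quasi-identifier
    "income": rng.normal(50_000, 15_000, 1000),  # ordinary feature
})
df["label"] = (df["income"] + 500 * (df["age"] > 40) > 55_000).astype(int)

def generalize(d: pd.DataFrame, age_bin: int, zip_digits: int) -> pd.DataFrame:
    """Coarsen quasi-identifiers: bin ages, zero out trailing ZIP digits."""
    out = d.copy()
    out["age"] = (out["age"] // age_bin) * age_bin
    out["zip"] = (out["zip"] // 10**zip_digits) * 10**zip_digits
    return out

def reident_risk(d: pd.DataFrame, qis=("age", "zip")) -> float:
    """Mean 1/|equivalence class| over records (k-anonymity-style risk)."""
    sizes = d.groupby(list(qis))["label"].transform("size")
    return float((1.0 / sizes).mean())

# Increasing generalization should lower risk; the question is what it costs.
for age_bin, zip_digits in [(1, 0), (5, 1), (10, 2), (20, 3)]:
    var = generalize(df, age_bin, zip_digits)
    X, y = var.drop(columns="label"), var["label"]
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    model = RandomForestClassifier(random_state=0).fit(Xtr, ytr)
    acc = accuracy_score(yte, model.predict(Xte))
    print(f"risk={reident_risk(var):.3f}  accuracy={acc:.3f}")
```

If the paper's thesis holds, the printed risk and accuracy should fall together as the quasi-identifiers are coarsened.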
Related papers
- Collection, usage and privacy of mobility data in the enterprise and public administrations [55.2480439325792]
Security measures such as anonymization are needed to protect individuals' privacy.
Within our study, we conducted expert interviews to gain insights into practices in the field.
We survey privacy-enhancing methods in use, which generally do not comply with state-of-the-art standards of differential privacy.
arXiv Detail & Related papers (2024-07-04T08:29:27Z) - Certificates of Differential Privacy and Unlearning for Gradient-Based Training [35.18220120875603]
We propose a new verification-centric approach to privacy and unlearning guarantees.
We validate the effectiveness of our approach on tasks from financial services, medical imaging, and natural language processing.
arXiv Detail & Related papers (2024-06-19T10:47:00Z) - A Summary of Privacy-Preserving Data Publishing in the Local Setting [0.6749750044497732]
Statistical Disclosure Control aims to minimize the risk of exposing confidential information by de-identifying it.
We outline the current privacy-preserving techniques employed in microdata de-identification, delve into privacy measures tailored for various disclosure scenarios, and assess metrics for information loss and predictive performance.
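As background for the local setting (our illustration, not necessarily the survey's example), randomized response is the canonical local mechanism: each respondent perturbs their own binary answer before it ever reaches the curator, yielding eps-local differential privacy while still allowing unbiased aggregate estimates.

```python
# Randomized response sketch (illustrative background, not from the survey):
# each respondent reports truthfully with probability e^eps / (e^eps + 1),
# which satisfies eps-local DP; the curator de-biases the aggregate mean.
import numpy as np

def randomized_response(bits: np.ndarray, eps: float, rng) -> np.ndarray:
    p_truth = np.exp(eps) / (np.exp(eps) + 1)
    flip = rng.random(bits.shape) > p_truth
    return np.where(flip, 1 - bits, bits)

def debias_mean(reports: np.ndarray, eps: float) -> float:
    p = np.exp(eps) / (np.exp(eps) + 1)
    # E[report] = (2p - 1) * mu + (1 - p)  =>  solve for the true mean mu
    return float((reports.mean() - (1 - p)) / (2 * p - 1))

rng = np.random.default_rng(1)
truth = (rng.random(100_000) < 0.3).astype(int)  # 30% hold the attribute
reports = randomized_response(truth, eps=1.0, rng=rng)
print(debias_mean(reports, eps=1.0))  # close to 0.3 despite per-record noise
```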
arXiv Detail & Related papers (2023-12-19T04:23:23Z) - $\alpha$-Mutual Information: A Tunable Privacy Measure for Privacy Protection in Data Sharing [4.475091558538915]
This paper adopts Arimoto's $\alpha$-Mutual Information as a tunable privacy measure.
We formulate a general distortion-based mechanism that manipulates the original data to offer privacy protection.
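For reference, a standard formulation of Arimoto's $\alpha$-mutual information (notation may differ from the paper's):

```latex
% Renyi entropy and Arimoto conditional entropy of order alpha
H_\alpha(X) = \frac{1}{1-\alpha} \log \sum_x P_X(x)^\alpha, \qquad
H_\alpha^{\mathrm{A}}(X \mid Y) = \frac{\alpha}{1-\alpha}
  \log \sum_y \Big( \sum_x P_{XY}(x,y)^\alpha \Big)^{1/\alpha}
% Arimoto alpha-mutual information
I_\alpha^{\mathrm{A}}(X;Y) = H_\alpha(X) - H_\alpha^{\mathrm{A}}(X \mid Y)
```

Tuning $\alpha$ moves the measure between Shannon mutual information ($\alpha \to 1$) and maximal leakage ($\alpha \to \infty$), which is what makes it usable as an adjustable privacy knob.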
arXiv Detail & Related papers (2023-10-27T16:26:14Z) - Tight Auditing of Differentially Private Machine Learning [77.38590306275877]
For private machine learning, existing auditing mechanisms give tight privacy estimates only under implausible worst-case assumptions.
We design an improved auditing scheme that yields tight privacy estimates for natural (not adversarially crafted) datasets.
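For background, audits of this kind typically rest on the hypothesis-testing view of differential privacy; whether this paper uses exactly this bound is our assumption, but the standard relation for any membership test against an $(\varepsilon, \delta)$-DP mechanism is:

```latex
% Attack TPR/FPR against an (eps, delta)-DP mechanism are constrained by
\mathrm{TPR} \le e^{\varepsilon} \, \mathrm{FPR} + \delta
\quad \Longrightarrow \quad
\varepsilon \ge \log \frac{\mathrm{TPR} - \delta}{\mathrm{FPR}}
```

so observed attack performance translates directly into an empirical lower bound on $\varepsilon$.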
arXiv Detail & Related papers (2023-02-15T21:40:33Z) - Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining [75.25943383604266]
We question whether the use of large Web-scraped datasets should be viewed as differential-privacy-preserving.
We caution that publicizing these models pretrained on Web data as "private" could lead to harm and erode the public's trust in differential privacy as a meaningful definition of privacy.
We conclude by discussing potential paths forward for the field of private learning, as public pretraining becomes more popular and powerful.
arXiv Detail & Related papers (2022-12-13T10:41:12Z) - A General Framework for Auditing Differentially Private Machine Learning [27.99806936918949]
We present a framework to statistically audit the privacy guarantee conferred by a differentially private machine learner in practice.
Our work develops a general methodology to empirically evaluate the privacy of differentially private machine learning implementations.
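A minimal sketch of how such an empirical audit can be organized, assuming a membership-inference distinguisher and Clopper-Pearson confidence intervals; the counts are hypothetical and this is not the paper's exact statistical machinery:

```python
# Sketch of a statistical DP audit (illustrative, not the paper's procedure):
# bound the attack's TPR/FPR with Clopper-Pearson intervals, then convert to
# an empirical lower bound on epsilon via TPR <= e^eps * FPR + delta.
import numpy as np
from scipy.stats import beta

def clopper_pearson(successes: int, trials: int, alpha: float = 0.05):
    lo = beta.ppf(alpha / 2, successes, trials - successes + 1) if successes else 0.0
    hi = beta.ppf(1 - alpha / 2, successes + 1, trials - successes) \
        if successes < trials else 1.0
    return lo, hi

def empirical_eps_lower_bound(tp: int, fp: int, trials: int,
                              delta: float = 1e-5) -> float:
    tpr_lo, _ = clopper_pearson(tp, trials)  # conservatively low TPR
    _, fpr_hi = clopper_pearson(fp, trials)  # conservatively high FPR
    if fpr_hi <= 0.0 or tpr_lo <= delta:
        return 0.0  # attack too weak to certify any leakage
    return float(np.log((tpr_lo - delta) / fpr_hi))

# Hypothetical outcome: attack flags the target in 900/1000 member runs
# and 100/1000 non-member runs.
print(empirical_eps_lower_bound(tp=900, fp=100, trials=1000))
```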
arXiv Detail & Related papers (2022-10-16T21:34:18Z) - No Free Lunch in "Privacy for Free: How does Dataset Condensation Help Privacy" [75.98836424725437]
New methods designed to preserve data privacy require careful scrutiny.
Failure to preserve privacy is hard to detect, and yet can lead to catastrophic results when a system implementing a "privacy-preserving" method is attacked.
arXiv Detail & Related papers (2022-09-29T17:50:23Z) - When Fairness Meets Privacy: Fair Classification with Semi-Private Sensitive Attributes [18.221858247218726]
We study a novel and practical problem of fair classification in a semi-private setting.
Most of the sensitive attributes are private, and only a small number of clean ones are available.
We propose a novel framework FairSP that can achieve Fair prediction under the Semi-Private setting.
arXiv Detail & Related papers (2022-07-18T01:10:25Z) - SF-PATE: Scalable, Fair, and Private Aggregation of Teacher Ensembles [50.90773979394264]
This paper studies a model that protects the privacy of individuals' sensitive information while also allowing it to learn non-discriminatory predictors.
A key characteristic of the proposed model is to enable the adoption of off-the-shelf, non-private fair models to create a privacy-preserving and fair model.
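SF-PATE builds on PATE, whose core step is noisy aggregation of teacher votes; a minimal sketch of that underlying mechanism follows (the scalability and fairness additions that distinguish SF-PATE are not shown, and the noise scale here is an arbitrary choice):

```python
# PATE-style noisy aggregation sketch (the base mechanism SF-PATE extends).
import numpy as np

def noisy_aggregate(teacher_votes: np.ndarray, n_classes: int,
                    gamma: float, rng) -> int:
    """Per-query label: add Laplace(1/gamma) noise to each class's vote
    count and return the noisy plurality winner."""
    counts = np.bincount(teacher_votes, minlength=n_classes).astype(float)
    counts += rng.laplace(scale=1.0 / gamma, size=n_classes)
    return int(np.argmax(counts))

rng = np.random.default_rng(0)
votes = np.array([0, 0, 1, 0, 2, 0, 1, 0, 0, 0])  # 10 teachers, 3 classes
print(noisy_aggregate(votes, n_classes=3, gamma=0.5, rng=rng))
```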
arXiv Detail & Related papers (2022-04-11T14:42:54Z) - Differentially Private and Fair Deep Learning: A Lagrangian Dual Approach [54.32266555843765]
This paper studies a model that protects the privacy of individuals' sensitive information while also allowing it to learn non-discriminatory predictors.
The method relies on the notion of differential privacy and the use of Lagrangian duality to design neural networks that can accommodate fairness constraints.
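A minimal sketch of the generic primal-dual pattern the abstract describes, assuming a logistic model and a demographic-parity gap as the fairness constraint; the paper's actual networks, privacy noise, and fairness measures differ:

```python
# Lagrangian-dual training sketch (illustrative): descend on the task loss
# plus lambda * constraint violation, and ascend on the multiplier lambda.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, group, steps=2000, lr=0.1, dual_lr=0.05, tol=0.02):
    w, lam = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = sigmoid(X @ w)
        grad_task = X.T @ (p - y) / len(y)  # logistic-loss gradient
        gap = p[group == 1].mean() - p[group == 0].mean()  # parity gap
        s = p * (1 - p)  # sigmoid derivative, used via the chain rule
        dgap = (X[group == 1].T @ s[group == 1] / (group == 1).sum()
                - X[group == 0].T @ s[group == 0] / (group == 0).sum())
        w -= lr * (grad_task + lam * np.sign(gap) * dgap)  # primal descent
        lam = max(0.0, lam + dual_lr * (abs(gap) - tol))   # dual ascent
    return w, lam

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
g = (rng.random(500) < 0.5).astype(int)
y = ((X[:, 0] + 0.5 * g + rng.normal(scale=0.5, size=500)) > 0).astype(int)
w, lam = train(X, y, g)
print(w, lam)
```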
arXiv Detail & Related papers (2020-09-26T10:50:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.