Privacy Constrained Fairness Estimation for Decision Trees
- URL: http://arxiv.org/abs/2312.08413v1
- Date: Wed, 13 Dec 2023 14:54:48 GMT
- Title: Privacy Constrained Fairness Estimation for Decision Trees
- Authors: Florian van der Steen, Fré Vink and Heysem Kaya
- Abstract summary: Measuring the fairness of any AI model requires the sensitive attributes of the individuals in the dataset.
We propose a novel method, dubbed Privacy-Aware Fairness Estimation of Rules (PAFER).
We show that using the Laplacian mechanism, the method is able to estimate SP with low error while guaranteeing the privacy of the individuals in the dataset with high certainty.
- Score: 2.9906966931843093
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The protection of sensitive data becomes more vital as data increases in
value and potency. Furthermore, pressure from regulators and society on model
developers to make their Artificial Intelligence (AI) models non-discriminatory
is increasing. In addition, there is a need for interpretable, transparent AI
models for high-stakes tasks. In general, measuring the fairness of any AI
model requires the sensitive attributes of the individuals in the dataset, thus
raising privacy concerns. In this work, the trade-offs between fairness,
privacy and interpretability are further explored. We specifically examine the
Statistical Parity (SP) of Decision Trees (DTs) with Differential Privacy (DP),
each of which is a popular method in its respective subfield. We propose a novel
method, dubbed Privacy-Aware Fairness Estimation of Rules (PAFER), that can
estimate SP for DTs in a DP-aware manner. DP guarantees privacy by adding noise
to the sensitive data, which is securely held by a third-party legal entity.
We experimentally compare several DP mechanisms.
We show that using the Laplacian mechanism, the method is able to estimate SP
with low error while guaranteeing the privacy of the individuals in the dataset
with high certainty. We further show experimentally and theoretically that the
method performs better for DTs that humans generally find easier to interpret.
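To make the third-party/Laplace setting concrete, below is a minimal Python sketch of DP-aware SP estimation for a public decision tree: the sensitive attribute never leaves the third party, which only releases Laplace-noised counts per (leaf, group) cell. This is an illustration under stated assumptions, not the authors' PAFER implementation; the helper names (laplace_count, estimate_sp, leaf_positive) and the histogram-style query are hypothetical choices.

```python
# Minimal sketch (not PAFER itself): estimate Statistical Parity
# SP = |P(Y_hat = 1 | A = 0) - P(Y_hat = 1 | A = 1)| for a public decision
# tree when the sensitive attribute A is held by a third party that only
# releases Laplace-noised counts. Helper names are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def laplace_count(true_count: float, epsilon: float, sensitivity: float = 1.0) -> float:
    """Release a count with Laplace noise of scale sensitivity / epsilon."""
    return true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

def estimate_sp(leaf_ids, groups, leaf_positive, epsilon):
    """Estimate SP from a noisy (leaf x group) histogram.

    leaf_ids:      leaf index reached by each individual (from the public DT)
    groups:        sensitive attribute per individual (0/1, held by third party)
    leaf_positive: dict leaf index -> True if the leaf predicts the positive class
    epsilon:       privacy budget; each individual falls in exactly one
                   (leaf, group) cell, so noising every cell with scale
                   1/epsilon stays within the budget (parallel composition).
    """
    rates = []
    for a in (0, 1):
        in_group = groups == a
        noisy = {leaf: laplace_count(np.sum(in_group & (leaf_ids == leaf)), epsilon)
                 for leaf in leaf_positive}
        total = max(sum(noisy.values()), 1.0)
        positives = sum(count for leaf, count in noisy.items() if leaf_positive[leaf])
        rates.append(float(np.clip(positives / total, 0.0, 1.0)))
    return abs(rates[0] - rates[1])

# Toy usage: 3 leaves, leaves 0 and 2 predict the positive class.
leaf_ids = np.array([0, 0, 1, 2, 2, 1, 0, 2])
groups = np.array([0, 1, 0, 1, 0, 1, 1, 0])
print(estimate_sp(leaf_ids, groups, {0: True, 1: False, 2: True}, epsilon=1.0))
```

In the paper's setup, the noisy answers would come from the third-party legal entity that securely holds the sensitive data, so the party estimating fairness never observes raw attribute values.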
Related papers
- Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose Pseudo-Probability Unlearning (PPU), a novel method that enables models to forget data in a privacy-preserving manner.
Our method achieves over 20% improvements in forgetting error compared to the state-of-the-art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z)
- Differentially Private Fine-Tuning of Diffusion Models [22.454127503937883]
The integration of Differential Privacy with diffusion models (DMs) presents a promising yet challenging frontier.
Recent developments in this field have highlighted the potential for generating high-quality synthetic data by pre-training on public data.
We propose a strategy optimized for private diffusion models, which minimizes the number of trainable parameters to enhance the privacy-utility trade-off.
arXiv Detail & Related papers (2024-06-03T14:18:04Z)
- Differential Privacy for Anomaly Detection: Analyzing the Trade-off Between Privacy and Explainability [4.844901225743574]
We analyze the trade-off between applying Explainable AI (XAI) through SHapley Additive exPlanations (SHAP) and enforcing differential privacy (DP).
Our results show that the enforcement of privacy through DP has a significant impact on detection accuracy and explainability.
We further show that the visual interpretation of explanations is also influenced by the choice of the AD algorithm.
arXiv Detail & Related papers (2024-04-09T09:09:36Z)
- Conciliating Privacy and Utility in Data Releases via Individual Differential Privacy and Microaggregation [4.287502453001108]
$\epsilon$-Differential privacy (DP) is a well-known privacy model that offers strong privacy guarantees.
We propose $\epsilon$-individual differential privacy (iDP), which causes less data distortion while providing the same protection as DP to subjects.
We report on experiments that show how our approach can provide strong privacy (small $\epsilon$) while yielding protected data that do not significantly degrade the accuracy of secondary data analysis.
arXiv Detail & Related papers (2023-12-21T10:23:18Z)
- Reconciling AI Performance and Data Reconstruction Resilience for Medical Imaging [52.578054703818125]
Artificial Intelligence (AI) models are vulnerable to information leakage of their training data, which can be highly sensitive.
Differential Privacy (DP) aims to circumvent these susceptibilities by setting a quantifiable privacy budget.
We show that using very large privacy budgets can render reconstruction attacks impossible, while drops in performance are negligible.
arXiv Detail & Related papers (2023-12-05T12:21:30Z)
- TeD-SPAD: Temporal Distinctiveness for Self-supervised Privacy-preservation for video Anomaly Detection [59.04634695294402]
Video anomaly detection (VAD) without human monitoring is a complex computer vision task.
Privacy leakage in VAD allows models to pick up and amplify unnecessary biases related to people's personal information.
We propose TeD-SPAD, a privacy-aware video anomaly detection framework that destroys visual private information in a self-supervised manner.
arXiv Detail & Related papers (2023-08-21T22:42:55Z)
- A Randomized Approach for Tight Privacy Accounting [63.67296945525791]
We propose a new differential privacy paradigm called estimate-verify-release (EVR).
The EVR paradigm first estimates the privacy parameter of a mechanism, then verifies whether it meets this guarantee, and finally releases the query output.
Our empirical evaluation shows the newly proposed EVR paradigm improves the utility-privacy tradeoff for privacy-preserving machine learning.
arXiv Detail & Related papers (2023-04-17T00:38:01Z)
- How Do Input Attributes Impact the Privacy Loss in Differential Privacy? [55.492422758737575]
We study the connection between the per-subject norm in DP neural networks and individual privacy loss.
We introduce a novel metric termed the Privacy Loss-Input Susceptibility (PLIS), which allows one to apportion a subject's privacy loss to their input attributes.
arXiv Detail & Related papers (2022-11-18T11:39:03Z)
- Just Fine-tune Twice: Selective Differential Privacy for Large Language Models [69.66654761324702]
We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve SDP for large Transformer-based language models.
Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
arXiv Detail & Related papers (2022-04-15T22:36:55Z)
- DTGAN: Differential Private Training for Tabular GANs [6.174448419090292]
We propose DTGAN, a novel conditional Wasserstein GAN that comes in two variants, DTGAN_G and DTGAN_D.
We rigorously evaluate the theoretical privacy guarantees offered by DP empirically against membership and attribute inference attacks.
Our results on 3 datasets show that the DP-SGD framework is superior to PATE and that a DP discriminator is better suited for training convergence.
arXiv Detail & Related papers (2021-07-06T10:28:05Z)
- P3GM: Private High-Dimensional Data Release via Privacy Preserving Phased Generative Model [23.91327154831855]
This paper proposes a privacy-preserving phased generative model (P3GM) for releasing sensitive data.
P3GM employs the two-phase learning process to make it robust against the noise, and to increase learning efficiency.
Compared with state-of-the-art methods, our generated samples contain less noise and are closer to the original data in terms of data diversity.
arXiv Detail & Related papers (2020-06-22T09:47:54Z)