Quantitative Auditing of AI Fairness with Differentially Private Synthetic Data
- URL: http://arxiv.org/abs/2504.21634v1
- Date: Wed, 30 Apr 2025 13:36:27 GMT
- Title: Quantitative Auditing of AI Fairness with Differentially Private Synthetic Data
- Authors: Chih-Cheng Rex Yuan, Bow-Yaw Wang
- Abstract summary: Fairness auditing of AI systems can identify and quantify biases. Traditional auditing using real-world data raises security and privacy concerns. We propose a framework that leverages differentially private synthetic data to audit the fairness of AI systems.
- Score: 0.30693357740321775
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Fairness auditing of AI systems can identify and quantify biases. However, traditional auditing using real-world data raises security and privacy concerns. It exposes auditors to security risks as they become custodians of sensitive information and targets for cyberattacks. Privacy risks arise even without direct breaches, as data analyses can inadvertently expose confidential information. To address these, we propose a framework that leverages differentially private synthetic data to audit the fairness of AI systems. By applying privacy-preserving mechanisms, it generates synthetic data that mirrors the statistical properties of the original dataset while ensuring privacy. This method balances the goal of rigorous fairness auditing and the need for strong privacy protections. Through experiments on real datasets like Adult, COMPAS, and Diabetes, we compare fairness metrics of synthetic and real data. By analyzing the alignment and discrepancies between these metrics, we assess the capacity of synthetic data to preserve the fairness properties of real data. Our results demonstrate the framework's ability to enable meaningful fairness evaluations while safeguarding sensitive information, proving its applicability across critical and sensitive domains.
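A minimal sketch of the audit pipeline described above, assuming a toy 2x2 contingency-table synthesizer with Laplace noise as a stand-in for the paper's synthesizer (which is not detailed here); all names, data, and parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def demographic_parity_diff(group, pred):
    """|P(pred=1 | group=0) - P(pred=1 | group=1)|."""
    return abs(pred[group == 0].mean() - pred[group == 1].mean())

def dp_synthesize(group, pred, epsilon, n_out):
    """Laplace-noised 2x2 contingency table, resampled to n_out records.

    A histogram has L1 sensitivity 1 (adding or removing one record
    changes one cell by 1), so Laplace noise with scale 1/epsilon
    gives epsilon-DP counts.
    """
    counts = np.zeros((2, 2))
    for g, p in zip(group, pred):
        counts[g, p] += 1
    noisy = np.clip(counts + rng.laplace(scale=1.0 / epsilon, size=(2, 2)), 0, None)
    probs = (noisy / noisy.sum()).ravel()
    cells = rng.choice(4, size=n_out, p=probs)
    return cells // 2, cells % 2  # synthetic (group, pred)

# Toy "real" audit data: a model whose positive rate favors group 1.
n = 5000
group = rng.integers(0, 2, size=n)
pred = (rng.random(n) < np.where(group == 1, 0.6, 0.4)).astype(int)

syn_group, syn_pred = dp_synthesize(group, pred, epsilon=1.0, n_out=n)
print("real  parity diff:", demographic_parity_diff(group, pred))
print("synth parity diff:", demographic_parity_diff(syn_group, syn_pred))
```

Comparing the two printed values is the alignment check the paper performs at scale across datasets like Adult, COMPAS, and Diabetes.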
Related papers
- A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage [77.83757117924995]
We propose a new framework that evaluates re-identification attacks to quantify individual privacy risks upon data release.
Our approach shows that seemingly innocuous auxiliary information can be used to infer sensitive attributes like age or substance use history from sanitized data.
arXiv Detail & Related papers (2025-04-28T01:16:27Z)
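To make the kind of re-identification attack this paper evaluates concrete, a minimal linkage sketch (a generic illustration, not the authors' method): publicly known quasi-identifiers are matched against a "sanitized" release to read off a sensitive attribute. The data and field names are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

# "Sanitized" release: direct identifiers dropped, but quasi-identifiers
# (age bucket, zip prefix) and a sensitive attribute remain.
n = 1000
release = {
    "age_bucket": rng.integers(2, 9, size=n),    # e.g. 20s..80s
    "zip_prefix": rng.integers(100, 120, size=n),
    "sensitive":  rng.integers(0, 2, size=n),    # e.g. substance-use flag
}

def link(aux_age, aux_zip):
    """Sensitive values of all records matching the auxiliary info."""
    mask = (release["age_bucket"] == aux_age) & (release["zip_prefix"] == aux_zip)
    return release["sensitive"][mask]

# Attacker knows the target's age bucket and zip prefix from a public source.
matches = link(aux_age=3, aux_zip=105)
print(f"{matches.size} candidate records")
if matches.size and matches.mean() in (0.0, 1.0):
    print("sensitive attribute inferred with certainty:", int(matches[0]))
```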
- A Consensus Privacy Metrics Framework for Synthetic Data [13.972528788909813]
There is no consolidated standard for measuring privacy in synthetic data. Our findings indicate that current similarity metrics fail to measure identity disclosure. For differentially private synthetic data, a privacy budget other than close to zero was not considered interpretable.
arXiv Detail & Related papers (2025-03-06T21:19:02Z)
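For context on the similarity metrics this framework finds insufficient, the sketch below computes distance-to-closest-record (DCR), a common similarity-based privacy metric; the implementation is a generic illustration, not drawn from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def dcr(real, synth):
    """Distance to closest record: for each synthetic row, the Euclidean
    distance to its nearest real row. Small values are often read as a
    privacy red flag, though the paper argues such similarity scores do
    not capture identity disclosure."""
    diffs = synth[:, None, :] - real[None, :, :]
    return np.sqrt((diffs ** 2).sum(-1)).min(axis=1)

real = rng.normal(size=(500, 4))
synth = rng.normal(size=(200, 4))
leaked = real[:5] + rng.normal(scale=0.01, size=(5, 4))  # near-copies of real rows
scores = dcr(real, np.vstack([synth, leaked]))
print("median DCR:", np.median(scores), "min DCR:", scores.min())
```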
- Defining 'Good': Evaluation Framework for Synthetic Smart Meter Data [14.779917834583577]
We show that standard privacy attack methods are inadequate for assessing privacy risks of smart meter datasets.
We propose an improved method by injecting training data with implausible outliers, then launching privacy attacks directly on these outliers.
arXiv Detail & Related papers (2024-07-16T14:41:27Z)
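A toy version of this outlier-injection idea (hedged: the paper's actual attack on smart meter data is more involved) might look like this: implausible outliers are injected into the training data, and leakage is flagged when synthetic records sit much closer to the injected outliers than to matched held-out controls.

```python
import numpy as np

rng = np.random.default_rng(3)

def nn_dist(points, ref):
    """L1 distance from each point to its nearest neighbour in ref."""
    return np.abs(points[:, None, :] - ref[None, :, :]).sum(-1).min(axis=1)

normal   = rng.normal(0, 1, size=(1000, 24))   # ordinary daily load profiles
injected = rng.normal(15, 1, size=(10, 24))    # implausible outliers put INTO training
control  = rng.normal(15, 1, size=(10, 24))    # matched outliers kept OUT of training
train = np.vstack([normal, injected])

# Stand-in "generator" that memorizes: resample training rows with noise.
synth = train[rng.integers(0, len(train), 2000)] + rng.normal(0, 0.1, (2000, 24))

# If injected outliers sit much closer to the synthetic data than the
# held-out controls, the generator leaked its training outliers.
print("injected NN-dist:", nn_dist(injected, synth).mean())
print("control  NN-dist:", nn_dist(control, synth).mean())
```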
- The Use of Synthetic Data to Train AI Models: Opportunities and Risks for Sustainable Development [0.6906005491572401]
This paper investigates the policies governing the creation, utilization, and dissemination of synthetic data.
A well-crafted synthetic data policy must strike a balance between privacy concerns and the utility of data.
arXiv Detail & Related papers (2023-08-31T23:18:53Z)
- Auditing and Generating Synthetic Data with Controllable Trust Trade-offs [54.262044436203965]
We introduce a holistic auditing framework that comprehensively evaluates synthetic datasets and AI models.
It focuses on preventing bias and discrimination, ensuring fidelity to the source data, and assessing utility, robustness, and privacy preservation.
We demonstrate the framework's effectiveness by auditing various generative models across diverse use cases.
arXiv Detail & Related papers (2023-04-21T09:03:18Z)
- No Free Lunch in "Privacy for Free: How does Dataset Condensation Help Privacy" [75.98836424725437]
New methods designed to preserve data privacy require careful scrutiny.
Failure to preserve privacy is hard to detect, and yet can lead to catastrophic results when a system implementing a "privacy-preserving" method is attacked.
arXiv Detail & Related papers (2022-09-29T17:50:23Z)
- Representative & Fair Synthetic Data [68.8204255655161]
We present a framework to incorporate fairness constraints into the self-supervised learning process.
We generate a representative and fair version of the UCI Adult census data set.
We consider representative & fair synthetic data a promising future building block to teach algorithms not on historic worlds, but rather on the worlds that we strive to live in.
arXiv Detail & Related papers (2021-04-07T09:19:46Z)
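The paper's self-supervised setup is not reproduced here, but a minimal sketch of folding a fairness constraint into a training loss, using a demographic-parity penalty on a toy logistic model, conveys the idea; all data and weights are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy data: first feature is correlated with the protected group.
n = 2000
g = rng.integers(0, 2, n)                          # protected attribute
X = np.c_[rng.normal(g, 1.0, n), rng.normal(0, 1, n)]
y = (rng.random(n) < 0.3 + 0.4 * g).astype(float)  # biased labels

w = np.zeros(2)
lam, lr = 5.0, 0.1  # fairness-penalty weight, learning rate
for _ in range(2000):
    p = 1 / (1 + np.exp(-X @ w))                   # predicted probabilities
    grad_ce = X.T @ (p - y) / n                    # cross-entropy gradient
    # Demographic-parity penalty: lam * (E[p|g=0] - E[p|g=1])^2
    gap = p[g == 0].mean() - p[g == 1].mean()
    s = p * (1 - p)                                # sigmoid derivative
    dgap = (X[g == 0] * s[g == 0, None]).mean(0) - (X[g == 1] * s[g == 1, None]).mean(0)
    w -= lr * (grad_ce + 2 * lam * gap * dgap)

p = 1 / (1 + np.exp(-X @ w))
print("parity gap after training:", abs(p[g == 0].mean() - p[g == 1].mean()))
```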
- Robustness Threats of Differential Privacy [70.818129585404]
We experimentally demonstrate that networks trained with differential privacy can, in some settings, be even more vulnerable than their non-private counterparts.
We study how the main ingredients of differentially private neural network training, such as gradient clipping and noise addition, affect the robustness of the model.
arXiv Detail & Related papers (2020-12-14T18:59:24Z)
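The two ingredients studied, per-example gradient clipping and noise addition, are the core of DP-SGD. A minimal numpy sketch of one such step on a least-squares problem (illustrative only, not the paper's experimental code):

```python
import numpy as np

rng = np.random.default_rng(5)

def dp_sgd_step(w, X, y, clip=1.0, noise_mult=1.1, lr=0.1):
    """One DP-SGD step: clip each per-example gradient to L2 norm `clip`,
    sum, then add Gaussian noise scaled to the clip norm."""
    per_ex = 2 * (X @ w - y)[:, None] * X            # per-example gradients
    norms = np.linalg.norm(per_ex, axis=1, keepdims=True)
    clipped = per_ex / np.maximum(1.0, norms / clip)
    noisy_sum = clipped.sum(0) + rng.normal(0, noise_mult * clip, size=w.shape)
    return w - lr * noisy_sum / len(X)

X = rng.normal(size=(256, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, 256)
w = np.zeros(3)
for _ in range(200):
    w = dp_sgd_step(w, X, y)
print("weights:", w)
```

Smaller `noise_mult` or larger `clip` weakens the privacy guarantee, and per this paper's findings the clipping and noise themselves can also shift the model's robustness.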
- Trustworthy AI [75.99046162669997]
Brittleness to minor adversarial changes in the input data, limited ability to explain decisions, and bias in training data are among the most prominent limitations.
We propose a tutorial on Trustworthy AI to address six critical issues in enhancing user and public trust in AI systems.
arXiv Detail & Related papers (2020-11-02T20:04:18Z)
- Really Useful Synthetic Data -- A Framework to Evaluate the Quality of Differentially Private Synthetic Data [2.538209532048867]
Recent advances in generating synthetic data with principled privacy protections are a crucial step toward sharing statistical information in a privacy-preserving way.
To further optimise the inherent trade-off between data privacy and data quality, it is necessary to think closely about the latter.
We develop a framework to evaluate the quality of differentially private synthetic data from an applied researcher's perspective.
arXiv Detail & Related papers (2020-04-16T16:24:22Z)
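From an applied researcher's perspective, one natural quality check in the spirit of this framework is whether an analysis run on the synthetic data reproduces the estimates obtained on the real data. A hedged sketch, with added noise standing in for an actual DP synthesizer:

```python
import numpy as np

rng = np.random.default_rng(6)

def ols(X, y):
    """Ordinary least squares coefficients (with intercept)."""
    return np.linalg.lstsq(np.c_[np.ones(len(X)), X], y, rcond=None)[0]

# "Real" study data and a stand-in synthetic release (real data plus
# noise, in place of an actual DP synthesizer).
X = rng.normal(size=(1000, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 1, 1000)
Xs, ys = X + rng.normal(0, 0.3, X.shape), y + rng.normal(0, 0.3, 1000)

beta_real, beta_syn = ols(X, y), ols(Xs, ys)
print("real  coefficients:", beta_real)
print("synth coefficients:", beta_syn)
print("max abs deviation :", np.abs(beta_real - beta_syn).max())
```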