Downstream Fairness Caveats with Synthetic Healthcare Data
- URL: http://arxiv.org/abs/2203.04462v1
- Date: Wed, 9 Mar 2022 00:52:47 GMT
- Title: Downstream Fairness Caveats with Synthetic Healthcare Data
- Authors: Karan Bhanot, Ioana Baldini, Dennis Wei, Jiaming Zeng and Kristin P. Bennett
- Abstract summary: Privacy laws limit access to health data such as Electronic Medical Records (EMRs) to preserve patient privacy.
This paper evaluates synthetically generated healthcare data for biases and investigates the effect of fairness mitigation techniques on utility-fairness.
- Score: 21.54509987309669
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper evaluates synthetically generated healthcare data for biases and
investigates the effect of fairness mitigation techniques on utility-fairness.
Privacy laws limit access to health data such as Electronic Medical Records
(EMRs) to preserve patient privacy. Albeit essential, these laws hinder
research reproducibility. Synthetic data is a viable solution that can enable
access to data similar to real healthcare data without privacy risks.
Healthcare datasets may have biases in which certain protected groups might
experience worse outcomes than others. With the real data having biases, the
fairness of synthetically generated health data comes into question. In this
paper, we evaluate the fairness of models generated on two healthcare datasets
for gender and race biases. We generate synthetic versions of the dataset using
a Generative Adversarial Network called HealthGAN, and compare the real and
synthetic models' balanced accuracy and fairness scores. We find that synthetic
data has different fairness properties compared to real data and fairness
mitigation techniques perform differently, highlighting that synthetic data is
not bias free.
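The comparison described in the abstract — scoring downstream models trained on real versus synthetic data on both balanced accuracy and a group-fairness metric — can be sketched with two standard definitions: balanced accuracy as the mean of per-class recall, and demographic parity difference between two protected groups. This is a minimal illustration, not the paper's code; the toy prediction arrays below are invented.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall; robust to class imbalance."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

def demographic_parity_difference(y_pred, group):
    """|P(yhat=1 | group=0) - P(yhat=1 | group=1)| for a binary protected attribute."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return float(abs(np.mean(y_pred[group == 0]) - np.mean(y_pred[group == 1])))

# Invented predictions standing in for models trained on real vs. synthetic data.
y_true     = np.array([1, 1, 0, 0, 1, 0])
group      = np.array([0, 0, 0, 1, 1, 1])  # e.g., a binary protected attribute
pred_real  = np.array([1, 1, 0, 0, 0, 0])
pred_synth = np.array([1, 0, 0, 1, 1, 0])

for name, pred in [("real", pred_real), ("synthetic", pred_synth)]:
    print(name,
          "balanced acc:", round(balanced_accuracy(y_true, pred), 3),
          "parity diff:", round(demographic_parity_difference(pred, group), 3))
```

A gap between the two rows in either column is exactly the kind of utility-fairness divergence between real and synthetic data that the paper measures.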
Related papers
- Can Synthetic Data be Fair and Private? A Comparative Study of Synthetic Data Generation and Fairness Algorithms [2.144088660722956]
We find that the DEbiasing CAusal Fairness (DECAF) algorithm achieves the best balance between privacy and fairness.
Applying pre-processing fairness algorithms to synthetic data improves fairness even more than when applied to real data.
arXiv Detail & Related papers (2025-01-03T12:35:58Z)
- Second FRCSyn-onGoing: Winning Solutions and Post-Challenge Analysis to Improve Face Recognition with Synthetic Data [104.30479583607918]
2nd FRCSyn-onGoing challenge is based on the 2nd Face Recognition Challenge in the Era of Synthetic Data (FRCSyn), originally launched at CVPR 2024.
We focus on exploring the use of synthetic data both individually and in combination with real data to solve current challenges in face recognition.
arXiv Detail & Related papers (2024-12-02T11:12:01Z)
- Best Practices and Lessons Learned on Synthetic Data [83.63271573197026]
The success of AI models relies on the availability of large, diverse, and high-quality datasets.
Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns.
arXiv Detail & Related papers (2024-04-11T06:34:17Z)
- Strong statistical parity through fair synthetic data [0.0]
This paper explores the creation of synthetic data that embodies Fairness by Design.
A downstream model trained on such synthetic data provides fair predictions across all thresholds.
arXiv Detail & Related papers (2023-11-06T10:06:30Z)
- The Use of Synthetic Data to Train AI Models: Opportunities and Risks for Sustainable Development [0.6906005491572401]
This paper investigates the policies governing the creation, utilization, and dissemination of synthetic data.
A well crafted synthetic data policy must strike a balance between privacy concerns and the utility of data.
arXiv Detail & Related papers (2023-08-31T23:18:53Z)
- Auditing and Generating Synthetic Data with Controllable Trust Trade-offs [54.262044436203965]
We introduce a holistic auditing framework that comprehensively evaluates synthetic datasets and AI models.
It focuses on preventing bias and discrimination, ensuring fidelity to the source data, and assessing utility, robustness, and privacy preservation.
We demonstrate the framework's effectiveness by auditing various generative models across diverse use cases.
arXiv Detail & Related papers (2023-04-21T09:03:18Z)
- Membership Inference Attacks against Synthetic Data through Overfitting Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
arXiv Detail & Related papers (2023-02-24T11:27:39Z)
- DECAF: Generating Fair Synthetic Data Using Causally-Aware Generative Networks [71.6879432974126]
We introduce DECAF: a GAN-based fair synthetic data generator for tabular data.
We show that DECAF successfully removes undesired bias and is capable of generating high-quality synthetic data.
We provide theoretical guarantees on the generator's convergence and the fairness of downstream models.
arXiv Detail & Related papers (2021-10-25T12:39:56Z)
- Representative & Fair Synthetic Data [68.8204255655161]
We present a framework to incorporate fairness constraints into the self-supervised learning process.
We generate a representative as well as fair version of the UCI Adult census data set.
We consider representative & fair synthetic data a promising future building block to teach algorithms not on historic worlds, but rather on the worlds that we strive to live in.
arXiv Detail & Related papers (2021-04-07T09:19:46Z)
- Measuring Utility and Privacy of Synthetic Genomic Data [3.635321290763711]
We provide the first evaluation of the utility and the privacy protection of five state-of-the-art models for generating synthetic genomic data.
Overall, there is no single approach for generating synthetic genomic data that performs well across the board.
arXiv Detail & Related papers (2021-02-05T17:41:01Z)
- Fidelity and Privacy of Synthetic Medical Data [0.0]
The digitization of medical records ushered in a new era of big data to clinical science.
The need to share individual-level medical data continues to grow, and has never been more urgent.
Enthusiasm for the use of big data has been tempered by a fully appropriate concern for patient autonomy and privacy.
arXiv Detail & Related papers (2021-01-18T23:01:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.