Imputation Strategies Under Clinical Presence: Impact on Algorithmic
Fairness
- URL: http://arxiv.org/abs/2208.06648v3
- Date: Fri, 30 Jun 2023 21:42:26 GMT
- Title: Imputation Strategies Under Clinical Presence: Impact on Algorithmic
Fairness
- Authors: Vincent Jeanselme, Maria De-Arteaga, Zhe Zhang, Jessica Barrett and
Brian Tom
- Abstract summary: In healthcare, biases have marked medical history, leading to unequal care affecting marginalised groups.
Our work studies how imputation choices affect reconstruction errors across groups and algorithmic fairness properties of downstream predictions.
- Score: 6.218860613388414
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning risks reinforcing biases present in data, and, as we argue
in this work, in what is absent from data. In healthcare, biases have marked
medical history, leading to unequal care affecting marginalised groups.
Patterns in missing data often reflect these group discrepancies, but the
algorithmic fairness implications of group-specific missingness are not well
understood. Despite its potential impact, imputation is often an overlooked
preprocessing step, with attention placed on the reduction of reconstruction
error and overall performance, ignoring how imputation can affect groups
differently. Our work studies how imputation choices affect reconstruction
errors across groups and algorithmic fairness properties of downstream
predictions.
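To make the core question concrete, here is a minimal sketch (not the paper's protocol): synthetic data with group-specific missingness, two common scikit-learn imputers, and reconstruction error reported per group. All sizes, rates and variable choices below are illustrative assumptions.

```python
# Hedged sketch: compare how two imputation strategies distribute
# reconstruction error across groups on synthetic, group-shifted data.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, IterativeImputer

rng = np.random.default_rng(0)
n, d = 2000, 5
group = rng.integers(0, 2, size=n)                       # 0 = majority, 1 = minority
X = rng.normal(loc=group[:, None] * 0.8, size=(n, d))    # group-shifted features

# Group-specific missingness: the minority group loses far more entries.
miss_rate = np.where(group == 1, 0.4, 0.1)[:, None]
mask = rng.random((n, d)) < miss_rate
X_obs = np.where(mask, np.nan, X)

def group_rmse(X_imputed):
    """Reconstruction RMSE on the masked entries, reported per group."""
    sq_err = (X_imputed - X) ** 2
    return {g: round(float(np.sqrt(sq_err[(group == g)[:, None] & mask].mean())), 3)
            for g in (0, 1)}

for name, imputer in [("mean", SimpleImputer(strategy="mean")),
                      ("iterative", IterativeImputer(random_state=0))]:
    X_imp = imputer.fit_transform(X_obs)
    print(name, group_rmse(X_imp))
```

Under a setup like this, a strategy that minimises overall reconstruction error can still leave larger errors for the group with more missingness, which is the kind of group-level effect the paper examines.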
Related papers
- DispaRisk: Auditing Fairness Through Usable Information [21.521208250966918]
DispaRisk is a framework designed to assess the potential risks of disparities in datasets during the initial stages of the machine learning pipeline.
DispaRisk identifies datasets with a high risk of discrimination, detects model families prone to biases within an ML pipeline, and enhances the explainability of these bias risks.
This work contributes to the development of fairer ML systems by providing a robust tool for early bias detection and mitigation.
arXiv Detail & Related papers (2024-05-20T20:56:01Z)
- The Impact of Differential Feature Under-reporting on Algorithmic Fairness [86.275300739926]
We present an analytically tractable model of differential feature under-reporting.
We then use this model to characterize the impact of this kind of data bias on algorithmic fairness.
Our results show that, in real-world data settings, under-reporting typically leads to increased disparities.
arXiv Detail & Related papers (2024-01-16T19:16:22Z)
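As a hedged illustration of the setting in the entry above (not its analytical model), the toy simulation below under-reports a predictive feature more often for one group, records unobserved values as zeros, and compares downstream error rates; all rates, names and the use of LogisticRegression are assumptions made for the example.

```python
# Hedged illustration: differential under-reporting of one feature and the
# resulting gap in downstream error rates, on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 20000
group = rng.integers(0, 2, size=n)
x_true = rng.normal(size=n)                          # the relevant feature
y = (x_true + 0.3 * rng.normal(size=n) > 0).astype(int)

# Differential under-reporting: group 1 loses the feature far more often,
# and unreported values are recorded as 0 rather than flagged as missing.
reported = rng.random(n) > np.where(group == 1, 0.6, 0.1)
x_obs = np.where(reported, x_true, 0.0)

model = LogisticRegression().fit(np.c_[x_obs, group], y)
pred = model.predict(np.c_[x_obs, group])
for g in (0, 1):
    err = (pred[group == g] != y[group == g]).mean()
    print(f"group {g}: error rate = {err:.3f}")
```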
- On The Impact of Machine Learning Randomness on Group Fairness [11.747264308336012]
We investigate the impact on group fairness of different sources of randomness in training neural networks.
We show that the variance in group fairness measures is rooted in the high volatility of the learning process on under-represented groups.
We show how one can control group-level accuracy, with high efficiency and negligible impact on the model's overall performance, by simply changing the data order for a single epoch.
arXiv Detail & Related papers (2023-07-09T09:36:31Z)
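The entry above points to sample order as the knob: the sketch below, a plain NumPy SGD loop on synthetic data, changes only the order of the final epoch and reports per-group accuracy. It illustrates the mechanism being described, not the paper's models or experiments; every constant is an assumption.

```python
# Hedged sketch: same data, model and hyperparameters; only the last epoch's
# sample order varies, and per-group accuracy is reported for each variant.
import numpy as np

rng = np.random.default_rng(2)
n, d = 5000, 10
group = (rng.random(n) < 0.1).astype(int)            # group 1 is under-represented
X = rng.normal(loc=group[:, None] * 0.5, size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(int)

def train(order_seed, epochs=3, lr=0.1):
    """Plain SGD logistic regression; only the last epoch's order varies."""
    w = np.zeros(d)
    for epoch in range(epochs):
        seed = order_seed if epoch == epochs - 1 else 0
        order = np.random.default_rng(seed).permutation(n)
        for i in order:                               # batch-size-1 SGD
            p = 1.0 / (1.0 + np.exp(-X[i] @ w))
            w += lr * (y[i] - p) * X[i]
    return w

for order_seed in (10, 11, 12):
    w = train(order_seed)
    correct = ((X @ w > 0).astype(int) == y)
    print(order_seed, {g: round(float(correct[group == g].mean()), 3) for g in (0, 1)})
```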
- To Impute or not to Impute? -- Missing Data in Treatment Effect Estimation [84.76186111434818]
We identify a new missingness mechanism, which we term mixed confounded missingness (MCM), where some missingness determines treatment selection and other missingness is determined by treatment selection.
We show that naively imputing all data leads to poorly performing treatment effect models, as the act of imputation effectively removes information necessary to provide unbiased estimates.
Our solution is selective imputation, where we use insights from MCM to inform precisely which variables should be imputed and which should not.
arXiv Detail & Related papers (2022-02-04T12:08:31Z)
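Selective imputation as summarised above requires deciding which variables to impute; the snippet below shows only the mechanics of imputing a chosen subset of columns while keeping informative missingness explicit. Which columns fall on which side is the paper's contribution; the split and column names here are invented for illustration.

```python
# Mechanics-only sketch of selective imputation on a made-up toy table.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({
    "age": [54, np.nan, 71, 63],
    "blood_pressure": [130, 145, np.nan, np.nan],
    "lab_ordered_only_if_treated": [np.nan, 2.1, np.nan, 3.4],
})

# Hypothetical split: deciding which variables belong where is exactly what
# MCM-style reasoning addresses; this assignment is made up for the example.
impute_cols = ["age", "blood_pressure"]
keep_cols = ["lab_ordered_only_if_treated"]

out = df.copy()
out[impute_cols] = SimpleImputer(strategy="mean").fit_transform(df[impute_cols])
# Keep the treatment-informative missingness explicit instead of filling it in.
out["lab_missing"] = df[keep_cols[0]].isna().astype(int)
print(out)
```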
- SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data [83.50281440043241]
We study the problem of inferring heterogeneous treatment effects from time-to-event data.
We propose a novel deep learning method for treatment-specific hazard estimation based on balancing representations.
arXiv Detail & Related papers (2021-10-26T20:13:17Z)
- TRAPDOOR: Repurposing backdoors to detect dataset bias in machine learning-based genomic analysis [15.483078145498085]
Under-representation of groups in datasets can lead to inaccurate predictions for certain groups, which can exacerbate systemic discrimination issues.
We propose TRAPDOOR, a methodology for identification of biased datasets by repurposing a technique that has been mostly proposed for nefarious purposes: Neural network backdoors.
Using a real-world cancer dataset, we analyze both the pre-existing bias towards white individuals and biases we introduce into the dataset artificially.
arXiv Detail & Related papers (2021-08-14T17:02:02Z)
- Doing Great at Estimating CATE? On the Neglected Assumptions in Benchmark Comparisons of Treatment Effect Estimators [91.3755431537592]
We show that even in arguably the simplest setting, estimation under ignorability assumptions can be misleading.
We consider two popular machine learning benchmark datasets for evaluation of heterogeneous treatment effect estimators.
We highlight that the inherent characteristics of the benchmark datasets favor some algorithms over others.
arXiv Detail & Related papers (2021-07-28T13:21:27Z)
- Fair Classification with Group-Dependent Label Noise [6.324366770332667]
This work examines how to train fair classifiers in settings where training labels are corrupted with random noise.
We show that naively imposing parity constraints on demographic disparity measures, without accounting for heterogeneous and group-dependent error rates, can decrease both the accuracy and the fairness of the resulting classifier.
arXiv Detail & Related papers (2020-10-31T22:35:01Z)
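To see why naive parity measurements can mislead under group-dependent label noise, as the entry above argues, here is a small simulation of the setting (not the paper's method): both groups share the same true positive rate, yet the gap measured on noisy labels is substantial. All noise rates are arbitrary.

```python
# Hedged simulation: group-dependent label noise distorts the measured
# demographic disparity even when the clean labels are perfectly balanced.
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
group = rng.integers(0, 2, size=n)
y_clean = (rng.random(n) < 0.3).astype(int)          # identical base rate in both groups

flip_rate = np.where(group == 1, 0.25, 0.05)         # group-dependent label noise
flip = rng.random(n) < flip_rate
y_noisy = np.where(flip, 1 - y_clean, y_clean)

for name, y in [("clean", y_clean), ("noisy", y_noisy)]:
    rates = {g: round(float(y[group == g].mean()), 3) for g in (0, 1)}
    print(name, "positive rate by group:", rates, "gap:", round(abs(rates[0] - rates[1]), 3))
```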
- Targeted VAE: Variational and Targeted Learning for Causal Inference [39.351088248776435]
Undertaking causal inference with observational data is incredibly useful across a wide range of tasks.
There are two significant challenges associated with undertaking causal inference using observational data.
We address these two challenges by combining structured inference and targeted learning.
arXiv Detail & Related papers (2020-09-28T16:55:24Z)
- Generalization Bounds and Representation Learning for Estimation of Potential Outcomes and Causal Effects [61.03579766573421]
We study estimation of individual-level causal effects, such as a single patient's response to alternative medication.
We devise representation learning algorithms that minimize our bound, by regularizing the representation's induced treatment group distance.
We extend these algorithms to simultaneously learn a weighted representation to further reduce treatment group distances.
arXiv Detail & Related papers (2020-01-21T10:16:33Z)
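The entry above regularises the distance between treatment-group representations; the sketch below uses an RBF-kernel squared MMD as a stand-in for such a penalty. This is one common discrepancy choice, not necessarily the bound-minimising one used in the paper, and every name in the snippet is illustrative.

```python
# Hedged sketch of a representation-distance penalty between treatment groups.
import numpy as np

def rbf_mmd2(phi_a, phi_b, gamma=1.0):
    """Squared MMD between two sets of representations under an RBF kernel
    (biased estimator; adequate for a penalty-term sketch)."""
    def k(u, v):
        d2 = ((u[:, None, :] - v[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(phi_a, phi_a).mean() + k(phi_b, phi_b).mean() - 2 * k(phi_a, phi_b).mean()

# Toy usage: phi stands for a learned representation, t for the treatment flag.
rng = np.random.default_rng(4)
phi = rng.normal(size=(200, 8))
t = rng.integers(0, 2, size=200)
penalty = rbf_mmd2(phi[t == 1], phi[t == 0])
# total_loss = factual_outcome_loss + alpha * penalty  (alpha trades fit for balance)
print(round(float(penalty), 4))
```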
- Learning Overlapping Representations for the Estimation of Individualized Treatment Effects [97.42686600929211]
Estimating the likely outcome of alternatives from observational data is a challenging problem.
We show that algorithms that learn domain-invariant representations of inputs are often inappropriate.
We develop a deep kernel regression algorithm and posterior regularization framework that substantially outperforms the state-of-the-art on a variety of benchmark data sets.
arXiv Detail & Related papers (2020-01-14T12:56:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.