Fairness in Missing Data Imputation
- URL: http://arxiv.org/abs/2110.12002v1
- Date: Fri, 22 Oct 2021 18:29:17 GMT
- Title: Fairness in Missing Data Imputation
- Authors: Yiliang Zhang, Qi Long
- Abstract summary: We conduct the first known research on fairness of missing data imputation.
By studying the performance of imputation methods in three commonly used datasets, we demonstrate that unfairness of missing value imputation widely exists.
Our results suggest that, in practice, a careful investigation of related factors can provide valuable insights on mitigating unfairness associated with missing data imputation.
- Score: 2.3605348648054463
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Missing data are ubiquitous in the era of big data and, if
inadequately handled, are known to lead to biased findings and to have a
deleterious impact on data-driven decision making. To mitigate their impact,
many missing value
imputation methods have been developed. However, the fairness of these
imputation methods across sensitive groups has not been studied. In this paper,
we conduct the first known research on fairness of missing data imputation. By
studying the performance of imputation methods in three commonly used datasets,
we demonstrate that unfairness of missing value imputation widely exists and
may be associated with multiple factors. Our results suggest that, in practice,
a careful investigation of related factors can provide valuable insights on
mitigating unfairness associated with missing data imputation.
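The fairness question studied here — whether imputation quality differs across sensitive groups — can be illustrated with a simple group-wise error comparison. A minimal sketch, using synthetic data and a mean-imputation baseline that are illustrative assumptions, not the paper's actual datasets or methods:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: one feature, two sensitive groups with different variances.
n = 1000
group = rng.integers(0, 2, size=n)  # binary sensitive attribute
x_true = rng.normal(loc=0.0, scale=np.where(group == 0, 1.0, 3.0))

# Introduce values missing completely at random (MCAR) at a 30% rate.
missing = rng.random(n) < 0.3
x_obs = x_true.copy()
x_obs[missing] = np.nan

# Baseline imputation: overall observed mean (ignores group structure).
x_imp = np.where(np.isnan(x_obs), np.nanmean(x_obs), x_obs)

# Fairness check: compare imputation RMSE per sensitive group.
def group_rmse(g):
    mask = missing & (group == g)
    return float(np.sqrt(np.mean((x_imp[mask] - x_true[mask]) ** 2)))

rmse_0, rmse_1 = group_rmse(0), group_rmse(1)
print(f"RMSE group 0: {rmse_0:.2f}, group 1: {rmse_1:.2f}")
# The higher-variance group incurs a larger imputation error: a gap of
# this kind is what the paper calls unfairness of missing data imputation.
```

A disparity between the two RMSE values indicates that the imputation method serves one group worse than the other, which is the kind of gap the paper investigates across real datasets and imputation methods.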
Related papers
- The influence of missing data mechanisms and simple missing data handling techniques on fairness [0.0]
We study how missing values and the handling thereof can impact the fairness of an algorithm.
The study starts from the mechanism of missingness and proceeds to how the missing data are processed.
The results show that, under certain scenarios, the impact on fairness can be pronounced when the missingness mechanism is missing at random (MAR).
arXiv Detail & Related papers (2025-03-10T13:32:25Z)
- Is it Still Fair? A Comparative Evaluation of Fairness Algorithms through the Lens of Covariate Drift [17.498879317113385]
We study data distributional drift and its impact on fairness algorithms and metrics.
In several cases, data distributional drift can lead to serious deterioration of fairness in so-called fair models.
Building on our findings, we synthesize several policy implications of data distributional drift for fairness algorithms.
arXiv Detail & Related papers (2024-09-19T03:18:12Z)
- The Impact of Differential Feature Under-reporting on Algorithmic Fairness [86.275300739926]
We present an analytically tractable model of differential feature under-reporting.
We then use it to characterize the impact of this kind of data bias on algorithmic fairness.
Our results show that, in real world data settings, under-reporting typically leads to increasing disparities.
arXiv Detail & Related papers (2024-01-16T19:16:22Z)
- Too Good To Be True: performance overestimation in (re)current practices for Human Activity Recognition [49.1574468325115]
Sliding windows for data segmentation followed by standard random k-fold cross-validation produce biased results.
It is important to raise awareness in the scientific community about this problem, whose negative effects are being overlooked.
Several experiments with different types of datasets and classification models exhibit the problem and show that it persists regardless of the method or dataset.
arXiv Detail & Related papers (2023-10-18T13:24:05Z)
- Impact Of Missing Data Imputation On The Fairness And Accuracy Of Graph Node Classifiers [0.19573380763700707]
We analyze the effect of graph data (node attribute) imputation on fairness, using different embedding and neural network methods.
Our results provide valuable insights into graph data fairness and how to handle missingness in graphs efficiently.
arXiv Detail & Related papers (2022-11-01T23:16:36Z)
- Systematic Evaluation of Predictive Fairness [60.0947291284978]
Mitigating bias in training on biased datasets is an important open problem.
We examine the performance of various debiasing methods across multiple tasks.
We find that data conditions have a strong influence on relative model performance.
arXiv Detail & Related papers (2022-10-17T05:40:13Z)
- Multistage Large Segment Imputation Framework Based on Deep Learning and Statistic Metrics [8.266097781813656]
This study proposes a multistage imputation framework based on deep learning with adaptability for missing value imputation.
The framework introduces a mixed measurement index combining low- and higher-order statistics of the data distribution, offering a new perspective on data imputation performance metrics.
Experimental results show that the multistage imputation strategy and the mixture index are effective and improve missing value imputation to some extent.
arXiv Detail & Related papers (2022-09-22T14:17:24Z)
- Valid Inference After Causal Discovery [73.87055989355737]
We develop tools for valid post-causal-discovery inference.
We show that a naive combination of causal discovery and subsequent inference algorithms leads to highly inflated miscoverage rates.
arXiv Detail & Related papers (2022-08-11T17:40:45Z)
- To Impute or not to Impute? -- Missing Data in Treatment Effect Estimation [84.76186111434818]
We identify a new missingness mechanism, which we term mixed confounded missingness (MCM), where some missingness determines treatment selection and other missingness is determined by treatment selection.
We show that naively imputing all data leads to poor performing treatment effects models, as the act of imputation effectively removes information necessary to provide unbiased estimates.
Our solution is selective imputation, where we use insights from MCM to inform precisely which variables should be imputed and which should not.
arXiv Detail & Related papers (2022-02-04T12:08:31Z)
- LARD: Large-scale Artificial Disfluency Generation [0.0]
We propose LARD, a method for generating complex and realistic artificial disfluencies with little effort.
The proposed method can handle three of the most common types of disfluencies: repetitions, replacements and restarts.
We release a new large-scale dataset with disfluencies that can be used on four different tasks.
arXiv Detail & Related papers (2022-01-13T16:02:36Z)
- Predicting feature imputability in the absence of ground truth [2.7684432804249477]
In real-life applications, it is difficult to evaluate whether data have been imputed accurately because ground truth is lacking.
This paper proposes an effective and simple principal component based method for determining whether individual data features can be accurately imputed.
arXiv Detail & Related papers (2020-07-14T14:24:07Z)
- Enabling Counterfactual Survival Analysis with Balanced Representations [64.17342727357618]
Survival data are frequently encountered across diverse medical applications, e.g., drug development, risk profiling, and clinical trials.
We propose a theoretically grounded unified framework for counterfactual inference applicable to survival outcomes.
arXiv Detail & Related papers (2020-06-14T01:15:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.