The influence of missing data mechanisms and simple missing data handling techniques on fairness
- URL: http://arxiv.org/abs/2503.07313v1
- Date: Mon, 10 Mar 2025 13:32:25 GMT
- Title: The influence of missing data mechanisms and simple missing data handling techniques on fairness
- Authors: Aeysha Bhatti, Trudie Sandrock, Johane Nienkemper-Swanepoel
- Abstract summary: We study how missing values and the handling thereof can impact the fairness of an algorithm. The starting point of the study is the mechanism of missingness, leading into how the missing data are processed. The results show that under certain scenarios the impact on fairness can be pronounced when the missingness mechanism is missing at random.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fairness of machine learning algorithms is receiving increasing attention, as such algorithms permeate the day-to-day aspects of our lives. One way in which bias can manifest in a dataset is through missing values. If data are missing, these data are often assumed to be missing completely at random; in reality, the propensity of data being missing is often tied to the demographic characteristics of individuals. There is limited research into how missing values and the handling thereof can impact the fairness of an algorithm. Most researchers apply listwise deletion or use the simpler methods of imputation (e.g. mean or mode) rather than more advanced ones (e.g. multiple imputation); we therefore study the impact of these simpler methods on the fairness of algorithms. The starting point of the study is the mechanism of missingness, leading into how the missing data are processed and finally how this impacts fairness. Three popular datasets in the field of fairness are amputed in a simulation study. The results show that under certain scenarios the impact on fairness can be pronounced when the missingness mechanism is missing at random. Furthermore, elementary missing data handling techniques like listwise deletion and mode imputation can lead to higher fairness compared to more complex imputation methods like k-nearest neighbour imputation, albeit often at the cost of lower accuracy.
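To make the pipeline described in the abstract concrete, the sketch below amputes a small synthetic dataset under a MAR mechanism (the probability of a value being missing depends only on observed attributes, here the protected attribute), applies three simple handling techniques (listwise deletion, mode imputation, kNN imputation), trains a classifier, and compares accuracy against a demographic-parity gap. This is an illustrative toy, not the authors' simulation study: the synthetic data, logistic-regression classifier, and fairness metric are assumed stand-ins for the real datasets and measures used in the paper.

```python
# A minimal sketch, not the paper's code: ampute a synthetic dataset under MAR,
# apply three simple handling techniques, and compare accuracy with a
# demographic-parity gap. All data, model, and metric choices are illustrative.
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic data: a binary protected attribute, two features, a binary label.
n = 5000
protected = rng.integers(0, 2, n)
x1 = rng.normal(loc=protected, scale=1.0, size=n)
x2 = rng.normal(size=n)
y = (x1 + x2 + rng.normal(scale=0.5, size=n) > 0.5).astype(int)
df = pd.DataFrame({"protected": protected, "x1": x1, "x2": x2, "y": y})

# MAR amputation: the chance that x2 is missing depends only on the observed
# protected attribute, not on the (unobserved) value of x2 itself.
p_miss = np.where(df["protected"] == 1, 0.4, 0.1)
df.loc[rng.random(n) < p_miss, "x2"] = np.nan

def demographic_parity_gap(y_pred, group):
    """Absolute difference in positive-prediction rates between the two groups."""
    return abs(y_pred[group == 1].mean() - y_pred[group == 0].mean())

def impute(data, imputer):
    out = data.copy()
    out[["x1", "x2"]] = imputer.fit_transform(out[["x1", "x2"]])
    return out

def run(handle):
    data = handle(df.copy())
    train, test = train_test_split(data, test_size=0.3, random_state=0)
    features = ["protected", "x1", "x2"]
    clf = LogisticRegression(max_iter=1000).fit(train[features], train["y"])
    pred = clf.predict(test[features])
    acc = (pred == test["y"].to_numpy()).mean()
    gap = demographic_parity_gap(pred, test["protected"].to_numpy())
    return acc, gap

handlers = {
    "listwise deletion": lambda d: d.dropna(),
    "mode imputation": lambda d: impute(d, SimpleImputer(strategy="most_frequent")),
    "kNN imputation": lambda d: impute(d, KNNImputer(n_neighbors=5)),
}

for name, handle in handlers.items():
    acc, gap = run(handle)
    print(f"{name:18s} accuracy={acc:.3f}  demographic-parity gap={gap:.3f}")
```

In the paper's design, the same kind of comparison is repeated over amputed versions of three standard fairness benchmark datasets and across different missingness mechanisms; here a single MAR scenario stands in for that larger study.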
Related papers
- Adapting Fairness Interventions to Missing Values [4.820576346277399]
Missing values in real-world data pose a significant and unique challenge to algorithmic fairness.
The standard procedure for handling missing values, in which the data are first imputed and the imputed data are then used for classification, can exacerbate discrimination.
We present scalable and adaptive algorithms for fair classification with missing values.
arXiv Detail & Related papers (2023-05-30T21:50:48Z) - Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z) - Impact Of Missing Data Imputation On The Fairness And Accuracy Of Graph Node Classifiers [0.19573380763700707]
We analyze the effect on fairness in the context of graph data (node attributes) imputation using different embedding and neural network methods.
Our results provide valuable insights into graph data fairness and how to handle missingness in graphs efficiently.
arXiv Detail & Related papers (2022-11-01T23:16:36Z) - MissDAG: Causal Discovery in the Presence of Missing Data with Continuous Additive Noise Models [78.72682320019737]
We develop a general method, which we call MissDAG, to perform causal discovery from data with incomplete observations.
MissDAG maximizes the expected likelihood of the visible part of observations under the expectation-maximization framework.
We demonstrate the flexibility of MissDAG for incorporating various causal discovery algorithms and its efficacy through extensive simulations and real data experiments.
arXiv Detail & Related papers (2022-05-27T09:59:46Z) - MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms [82.90843777097606]
We propose a causally-aware imputation algorithm (MIRACLE) for missing data.
MIRACLE iteratively refines the imputation of a baseline by simultaneously modeling the missingness generating mechanism.
We conduct extensive experiments on synthetic and a variety of publicly available datasets to show that MIRACLE is able to consistently improve imputation.
arXiv Detail & Related papers (2021-11-04T22:38:18Z) - Fairness in Missing Data Imputation [2.3605348648054463]
We conduct the first known research on fairness of missing data imputation.
By studying the performance of imputation methods in three commonly used datasets, we demonstrate that unfairness of missing value imputation widely exists.
Our results suggest that, in practice, a careful investigation of related factors can provide valuable insights on mitigating unfairness associated with missing data imputation.
arXiv Detail & Related papers (2021-10-22T18:29:17Z) - Fairness without Imputation: A Decision Tree Approach for Fair Prediction with Missing Values [4.973456986972679]
We investigate the fairness concerns of training a machine learning model using data with missing values.
We propose an integrated approach based on decision trees that does not require a separate process of imputation and learning.
We demonstrate that our approach outperforms existing fairness intervention methods applied to an imputed dataset.
arXiv Detail & Related papers (2021-09-21T20:46:22Z) - Doing Great at Estimating CATE? On the Neglected Assumptions in Benchmark Comparisons of Treatment Effect Estimators [91.3755431537592]
We show that even in arguably the simplest setting, estimation under ignorability assumptions can be misleading.
We consider two popular machine learning benchmark datasets for evaluation of heterogeneous treatment effect estimators.
We highlight that the inherent characteristics of the benchmark datasets favor some algorithms over others.
arXiv Detail & Related papers (2021-07-28T13:21:27Z) - Greedy structure learning from data that contains systematic missing values [13.088541054366527]
Learning from data that contain missing values represents a common phenomenon in many domains.
Relatively few Bayesian Network structure learning algorithms account for missing data.
This paper describes three variants of greedy search structure learning that utilise pairwise deletion and inverse probability weighting (a generic sketch of the weighting idea appears after this list).
arXiv Detail & Related papers (2021-07-09T02:56:44Z) - Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance-guided stochastic gradient descent (IGSGD) method to train models to perform inference from inputs containing missing values, without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z) - Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning.
We study whether models trained with uncertainty-based active learning (AL) are fairer in their decisions with respect to a protected class.
We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
arXiv Detail & Related papers (2021-04-14T14:20:22Z)
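For the inverse probability weighting mentioned in the greedy structure learning entry above, the generic sketch below isolates the weighting step: estimate each row's probability of being completely observed from fully observed covariates, then up-weight complete rows by the inverse of that probability when computing a statistic. This is an illustration of IPW under an assumed MAR-style synthetic setup, not the structure-learning algorithm from that paper.

```python
# A generic IPW illustration (assumed synthetic setup), not the greedy
# structure-learning algorithm referenced above.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# z is always observed; x is missing more often when z is large (MAR-style).
n = 10_000
z = rng.normal(size=n)
x = 2.0 * z + rng.normal(size=n)
missing = rng.random(n) < 1.0 / (1.0 + np.exp(-(z - 0.5)))
df = pd.DataFrame({"z": z, "x": np.where(missing, np.nan, x)})

complete = df["x"].notna().to_numpy()

# Step 1: model the probability of a row being a complete case given z.
prop = LogisticRegression().fit(df[["z"]], complete.astype(int))
p_complete = prop.predict_proba(df[["z"]])[:, 1]

# Step 2: weight complete cases by 1 / P(complete), so each one also stands
# in for similar rows that were dropped.
weights = 1.0 / p_complete[complete]
naive_mean = df.loc[complete, "x"].mean()                 # biased under MAR
ipw_mean = np.average(df.loc[complete, "x"], weights=weights)
print(f"true mean ~= {x.mean():.3f}, complete-case mean = {naive_mean:.3f}, "
      f"IPW mean = {ipw_mean:.3f}")
```

The complete-case mean underestimates the true mean because rows with large z (and hence large x) are dropped more often; the inverse-probability weights correct for that selection.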
This list is automatically generated from the titles and abstracts of the papers on this site.