Assessing Fairness in the Presence of Missing Data
- URL: http://arxiv.org/abs/2112.04899v1
- Date: Tue, 7 Dec 2021 17:51:26 GMT
- Title: Assessing Fairness in the Presence of Missing Data
- Authors: Yiliang Zhang, Qi Long
- Abstract summary: We study the problem of estimating fairness in the complete data domain for an arbitrary model evaluated merely using complete cases.
Our work provides the first known theoretical results on fairness guarantee in analysis of incomplete data.
- Score: 2.3605348648054463
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Missing data are prevalent and present daunting challenges in real data
analysis. While there is a growing body of literature on fairness in analysis
of fully observed data, there has been little theoretical work on investigating
fairness in analysis of incomplete data. In practice, a popular analytical
approach for dealing with missing data is to use only the set of complete
cases, i.e., observations with all features fully observed, to train a
prediction algorithm. However, depending on the missing data mechanism, the
distribution of complete cases and the distribution of the complete data may be
substantially different. When the goal is to develop a fair algorithm in the
complete data domain where there are no missing values, an algorithm that is
fair in the complete case domain may show disproportionate bias towards some
marginalized groups in the complete data domain. To fill this significant gap,
we study the problem of estimating fairness in the complete data domain for an
arbitrary model evaluated merely using complete cases. We provide upper and
lower bounds on the fairness estimation error and conduct numerical experiments
to assess our theoretical results. Our work provides the first known
theoretical results on fairness guarantee in analysis of incomplete data.
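The gap between the complete case domain and the complete data domain can be illustrated with a small simulation. This is a minimal sketch built on assumptions of its own, not the authors' method: the data-generating process, the fixed threshold classifier `predict`, the missingness rule inside the hypothetical `simulate` helper, and the use of the demographic parity difference as the fairness metric are all illustrative choices. The paper's estimators and error bounds are not reproduced. Missingness here depends on the unobserved feature value itself (missing not at random), which is the kind of mechanism under which the two distributions can differ substantially.

```python
import random
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Record:
    group: int              # sensitive attribute A in {0, 1}
    x: float                # feature value in the complete data domain
    x_obs: Optional[float]  # x if observed, None if missing


def simulate(n: int, seed: int = 0) -> List[Record]:
    """Hypothetical data-generating process (an assumption, not from the paper).

    Group 1 has a lower feature mean, and its records with x < 0 go missing
    with probability 0.55, so missingness depends on the missing value (MNAR).
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        a = rng.randint(0, 1)
        x = rng.gauss(-0.5 if a == 1 else 0.0, 1.0)
        p_missing = 0.55 if (a == 1 and x < 0.0) else 0.0
        x_obs = None if rng.random() < p_missing else x
        records.append(Record(a, x, x_obs))
    return records


def predict(x: float) -> int:
    """A fixed, already-trained model: here simply a threshold at zero."""
    return int(x > 0.0)


def demographic_parity_gap(pairs: Iterable[Tuple[int, float]]) -> float:
    """P(Yhat = 1 | A = 1) - P(Yhat = 1 | A = 0) over (group, x) pairs."""
    positives = [0, 0]
    totals = [0, 0]
    for a, x in pairs:
        totals[a] += 1
        positives[a] += predict(x)
    return positives[1] / totals[1] - positives[0] / totals[0]


records = simulate(20_000)

# Complete data domain: an oracle view, available only because this is a simulation.
complete_data_gap = demographic_parity_gap((r.group, r.x) for r in records)

# Complete case domain: only the rows whose feature was actually observed.
complete_case_gap = demographic_parity_gap(
    (r.group, r.x_obs) for r in records if r.x_obs is not None
)

print(f"complete-data gap: {complete_data_gap:+.3f}")
print(f"complete-case gap: {complete_case_gap:+.3f}")
```

Under these settings the expected complete-data gap is about (1 - Φ(0.5)) - 0.5 ≈ -0.19, while dropping most of group 1's negative predictions pushes the complete-case gap close to zero: the same model looks fair when evaluated on complete cases yet is disproportionately biased against group 1 in the complete data domain.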
Related papers
- Targeted Learning for Data Fairness [52.59573714151884]
We expand fairness inference by evaluating fairness in the data generating process itself.
We derive estimators for demographic parity, equal opportunity, and conditional mutual information.
To validate our approach, we perform several simulations and apply our estimators to real data.
Date: 2025-02-06T18:51:28Z
- AIM: Attributing, Interpreting, Mitigating Data Unfairness [40.351282126410545]
Existing fair machine learning (FairML) research has predominantly focused on mitigating discriminative bias in the model prediction.
We investigate a novel research problem: discovering samples that reflect biases/prejudices from the training data.
We propose practical algorithms for measuring and countering sample bias.
Date: 2024-06-13T05:21:10Z
- Lazy Data Practices Harm Fairness Research [49.02318458244464]
We present a comprehensive analysis of fair ML datasets, demonstrating how unreflective practices hinder the reach and reliability of algorithmic fairness findings.
Our analyses identify three main areas of concern: (1) a lack of representation for certain protected attributes in both data and evaluations; (2) the widespread exclusion of minorities during data preprocessing; and (3) opaque data processing threatening the generalization of fairness research.
This study underscores the need for a critical reevaluation of data practices in fair ML and offers directions to improve both the sourcing and usage of datasets.
Date: 2024-04-26T09:51:24Z
- Approximating Counterfactual Bounds while Fusing Observational, Biased and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies.
We show that the likelihood of the available data has no local maxima.
We then show how the same approach can address the general case of multiple datasets.
Date: 2023-07-31T11:28:24Z
- Adapting Fairness Interventions to Missing Values [4.820576346277399]
Missing values in real-world data pose a significant and unique challenge to algorithmic fairness.
The standard procedure for handling missing values, in which the data are first imputed and the imputed data are then used for classification, can exacerbate discrimination.
We present scalable and adaptive algorithms for fair classification with missing values.
Date: 2023-05-30T21:50:48Z
- Provable Detection of Propagating Sampling Bias in Prediction Models [1.7709344190822935]
We provide a theoretical analysis of how a specific form of data bias, differential sampling bias, propagates from the data stage to the prediction stage.
Under reasonable assumptions, we quantify how the amount of bias in the model predictions varies as a function of the amount of differential sampling bias in the data.
We demonstrate that the theoretical results hold in practice even when our assumptions are relaxed.
Date: 2023-02-13T23:39:35Z
- Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
Date: 2022-12-06T12:42:11Z
- Data-SUITE: Data-centric identification of in-distribution incongruous examples [81.21462458089142]
Data-SUITE is a data-centric framework to identify incongruous regions of in-distribution (ID) data.
We empirically validate Data-SUITE's performance and coverage guarantees.
Date: 2022-02-17T18:58:31Z
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensures the fairness of the dataset.
Date: 2020-06-10T20:20:10Z
- Full Law Identification In Graphical Models Of Missing Data: Completeness Results [13.299431908881425]
We provide the first completeness result in this field of study.
We then address issues that may arise due to the presence of both missing data and unmeasured confounding.
Date: 2020-04-10T01:31:10Z
This list is automatically generated from the titles and abstracts of the papers on this site.