Information Theoretic Measures for Fairness-aware Feature Selection
- URL: http://arxiv.org/abs/2106.00772v1
- Date: Tue, 1 Jun 2021 20:11:54 GMT
- Title: Information Theoretic Measures for Fairness-aware Feature Selection
- Authors: Sajad Khodadadian, Mohamed Nafea, AmirEmad Ghassami, Negar Kiyavash
- Abstract summary: We develop a framework for fairness-aware feature selection, based on information theoretic measures for the accuracy and discriminatory impacts of features.
Specifically, our goal is to design a fairness utility score for each feature which quantifies how this feature influences accurate as well as nondiscriminatory decisions.
- Score: 27.06618125828978
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning algorithms are increasingly used for consequential decision
making regarding individuals based on their relevant features. Features that
are relevant for accurate decisions may however lead to either explicit or
implicit forms of discrimination against unprivileged groups, such as those of
certain race or gender. This happens due to existing biases in the training
data, which are often replicated or even exacerbated by the learning algorithm.
Identifying and measuring these biases at the data level is a challenging
problem due to the interdependence among the features and the decision
outcome. In this work, we develop a framework for fairness-aware feature
selection, based on information theoretic measures for the accuracy and
discriminatory impacts of features. Specifically, our goal is to design a
fairness utility score for each feature which quantifies how this feature
influences accurate as well as nondiscriminatory decisions. We first propose
information theoretic measures for the impact of different subsets of features
on the accuracy and discrimination of the model. Subsequently, we deduce the
marginal impact of each feature using the Shapley value function. Our framework
depends on the joint statistics of the data rather than a particular classifier
design. We examine our proposed framework on real and synthetic data to
evaluate its performance.
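The abstract outlines the method's two steps: define a value function over feature subsets, then attribute a marginal score to each feature via the Shapley value. The paper's specific accuracy and discrimination measures are not given in the abstract, so the sketch below is a minimal, hypothetical illustration that uses plain empirical mutual information I(X_S; Y) as a stand-in value function; the exact Shapley computation over all subsets is the part that mirrors the paper's attribution step.

```python
from itertools import combinations
from math import comb, log2
from collections import Counter

def mutual_information(rows, Y):
    """Empirical mutual information I(X_S; Y) for discrete data.
    rows: list of hashable feature tuples, Y: list of outcome labels."""
    n = len(Y)
    pxy = Counter(zip(rows, Y))
    px = Counter(rows)
    py = Counter(Y)
    return sum(c / n * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def shapley_scores(X, Y):
    """Exact Shapley value of each feature under v(S) = I(X_S; Y).
    X: list of feature columns; Y: outcome labels.
    Exponential in the number of features -- illustration only."""
    d = len(X)

    def v(S):
        if not S:
            return 0.0
        rows = [tuple(X[j][i] for j in S) for i in range(len(Y))]
        return mutual_information(rows, Y)

    scores = []
    for i in range(d):
        others = [j for j in range(d) if j != i]
        total = 0.0
        for k in range(d):  # subset sizes 0 .. d-1
            for S in combinations(others, k):
                weight = 1 / (d * comb(d - 1, k))
                total += weight * (v(S + (i,)) - v(S))
        scores.append(total)
    return scores
```

With a feature that fully determines the outcome and an independent noise feature, the first receives the full information score and the second scores zero, illustrating how the attribution isolates each feature's marginal contribution. A fairness-aware variant, as in the paper, would replace v(S) with a combination of accuracy and discrimination measures.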
Related papers
- Detecting and Identifying Selection Structure in Sequential Data [53.24493902162797]
We argue that the selective inclusion of data points based on latent objectives is common in practical situations, such as music sequences.
We show that selection structure is identifiable without any parametric assumptions or interventional experiments.
We also propose a provably correct algorithm to detect and identify selection structures as well as other types of dependencies.
arXiv Detail & Related papers (2024-06-29T20:56:34Z)
- Detection and Evaluation of bias-inducing Features in Machine learning [14.045499740240823]
In the context of machine learning (ML), one can use cause-to-effect analysis to understand the reason for the biased behavior of the system.
We propose an approach for systematically identifying all bias-inducing features of a model to help support the decision-making of domain experts.
arXiv Detail & Related papers (2023-10-19T15:01:16Z)
- Evaluating the Fairness of Discriminative Foundation Models in Computer Vision [51.176061115977774]
We propose a novel taxonomy for bias evaluation of discriminative foundation models, such as Contrastive Language-Image Pretraining (CLIP).
We then systematically evaluate existing methods for mitigating bias in these models with respect to our taxonomy.
Specifically, we evaluate OpenAI's CLIP and OpenCLIP models for key applications, such as zero-shot classification, image retrieval and image captioning.
arXiv Detail & Related papers (2023-10-18T10:32:39Z)
- A statistical approach to detect sensitive features in a group fairness setting [10.087372021356751]
We propose a preprocessing step to address the task of automatically recognizing sensitive features that does not require a trained model to verify unfair results.
Our empirical results attest our hypothesis and show that several features considered as sensitive in the literature do not necessarily entail disparate (unfair) results.
arXiv Detail & Related papers (2023-05-11T17:30:12Z)
- Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class.
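The balancing strategy described above can be sketched with the simplest oversampling variant: duplicate randomly chosen minority-class examples until every class matches the size of the largest one. This is a minimal, dependency-free illustration (the paper compares many more elaborate resampling strategies); the function name and structure here are my own.

```python
import random

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class examples until all
    classes have as many examples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_max = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(n_max - len(rows))]
        for xi in rows + extra:
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out
```

Undersampling is the mirror image: randomly drop majority-class examples down to the minority-class count, trading information loss for a smaller balanced dataset.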
arXiv Detail & Related papers (2021-12-15T18:56:39Z)
- Measuring Fairness Under Unawareness of Sensitive Attributes: A Quantification-Based Approach [131.20444904674494]
We tackle the problem of measuring group fairness under unawareness of sensitive attributes.
We show that quantification approaches are particularly suited to tackle the fairness-under-unawareness problem.
arXiv Detail & Related papers (2021-09-17T13:45:46Z)
- Through the Data Management Lens: Experimental Analysis and Evaluation of Fair Classification [75.49600684537117]
Data management research is showing an increasing presence and interest in topics related to data and algorithmic fairness.
We contribute a broad analysis of 13 fair classification approaches and additional variants, over their correctness, fairness, efficiency, scalability, and stability.
Our analysis highlights novel insights on the impact of different metrics and high-level approach characteristics on different aspects of performance.
arXiv Detail & Related papers (2021-01-18T22:55:40Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
- Nonparametric Feature Impact and Importance [0.6123324869194193]
We give mathematical definitions of feature impact and importance, derived from partial dependence curves, that operate directly on the data.
To assess quality, we show that features ranked by these definitions are competitive with existing feature selection techniques.
arXiv Detail & Related papers (2020-06-08T17:07:35Z)
- Fairness-Aware Learning with Prejudice Free Representations [2.398608007786179]
We propose a novel algorithm that can effectively identify and treat latent discriminating features.
The approach helps to collect discrimination-free features that would improve the model performance.
arXiv Detail & Related papers (2020-02-26T10:06:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.