Fairness Implications of Encoding Protected Categorical Attributes
- URL: http://arxiv.org/abs/2201.11358v2
- Date: Fri, 5 May 2023 22:03:12 GMT
- Title: Fairness Implications of Encoding Protected Categorical Attributes
- Authors: Carlos Mougan, Jose M. Alvarez, Salvatore Ruggieri, Steffen Staab
- Abstract summary: We compare the accuracy and fairness implications of two well-known encoding methods: emphone-hot encoding and emphtarget encoding.
First type, textitirreducible bias, is due to direct group category discrimination, and the second type, textitreducible bias, is due to the large variance in statistically underrepresented groups.
We consider the problem of intersectional unfairness that may arise when machine learning best practices improve performance measures by encoding several categorical attributes into a high-cardinality feature.
- Score: 26.7015058286397
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Past research has demonstrated that the explicit use of protected attributes
in machine learning can improve both performance and fairness. Many machine
learning algorithms, however, cannot directly process categorical attributes,
such as country of birth or ethnicity. Because protected attributes frequently
are categorical, they must be encoded as features that can be input to a chosen
machine learning algorithm, e.g.\ support vector machines, gradient boosting
decision trees or linear models. Thereby, encoding methods influence how and
what the machine learning algorithm will learn, affecting model performance and
fairness. This work compares the accuracy and fairness implications of the two
most well-known encoding methods: \emph{one-hot encoding} and \emph{target
encoding}. We distinguish between two types of induced bias that may arise from
these encoding methods and may lead to unfair models. The first type,
\textit{irreducible bias}, is due to direct group category discrimination, and
the second type, \textit{reducible bias}, is due to the large variance in
statistically underrepresented groups. We investigate the interaction between
categorical encodings and target encoding regularization methods that reduce
unfairness. Furthermore, we consider the problem of intersectional unfairness
that may arise when machine learning best practices improve performance
measures by encoding several categorical attributes into a high-cardinality
feature.
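As a concrete illustration of the encodings discussed above, the sketch below (column names, data, and the smoothing parameter are illustrative assumptions, not the paper's code) contrasts one-hot encoding with smoothed target encoding of a protected categorical attribute, and shows how combining several protected attributes produces the kind of high-cardinality intersectional feature mentioned in the abstract.

```python
import pandas as pd

# Toy data: 'ethnicity' is the protected categorical attribute, 'y' the binary target.
# Column names and values are illustrative only.
df = pd.DataFrame({
    "ethnicity": ["A", "A", "B", "B", "B", "C"],
    "country":   ["X", "Y", "X", "X", "Y", "Y"],
    "y":         [1, 0, 1, 1, 0, 1],
})

# One-hot encoding: one binary column per category (no target information used).
one_hot = pd.get_dummies(df["ethnicity"], prefix="ethnicity")

# Target encoding: replace each category by the mean target of that category.
# Small groups get noisy estimates, which is the source of "reducible bias".
raw_means = df.groupby("ethnicity")["y"].mean()

# Smoothed (regularized) target encoding: shrink small-group means toward the
# global mean; m controls the strength of the shrinkage.
m = 10.0
global_mean = df["y"].mean()
counts = df.groupby("ethnicity")["y"].count()
smoothed = (counts * raw_means + m * global_mean) / (counts + m)
df["ethnicity_te"] = df["ethnicity"].map(smoothed)

# Intersectional, high-cardinality feature: concatenating protected attributes
# multiplies the number of categories and shrinks per-group sample sizes.
df["ethnicity_x_country"] = df["ethnicity"] + "_" + df["country"]
df["intersect_te"] = df["ethnicity_x_country"].map(
    df.groupby("ethnicity_x_country")["y"].mean()
)

print(one_hot)
print(df[["ethnicity", "ethnicity_te", "ethnicity_x_country", "intersect_te"]])
```

In practice, the target means would be estimated on training folds only (or with leave-one-out / cross-fold schemes) to avoid target leakage; this sketch omits that for brevity.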
Related papers
- Classes Are Not Equal: An Empirical Study on Image Recognition Fairness [100.36114135663836]
We experimentally demonstrate that classes are not equal and the fairness issue is prevalent for image classification models across various datasets.
Our findings reveal that models tend to exhibit greater prediction biases for classes that are more challenging to recognize.
Data augmentation and representation learning algorithms improve overall performance by promoting fairness to some degree in image classification.
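One minimal way to surface this kind of class-level disparity is to compare per-class accuracy (recall) of a trained classifier; the snippet below is a generic sketch, not the paper's evaluation protocol.

```python
import numpy as np

def per_class_accuracy(y_true, y_pred):
    """Recall for each class: fraction of class-k examples predicted as class k."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return {int(k): float(np.mean(y_pred[y_true == k] == k)) for k in np.unique(y_true)}

# Illustrative labels/predictions; a large spread across classes signals the
# "classes are not equal" fairness issue described above.
acc = per_class_accuracy([0, 0, 1, 1, 2, 2], [0, 0, 1, 0, 2, 1])
print(acc)  # {0: 1.0, 1: 0.5, 2: 0.5}
```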
arXiv Detail & Related papers (2024-02-28T07:54:50Z)
- Bayes-Optimal Fair Classification with Linear Disparity Constraints via Pre-, In-, and Post-processing [32.5214395114507]
We develop methods for Bayes-optimal fair classification, aiming to minimize classification error subject to given group fairness constraints.
We show that several popular disparity measures -- the deviations from demographic parity, equality of opportunity, and predictive equality -- are bilinear.
Our methods control disparity directly while achieving near-optimal fairness-accuracy tradeoffs.
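For reference, the three disparity measures named above can be computed from binary predictions as simple rate differences between two groups; the helper below is an illustrative sketch (variable names and the two-group encoding are assumptions, not the paper's code).

```python
import numpy as np

def disparities(y_true, y_pred, group):
    """Group-rate differences for two groups encoded as 0/1."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    rate = lambda mask: y_pred[mask].mean()

    # Demographic parity: difference in positive prediction rates.
    dp = rate(group == 1) - rate(group == 0)
    # Equality of opportunity: difference in true positive rates.
    eo = rate((group == 1) & (y_true == 1)) - rate((group == 0) & (y_true == 1))
    # Predictive equality: difference in false positive rates.
    pe = rate((group == 1) & (y_true == 0)) - rate((group == 0) & (y_true == 0))
    return {"demographic_parity": dp, "equal_opportunity": eo, "predictive_equality": pe}

print(disparities(y_true=[1, 0, 1, 0], y_pred=[1, 1, 0, 0], group=[1, 1, 0, 0]))
```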
arXiv Detail & Related papers (2024-02-05T08:59:47Z)
- Practical Approaches for Fair Learning with Multitype and Multivariate Sensitive Attributes [70.6326967720747]
It is important to guarantee that machine learning algorithms deployed in the real world do not result in unfairness or unintended social consequences.
We introduce FairCOCCO, a fairness measure built on cross-covariance operators on reproducing kernel Hilbert spaces.
We empirically demonstrate consistent improvements against state-of-the-art techniques in balancing predictive power and fairness on real-world datasets.
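FairCOCCO itself is built on normalized cross-covariance operators; as a loose, simplified stand-in for that idea, the sketch below measures HSIC-style kernel dependence between model scores and a sensitive attribute using RBF kernels. This is an assumption-laden illustration of kernel dependence, not the FairCOCCO measure from the paper.

```python
import numpy as np

def rbf_kernel(x, gamma=1.0):
    """RBF Gram matrix for a 1-D array x."""
    d = x[:, None] - x[None, :]
    return np.exp(-gamma * d ** 2)

def hsic(x, y, gamma=1.0):
    """Biased HSIC estimate: trace(K H L H) / (n - 1)**2."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    K, L = rbf_kernel(x, gamma), rbf_kernel(y, gamma)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2

# Dependence between model scores and a sensitive attribute: larger values
# indicate a stronger (potentially unfair) statistical association.
scores = [0.9, 0.8, 0.2, 0.1]
sensitive = [1, 1, 0, 0]
print(hsic(scores, sensitive))
```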
arXiv Detail & Related papers (2022-11-11T11:28:46Z)
- Learning Disentangled Textual Representations via Statistical Measures of Similarity [35.74568888409149]
We introduce a family of regularizers for learning disentangled representations based on statistical measures of similarity.
Unlike prior approaches, these regularizers require no additional training, are faster, and involve no extra tuning.
arXiv Detail & Related papers (2022-05-07T08:06:22Z)
- Semi-FairVAE: Semi-supervised Fair Representation Learning with Adversarial Variational Autoencoder [92.67156911466397]
We propose a semi-supervised fair representation learning approach based on an adversarial variational autoencoder.
We use a bias-aware model to capture inherent bias information on the sensitive attribute.
We also use a bias-free model that learns debiased fair representations by removing this bias information through adversarial learning.
arXiv Detail & Related papers (2022-04-01T15:57:47Z)
- Fair Tree Learning [0.15229257192293202]
Various optimisation criteria combine classification performance with a fairness metric.
Current fair decision tree methods optimise only for a fixed threshold on both the classification task and the fairness metric.
We propose a threshold-independent fairness metric termed uniform demographic parity, and a derived splitting criterion entitled SCAFF -- Splitting Criterion AUC for Fairness.
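A threshold-independent reading of demographic parity can be obtained by asking how well the model's scores separate the sensitive groups: an AUC of 0.5 means the scores carry no information about group membership. The sketch below follows this general idea; it is a hedged approximation of a threshold-independent fairness measure, not the exact SCAFF criterion from the paper.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def fairness_auc(scores, sensitive):
    """AUC of separating sensitive groups (0/1) by the model score.

    0.5 ~ scores are uninformative about group membership (fair in a
    threshold-independent, demographic-parity-like sense); values far
    from 0.5 indicate group-dependent scoring.
    """
    return roc_auc_score(np.asarray(sensitive), np.asarray(scores, float))

print(fairness_auc(scores=[0.9, 0.7, 0.4, 0.2], sensitive=[1, 1, 0, 0]))  # 1.0: fully separable
```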
arXiv Detail & Related papers (2021-10-18T13:40:25Z)
- Contrastive Learning for Fair Representations [50.95604482330149]
Trained classification models can unintentionally lead to biased representations and predictions.
Existing debiasing methods for classification models, such as adversarial training, are often expensive to train and difficult to optimise.
We propose a method for mitigating bias by incorporating contrastive learning, in which instances sharing the same class label are encouraged to have similar representations.
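The core idea, pulling together representations of instances that share a class label, can be written as a supervised contrastive loss; the numpy sketch below is a generic formulation of such a loss, not the authors' exact objective or debiasing setup.

```python
import numpy as np

def supervised_contrastive_loss(z, labels, temperature=0.1):
    """Loss that pulls same-label embeddings together (z: n x d)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize embeddings
    sim = z @ z.T / temperature                        # pairwise similarities
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    same = (labels[:, None] == labels[None, :]) & ~np.eye(len(labels), dtype=bool)
    # Average log-probability over each anchor's positives, then over anchors.
    per_anchor = np.where(same, log_prob, 0.0).sum(axis=1) / np.maximum(same.sum(axis=1), 1)
    return float(-per_anchor.mean())

z = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])
print(supervised_contrastive_loss(z, labels))
```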
arXiv Detail & Related papers (2021-09-22T10:47:51Z)
- Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View [82.80085730891126]
We provide the first modern precise analysis of linear multiclass classification.
Our analysis reveals that the classification accuracy is highly distribution-dependent.
The insights gained may pave the way for a precise understanding of other classification algorithms.
arXiv Detail & Related papers (2020-11-16T05:17:29Z)
- Learning and Evaluating Representations for Deep One-class Classification [59.095144932794646]
We present a two-stage framework for deep one-class classification.
We first learn self-supervised representations from one-class data, and then build one-class classifiers on learned representations.
In experiments, we demonstrate state-of-the-art performance on visual domain one-class classification benchmarks.
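The second stage of such a framework can be as simple as fitting an off-the-shelf one-class classifier on the learned feature vectors; the sketch below uses scikit-learn's OneClassSVM on randomly simulated features as a stand-in for whichever representations and detector the paper actually evaluates.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Stand-ins for self-supervised representations of normal (one-class) training
# data and of test data containing both normal and anomalous points.
train_feats = rng.normal(loc=0.0, scale=1.0, size=(200, 16))
test_feats = np.vstack([rng.normal(0.0, 1.0, (10, 16)),
                        rng.normal(5.0, 1.0, (10, 16))])   # shifted = anomalous

# Stage 2: fit a one-class classifier on the learned representations.
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(train_feats)
scores = clf.decision_function(test_feats)   # higher = more "normal"
print((scores > 0).astype(int))              # 1 = predicted normal, 0 = predicted anomalous
```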
arXiv Detail & Related papers (2020-11-04T23:33:41Z)
- One-vs.-One Mitigation of Intersectional Bias: A General Method to Extend Fairness-Aware Binary Classification [0.48733623015338234]
One-vs.-One Mitigation applies a pairwise comparison between subgroups defined by the sensitive attributes to fairness-aware machine learning for binary classification.
Our method mitigates intersectional bias much better than conventional methods in all settings.
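The one-vs.-one structure can be pictured as enumerating every pair of intersectional subgroups and applying a standard two-group fairness check (or mitigation) to each pair; the sketch below only illustrates that enumeration with a demographic-parity gap per pair, using hypothetical attribute names, and is not the authors' full method.

```python
import itertools
import pandas as pd

df = pd.DataFrame({
    "gender":    ["F", "F", "M", "M", "F", "M"],
    "ethnicity": ["A", "B", "A", "B", "A", "B"],
    "y_pred":    [1, 0, 1, 1, 0, 1],
})

# Intersectional subgroups are combinations of the sensitive attributes.
df["subgroup"] = df["gender"] + "_" + df["ethnicity"]

# One-vs.-one: compare every pair of subgroups with a two-group fairness
# measure (here: a demographic-parity gap in positive prediction rates).
rates = df.groupby("subgroup")["y_pred"].mean()
for g1, g2 in itertools.combinations(rates.index, 2):
    print(f"{g1} vs {g2}: DP gap = {abs(rates[g1] - rates[g2]):.2f}")
```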
arXiv Detail & Related papers (2020-10-26T11:35:39Z)
- Metrics and methods for a systematic comparison of fairness-aware machine learning algorithms [0.0]
This study, the most comprehensive of its kind, considers the fairness, predictive performance, calibration quality, and speed of 28 different modelling pipelines.
We also found that fairness-aware algorithms can induce fairness without material drops in predictive power.
arXiv Detail & Related papers (2020-10-08T13:58:09Z)