Fairness Implications of Encoding Protected Categorical Attributes
- URL: http://arxiv.org/abs/2201.11358v2
- Date: Fri, 5 May 2023 22:03:12 GMT
- Title: Fairness Implications of Encoding Protected Categorical Attributes
- Authors: Carlos Mougan, Jose M. Alvarez, Salvatore Ruggieri, Steffen Staab
- Abstract summary: We compare the accuracy and fairness implications of two well-known encoding methods: emphone-hot encoding and emphtarget encoding.
First type, textitirreducible bias, is due to direct group category discrimination, and the second type, textitreducible bias, is due to the large variance in statistically underrepresented groups.
We consider the problem of intersectional unfairness that may arise when machine learning best practices improve performance measures by encoding several categorical attributes into a high-cardinality feature.
- Score: 26.7015058286397
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Past research has demonstrated that the explicit use of protected attributes
in machine learning can improve both performance and fairness. Many machine
learning algorithms, however, cannot directly process categorical attributes,
such as country of birth or ethnicity. Because protected attributes frequently
are categorical, they must be encoded as features that can be input to a chosen
machine learning algorithm, e.g.\ support vector machines, gradient boosting
decision trees or linear models. Thereby, encoding methods influence how and
what the machine learning algorithm will learn, affecting model performance and
fairness. This work compares the accuracy and fairness implications of the two
most well-known encoding methods: \emph{one-hot encoding} and \emph{target
encoding}. We distinguish between two types of induced bias that may arise from
these encoding methods and may lead to unfair models. The first type,
\textit{irreducible bias}, is due to direct group category discrimination, and
the second type, \textit{reducible bias}, is due to the large variance in
statistically underrepresented groups. We investigate the interaction between
categorical encodings and target encoding regularization methods that reduce
unfairness. Furthermore, we consider the problem of intersectional unfairness
that may arise when machine learning best practices improve performance
measures by encoding several categorical attributes into a high-cardinality
feature.
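As a concrete illustration of the encodings discussed above, the sketch below (column names, data, and the smoothing parameter are illustrative assumptions, not the paper's code) contrasts one-hot encoding with smoothed target encoding of a protected categorical attribute, and shows how combining several protected attributes produces the kind of high-cardinality intersectional feature mentioned in the abstract.

```python
import pandas as pd

# Toy data: 'ethnicity' is the protected categorical attribute, 'y' the binary target.
# Column names and values are illustrative only.
df = pd.DataFrame({
    "ethnicity": ["A", "A", "B", "B", "B", "C"],
    "country":   ["X", "Y", "X", "X", "Y", "Y"],
    "y":         [1, 0, 1, 1, 0, 1],
})

# One-hot encoding: one binary column per category (no target information used).
one_hot = pd.get_dummies(df["ethnicity"], prefix="ethnicity")

# Target encoding: replace each category by the mean target of that category.
# Small groups get noisy estimates, which is the source of "reducible bias".
raw_means = df.groupby("ethnicity")["y"].mean()

# Smoothed (regularized) target encoding: shrink small-group means toward the
# global mean; m controls the strength of the shrinkage.
m = 10.0
global_mean = df["y"].mean()
counts = df.groupby("ethnicity")["y"].count()
smoothed = (counts * raw_means + m * global_mean) / (counts + m)
df["ethnicity_te"] = df["ethnicity"].map(smoothed)

# Intersectional, high-cardinality feature: concatenating protected attributes
# multiplies the number of categories and shrinks per-group sample sizes.
df["ethnicity_x_country"] = df["ethnicity"] + "_" + df["country"]
df["intersect_te"] = df["ethnicity_x_country"].map(
    df.groupby("ethnicity_x_country")["y"].mean()
)

print(one_hot)
print(df[["ethnicity", "ethnicity_te", "ethnicity_x_country", "intersect_te"]])
```

In practice, the target means would be estimated on training folds only (or with leave-one-out / cross-fold schemes) to avoid target leakage; this sketch omits that for brevity.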
Related papers
- Classes Are Not Equal: An Empirical Study on Image Recognition Fairness [100.36114135663836]
We experimentally demonstrate that classes are not equal and the fairness issue is prevalent for image classification models across various datasets.
Our findings reveal that models tend to exhibit greater prediction biases for classes that are more challenging to recognize.
Data augmentation and representation learning algorithms improve overall performance by promoting fairness to some degree in image classification.
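One minimal way to surface this kind of class-level disparity is to compare per-class accuracy (recall) of a trained classifier; the snippet below is a generic sketch, not the paper's evaluation protocol.

```python
import numpy as np

def per_class_accuracy(y_true, y_pred):
    """Recall for each class: fraction of class-k examples predicted as class k."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return {int(k): float(np.mean(y_pred[y_true == k] == k)) for k in np.unique(y_true)}

# Illustrative labels/predictions; a large spread across classes signals the
# "classes are not equal" fairness issue described above.
acc = per_class_accuracy([0, 0, 1, 1, 2, 2], [0, 0, 1, 0, 2, 1])
print(acc)  # {0: 1.0, 1: 0.5, 2: 0.5}
```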
arXiv Detail & Related papers (2024-02-28T07:54:50Z)
- Bayes-Optimal Fair Classification with Linear Disparity Constraints via Pre-, In-, and Post-processing [32.5214395114507]
We develop methods for Bayes-optimal fair classification, aiming to minimize classification error subject to given group fairness constraints.
We show that several popular disparity measures -- the deviations from demographic parity, equality of opportunity, and predictive equality -- are bilinear.
Our methods control disparity directly while achieving near-optimal fairness-accuracy tradeoffs.
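For reference, the three disparity measures named above can be computed from binary predictions as simple rate differences between two groups; the helper below is an illustrative sketch (variable names and the two-group encoding are assumptions, not the paper's code).

```python
import numpy as np

def disparities(y_true, y_pred, group):
    """Group-rate differences for two groups encoded as 0/1."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    rate = lambda mask: y_pred[mask].mean()

    # Demographic parity: difference in positive prediction rates.
    dp = rate(group == 1) - rate(group == 0)
    # Equality of opportunity: difference in true positive rates.
    eo = rate((group == 1) & (y_true == 1)) - rate((group == 0) & (y_true == 1))
    # Predictive equality: difference in false positive rates.
    pe = rate((group == 1) & (y_true == 0)) - rate((group == 0) & (y_true == 0))
    return {"demographic_parity": dp, "equal_opportunity": eo, "predictive_equality": pe}

print(disparities(y_true=[1, 0, 1, 0], y_pred=[1, 1, 0, 0], group=[1, 1, 0, 0]))
```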
arXiv Detail & Related papers (2024-02-05T08:59:47Z)
- Practical Approaches for Fair Learning with Multitype and Multivariate Sensitive Attributes [70.6326967720747]
It is important to guarantee that machine learning algorithms deployed in the real world do not result in unfairness or unintended social consequences.
We introduce FairCOCCO, a fairness measure built on cross-covariance operators on reproducing kernel Hilbert spaces.
We empirically demonstrate consistent improvements against state-of-the-art techniques in balancing predictive power and fairness on real-world datasets.
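FairCOCCO itself is built on normalized cross-covariance operators; as a loose, simplified stand-in for that idea, the sketch below measures HSIC-style kernel dependence between model scores and a sensitive attribute using RBF kernels. This is an assumption-laden illustration of kernel dependence, not the FairCOCCO measure from the paper.

```python
import numpy as np

def rbf_kernel(x, gamma=1.0):
    """RBF Gram matrix for a 1-D array x."""
    d = x[:, None] - x[None, :]
    return np.exp(-gamma * d ** 2)

def hsic(x, y, gamma=1.0):
    """Biased HSIC estimate: trace(K H L H) / (n - 1)**2."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    K, L = rbf_kernel(x, gamma), rbf_kernel(y, gamma)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2

# Dependence between model scores and a sensitive attribute: larger values
# indicate a stronger (potentially unfair) statistical association.
scores = [0.9, 0.8, 0.2, 0.1]
sensitive = [1, 1, 0, 0]
print(hsic(scores, sensitive))
```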
arXiv Detail & Related papers (2022-11-11T11:28:46Z)
- Learning Disentangled Textual Representations via Statistical Measures of Similarity [35.74568888409149]
We introduce a family of regularizers for learning disentangled representations based on statistical measures of similarity.
Unlike prior approaches, these regularizers require no additional training, are faster, and involve no extra tuning.
arXiv Detail & Related papers (2022-05-07T08:06:22Z)
- Semi-FairVAE: Semi-supervised Fair Representation Learning with Adversarial Variational Autoencoder [92.67156911466397]
We propose a semi-supervised fair representation learning approach based on an adversarial variational autoencoder.
We use a bias-aware model to capture inherent bias information on the sensitive attribute.
We also use a bias-free model that learns debiased fair representations by removing this bias information through adversarial learning.
arXiv Detail & Related papers (2022-04-01T15:57:47Z)
- Fair Tree Learning [0.15229257192293202]
Various optimisation criteria combine classification performance with a fairness metric.
Current fair decision tree methods optimise only for a fixed threshold on both the classification task and the fairness metric.
We propose a threshold-independent fairness metric termed uniform demographic parity, and a derived splitting criterion entitled SCAFF -- Splitting Criterion AUC for Fairness.
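A threshold-independent reading of demographic parity can be obtained by asking how well the model's scores separate the sensitive groups: an AUC of 0.5 means the scores carry no information about group membership. The sketch below follows this general idea; it is a hedged approximation of a threshold-independent fairness measure, not the exact SCAFF criterion from the paper.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def fairness_auc(scores, sensitive):
    """AUC of separating sensitive groups (0/1) by the model score.

    0.5 ~ scores are uninformative about group membership (fair in a
    threshold-independent, demographic-parity-like sense); values far
    from 0.5 indicate group-dependent scoring.
    """
    return roc_auc_score(np.asarray(sensitive), np.asarray(scores, float))

print(fairness_auc(scores=[0.9, 0.7, 0.4, 0.2], sensitive=[1, 1, 0, 0]))  # 1.0: fully separable
```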
arXiv Detail & Related papers (2021-10-18T13:40:25Z)
- Contrastive Learning for Fair Representations [50.95604482330149]
Trained classification models can unintentionally lead to biased representations and predictions.
Existing debiasing methods for classification models, such as adversarial training, are often expensive to train and difficult to optimise.
We propose a method for mitigating bias by incorporating contrastive learning, in which instances sharing the same class label are encouraged to have similar representations.
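The core idea, pulling together representations of instances that share a class label, can be written as a supervised contrastive loss; the numpy sketch below is a generic formulation of such a loss, not the authors' exact objective or debiasing setup.

```python
import numpy as np

def supervised_contrastive_loss(z, labels, temperature=0.1):
    """Loss that pulls same-label embeddings together (z: n x d)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize embeddings
    sim = z @ z.T / temperature                        # pairwise similarities
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    same = (labels[:, None] == labels[None, :]) & ~np.eye(len(labels), dtype=bool)
    # Average log-probability over each anchor's positives, then over anchors.
    per_anchor = np.where(same, log_prob, 0.0).sum(axis=1) / np.maximum(same.sum(axis=1), 1)
    return float(-per_anchor.mean())

z = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])
print(supervised_contrastive_loss(z, labels))
```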
arXiv Detail & Related papers (2021-09-22T10:47:51Z)
- Theoretical Insights Into Multiclass Classification: A High-dimensional Asymptotic View [82.80085730891126]
We provide the first modern precise analysis of linear multiclass classification.
Our analysis reveals that the classification accuracy is highly distribution-dependent.
The insights gained may pave the way for a precise understanding of other classification algorithms.
arXiv Detail & Related papers (2020-11-16T05:17:29Z)
- Learning and Evaluating Representations for Deep One-class Classification [59.095144932794646]
We present a two-stage framework for deep one-class classification.
We first learn self-supervised representations from one-class data, and then build one-class classifiers on learned representations.
In experiments, we demonstrate state-of-the-art performance on visual domain one-class classification benchmarks.
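The second stage of such a framework can be as simple as fitting an off-the-shelf one-class classifier on the learned feature vectors; the sketch below uses scikit-learn's OneClassSVM on randomly simulated features as a stand-in for whichever representations and detector the paper actually evaluates.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Stand-ins for self-supervised representations of normal (one-class) training
# data and of test data containing both normal and anomalous points.
train_feats = rng.normal(loc=0.0, scale=1.0, size=(200, 16))
test_feats = np.vstack([rng.normal(0.0, 1.0, (10, 16)),
                        rng.normal(5.0, 1.0, (10, 16))])   # shifted = anomalous

# Stage 2: fit a one-class classifier on the learned representations.
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(train_feats)
scores = clf.decision_function(test_feats)   # higher = more "normal"
print((scores > 0).astype(int))              # 1 = predicted normal, 0 = predicted anomalous
```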
arXiv Detail & Related papers (2020-11-04T23:33:41Z)
- One-vs.-One Mitigation of Intersectional Bias: A General Method to Extend Fairness-Aware Binary Classification [0.48733623015338234]
One-vs.-One Mitigation applies a pairwise comparison between subgroups defined by the sensitive attributes to fairness-aware machine learning for binary classification.
Our method mitigates intersectional bias much better than conventional methods in all settings.
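The one-vs.-one structure can be pictured as enumerating every pair of intersectional subgroups and applying a standard two-group fairness check (or mitigation) to each pair; the sketch below only illustrates that enumeration with a demographic-parity gap per pair, using hypothetical attribute names, and is not the authors' full method.

```python
import itertools
import pandas as pd

df = pd.DataFrame({
    "gender":    ["F", "F", "M", "M", "F", "M"],
    "ethnicity": ["A", "B", "A", "B", "A", "B"],
    "y_pred":    [1, 0, 1, 1, 0, 1],
})

# Intersectional subgroups are combinations of the sensitive attributes.
df["subgroup"] = df["gender"] + "_" + df["ethnicity"]

# One-vs.-one: compare every pair of subgroups with a two-group fairness
# measure (here: a demographic-parity gap in positive prediction rates).
rates = df.groupby("subgroup")["y_pred"].mean()
for g1, g2 in itertools.combinations(rates.index, 2):
    print(f"{g1} vs {g2}: DP gap = {abs(rates[g1] - rates[g2]):.2f}")
```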
arXiv Detail & Related papers (2020-10-26T11:35:39Z)
- Metrics and methods for a systematic comparison of fairness-aware machine learning algorithms [0.0]
This study, the most comprehensive of its kind, considers the fairness, predictive performance, calibration quality, and speed of 28 different modelling pipelines.
We also found that fairness-aware algorithms can induce fairness without material drops in predictive power.
arXiv Detail & Related papers (2020-10-08T13:58:09Z)