Through the Data Management Lens: Experimental Analysis and Evaluation of Fair Classification
- URL: http://arxiv.org/abs/2101.07361v1
- Date: Mon, 18 Jan 2021 22:55:40 GMT
- Title: Through the Data Management Lens: Experimental Analysis and Evaluation of Fair Classification
- Authors: Maliha Tashfia Islam, Anna Fariha, Alexandra Meliou
- Abstract summary: Data management research is showing an increasing presence and interest in topics related to data and algorithmic fairness.
We contribute a broad analysis of 13 fair classification approaches and additional variants, over their correctness, fairness, efficiency, scalability, and stability.
Our analysis highlights novel insights on the impact of different metrics and high-level approach characteristics on different aspects of performance.
- Score: 75.49600684537117
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classification, a heavily-studied data-driven machine learning task, drives
an increasing number of prediction systems involving critical human decisions
such as loan approval and criminal risk assessment. However, classifiers often
demonstrate discriminatory behavior, especially when presented with biased
data. Consequently, fairness in classification has emerged as a high-priority
research area. Data management research is showing an increasing presence and
interest in topics related to data and algorithmic fairness, including the
topic of fair classification. The interdisciplinary efforts in fair
classification, with machine learning research having the largest presence,
have resulted in a large number of fairness notions and a wide range of
approaches that have not been systematically evaluated and compared. In this
paper, we contribute a broad analysis of 13 fair classification approaches and
additional variants, over their correctness, fairness, efficiency, scalability,
and stability, using a variety of metrics and real-world datasets. Our analysis
highlights novel insights on the impact of different metrics and high-level
approach characteristics on different aspects of performance. We also discuss
general principles for choosing approaches suitable for different practical
settings, and identify areas where data-management-centric solutions are likely
to have the most impact.
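The abstract's evaluation of fair classification approaches rests on group-fairness metrics. As a minimal sketch of one widely used notion (not a metric the paper necessarily uses), the following computes the demographic parity difference, i.e. the gap in positive-prediction rates between two demographic groups; the data and function name are illustrative:

```python
# Illustrative sketch: demographic parity difference for a binary classifier.
# A value of 0 means both groups receive positive predictions at equal rates.

def demographic_parity_difference(y_pred, groups):
    """Absolute gap in positive-prediction rates between the two groups."""
    rates = {}
    for g in set(groups):
        members = [p for p, gi in zip(y_pred, groups) if gi == g]
        rates[g] = sum(members) / len(members)
    rate_a, rate_b = rates.values()
    return abs(rate_a - rate_b)

# Hypothetical predictions for 8 individuals, each in group "a" or "b".
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(demographic_parity_difference(y_pred, groups))  # 0.5
```

Here group "a" receives positive predictions at rate 3/4 and group "b" at rate 1/4, so the disparity is 0.5; a fair classifier under this notion would drive the gap toward 0.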
Related papers
- Bridging the Gap: Protocol Towards Fair and Consistent Affect Analysis [24.737468736951374]
The increasing integration of machine learning algorithms in daily life underscores the critical need for fairness and equity in their deployment.
Existing databases and methodologies lack uniformity, leading to biased evaluations.
This work addresses these issues by analyzing six affective databases, annotating demographic attributes, and proposing a common protocol for database partitioning.
arXiv Detail & Related papers (2024-05-10T22:40:01Z)
- A Large-Scale Empirical Study on Improving the Fairness of Image Classification Models [22.522156479335706]
This paper conducts the first large-scale empirical study to compare the performance of existing state-of-the-art fairness improving techniques.
Our findings reveal substantial variations in the performance of each method across different datasets and sensitive attributes.
Different fairness evaluation metrics, due to their distinct focuses, yield significantly different assessment results.
arXiv Detail & Related papers (2024-01-08T06:53:33Z)
- Fairness meets Cross-Domain Learning: a new perspective on Models and Metrics [80.07271410743806]
We study the relationship between cross-domain learning (CD) and model fairness.
We introduce a benchmark on face and medical images spanning several demographic groups as well as classification and localization tasks.
Our study covers 14 CD approaches alongside three state-of-the-art fairness algorithms and shows how the former can outperform the latter.
arXiv Detail & Related papers (2023-03-25T09:34:05Z)
- Systematic Evaluation of Predictive Fairness [60.0947291284978]
Mitigating bias in training on biased datasets is an important open problem.
We examine the performance of various debiasing methods across multiple tasks.
We find that data conditions have a strong influence on relative model performance.
arXiv Detail & Related papers (2022-10-17T05:40:13Z)
- Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance on minority-class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class.
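As a minimal sketch of the oversampling strategy mentioned above (using illustrative data, not the paper's method), the following duplicates minority-class examples at random until every class matches the size of the largest one:

```python
import random

def random_oversample(X, y, seed=0):
    """Duplicate minority-class examples at random until classes are balanced."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    target = max(len(items) for items in by_class.values())
    X_out, y_out = [], []
    for label, items in by_class.items():
        # Top up each class with random duplicates of its own examples.
        resampled = items + [rng.choice(items) for _ in range(target - len(items))]
        X_out.extend(resampled)
        y_out.extend([label] * len(resampled))
    return X_out, y_out

# Hypothetical imbalanced dataset: four examples of class 0, one of class 1.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
print(sorted(y_bal))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Undersampling is the mirror image: instead of duplicating minority examples, it randomly discards majority examples until class counts match, trading information loss for a smaller training set.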
arXiv Detail & Related papers (2021-12-15T18:56:39Z)
- A survey on datasets for fairness-aware machine learning [6.962333053044713]
A large variety of fairness-aware machine learning solutions have been proposed.
In this paper, we overview real-world datasets used for fairness-aware machine learning.
For a deeper understanding of bias and fairness in the datasets, we investigate relationships within the data using exploratory analysis.
arXiv Detail & Related papers (2021-10-01T16:54:04Z)
- MultiFair: Multi-Group Fairness in Machine Learning [52.24956510371455]
We study multi-group fairness in machine learning (MultiFair).
We propose a generic end-to-end algorithmic framework to solve it.
Our proposed framework is generalizable to many different settings.
arXiv Detail & Related papers (2021-05-24T02:30:22Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.