Identifying Biased Subgroups in Ranking and Classification
- URL: http://arxiv.org/abs/2108.07450v1
- Date: Tue, 17 Aug 2021 05:26:11 GMT
- Title: Identifying Biased Subgroups in Ranking and Classification
- Authors: Eliana Pastor, Luca de Alfaro, Elena Baralis
- Abstract summary: We introduce the notion of divergence to measure performance differences.
We exploit it in the context of (i) classification models and (ii) ranking applications.
We quantify the contribution of all attributes in the data subgroup to the divergent behavior by means of Shapley values.
- Score: 12.268135088806613
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When analyzing the behavior of machine learning algorithms, it is important
to identify specific data subgroups for which the considered algorithm shows
different performance with respect to the entire dataset. The intervention of
domain experts is normally required to identify relevant attributes that define
these subgroups.
We introduce the notion of divergence to measure this performance difference
and we exploit it in the context of (i) classification models and (ii) ranking
applications to automatically detect data subgroups showing a significant
deviation in their behavior. Furthermore, we quantify the contribution of all
attributes in the data subgroup to the divergent behavior by means of Shapley
values, thus allowing the identification of the most impacting attributes.
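As a concrete illustration of the two ideas in the abstract, the following is a minimal sketch (not the authors' implementation): it computes the divergence of a subgroup as the gap between a performance metric evaluated on the subgroup and on the whole dataset, and it attributes that divergence to the attribute=value items defining the subgroup via exact Shapley values. The column names, the toy data, and the choice of accuracy as the metric are illustrative assumptions.

```python
# Minimal sketch, assuming a tabular dataset with ground-truth labels and
# model predictions attached as columns. Not the paper's code.
from itertools import combinations
from math import factorial

import pandas as pd
from sklearn.metrics import accuracy_score


def divergence(df, items, metric=accuracy_score):
    """Metric on the subgroup selected by `items` minus metric on the full dataset."""
    mask = pd.Series(True, index=df.index)
    for attr, value in items:
        mask &= df[attr] == value
    sub = df[mask]
    if sub.empty:
        return 0.0
    return metric(sub["y_true"], sub["y_pred"]) - metric(df["y_true"], df["y_pred"])


def shapley_contributions(df, items):
    """Exact Shapley value of each attribute=value item w.r.t. the subgroup's divergence."""
    items = list(items)
    n = len(items)
    contribs = {}
    for item in items:
        others = [it for it in items if it != item]
        total = 0.0
        for r in range(len(others) + 1):
            for subset in combinations(others, r):
                # Classic Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(len(subset)) * factorial(n - len(subset) - 1) / factorial(n)
                marginal = divergence(df, list(subset) + [item]) - divergence(df, list(subset))
                total += weight * marginal
        contribs[item] = total
    return contribs


if __name__ == "__main__":
    # Toy data with hypothetical attributes; y_pred are the model's predictions.
    df = pd.DataFrame({
        "sex":    ["F", "F", "M", "M", "F", "M", "F", "M"],
        "age":    ["<30", "<30", "<30", ">=30", ">=30", ">=30", "<30", ">=30"],
        "y_true": [1, 0, 1, 1, 0, 1, 1, 0],
        "y_pred": [0, 0, 1, 1, 0, 1, 0, 0],
    })
    subgroup = [("sex", "F"), ("age", "<30")]
    print("divergence:", divergence(df, subgroup))
    print("Shapley contributions:", shapley_contributions(df, subgroup))
```

For the toy subgroup sex=F AND age=<30, the item contributions sum to the subgroup's divergence (the efficiency property of Shapley values), which is what makes them usable for identifying the most impacting attributes.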
Related papers
- From A-to-Z Review of Clustering Validation Indices [4.08908337437878]
We review and evaluate the performance of internal and external clustering validation indices on the most common clustering algorithms.
We suggest a classification framework for examining the functionality of both internal and external clustering validation measures.
arXiv Detail & Related papers (2024-07-18T13:52:02Z) - A structured regression approach for evaluating model performance across intersectional subgroups [53.91682617836498]
Disaggregated evaluation is a central task in AI fairness assessment, where the goal is to measure an AI system's performance across different subgroups.
We introduce a structured regression approach to disaggregated evaluation that we demonstrate can yield reliable system performance estimates even for very small subgroups.
arXiv Detail & Related papers (2024-01-26T14:21:45Z) - Outlier detection using flexible categorisation and interrogative agendas [42.321011564731585]
There are different ways to categorize a given set of objects, depending on the choice of the feature sets used to classify them.
We first develop a simple unsupervised FCA-based algorithm for outlier detection which uses categorizations arising from different agendas.
We then present a supervised meta-learning algorithm to learn suitable agendas for categorization as sets of features with different weights or masses.
arXiv Detail & Related papers (2023-12-19T10:05:09Z) - Leveraging Structure for Improved Classification of Grouped Biased Data [8.121462458089143]
We consider semi-supervised binary classification for applications in which data points are naturally grouped.
We derive a semi-supervised algorithm that explicitly leverages the structure to learn an optimal, group-aware, probability-output classifier.
arXiv Detail & Related papers (2022-12-07T15:18:21Z) - Subgroup Discovery in Unstructured Data [7.6323763630645285]
Subgroup discovery has numerous applications in knowledge discovery and hypothesis generation.
A subgroup-aware variational autoencoder learns a representation of unstructured data that leads to subgroups of higher quality.
arXiv Detail & Related papers (2022-07-15T23:13:54Z) - Estimating Structural Disparities for Face Models [54.062512989859265]
In machine learning, disparity metrics are often defined by measuring the difference in the performance or outcome of a model, across different sub-populations.
We explore performing such analysis on computer vision models trained on human faces, and on tasks such as face attribute prediction and affect estimation.
arXiv Detail & Related papers (2022-04-13T05:30:53Z) - Towards Group Robustness in the presence of Partial Group Labels [61.33713547766866]
Spurious correlations between input samples and the target labels wrongly direct the neural network predictions.
We propose an algorithm that optimizes for the worst-off group assignments from a constraint set.
We show improvements in the minority group's performance while preserving overall aggregate accuracy across groups.
arXiv Detail & Related papers (2022-01-10T22:04:48Z) - Representation Matters: Assessing the Importance of Subgroup Allocations in Training Data [85.43008636875345]
We show that diverse representation in training data is key to increasing subgroup performances and achieving population level objectives.
Our analysis and experiments describe how dataset compositions influence performance and provide constructive results for using trends in existing data, alongside domain knowledge, to help guide intentional, objective-aware dataset design.
arXiv Detail & Related papers (2021-03-05T00:27:08Z) - Through the Data Management Lens: Experimental Analysis and Evaluation of Fair Classification [75.49600684537117]
Data management research is showing an increasing presence and interest in topics related to data and algorithmic fairness.
We contribute a broad analysis of 13 fair classification approaches and additional variants, over their correctness, fairness, efficiency, scalability, and stability.
Our analysis highlights novel insights on the impact of different metrics and high-level approach characteristics on different aspects of performance.
arXiv Detail & Related papers (2021-01-18T22:55:40Z) - Adaptive Object Detection with Dual Multi-Label Prediction [78.69064917947624]
We propose a novel end-to-end unsupervised deep domain adaptation model for adaptive object detection.
The model exploits multi-label prediction to reveal the object category information in each image.
We introduce a prediction consistency regularization mechanism to assist object detection.
arXiv Detail & Related papers (2020-03-29T04:23:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.