SubROC: AUC-Based Discovery of Exceptional Subgroup Performance for Binary Classifiers
- URL: http://arxiv.org/abs/2505.11283v1
- Date: Fri, 16 May 2025 14:18:40 GMT
- Title: SubROC: AUC-Based Discovery of Exceptional Subgroup Performance for Binary Classifiers
- Authors: Tom Siegl, Kutalmış Coşkun, Bjarne Hiller, Amin Mirzaei, Florian Lemmerich, Martin Becker
- Abstract summary: SubROC is a framework based on Exceptional Model Mining for reliably and efficiently finding strengths and weaknesses of classification models. It incorporates common evaluation measures (ROC and PR AUC), efficient search space pruning for fast exhaustive subgroup search, control for class imbalance, adjustment for redundant patterns, and significance testing.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning (ML) is increasingly employed in real-world applications like medicine or economics, thus potentially affecting large populations. However, ML models often do not perform homogeneously across such populations, resulting in subgroups of the population (e.g., sex=female AND marital_status=married) where the model underperforms or, conversely, is particularly accurate. Identifying and describing such subgroups can support practical decisions on which subpopulation a model is safe to deploy or where more training data is required. The potential of identifying and analyzing such subgroups has been recognized; however, an efficient and coherent framework for effective search has been missing. Consequently, we introduce SubROC, an open-source, easy-to-use framework based on Exceptional Model Mining for reliably and efficiently finding strengths and weaknesses of classification models in the form of interpretable population subgroups. SubROC incorporates common evaluation measures (ROC and PR AUC), efficient search space pruning for fast exhaustive subgroup search, control for class imbalance, adjustment for redundant patterns, and significance testing. We illustrate the practical benefits of SubROC in case studies as well as in comparative analyses across multiple datasets.
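At its core, the search compares a subgroup's ranking quality against the overall model. The following is a minimal sketch of that comparison; the toy data and subgroup patterns are hypothetical, and it omits SubROC's pruning, class-imbalance control, redundancy adjustment, and significance testing:

```python
# Toy sketch of AUC-based subgroup evaluation; not the SubROC implementation.

def roc_auc(y_true, scores):
    """ROC AUC via the Mann-Whitney U statistic (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined for one-class subgroups
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical records: (sex, married, true label, model score).
records = [
    ("female", True, 1, 0.40), ("female", True, 0, 0.70),
    ("female", False, 1, 0.90), ("female", False, 0, 0.20),
    ("male", True, 1, 0.80), ("male", True, 0, 0.30),
    ("male", False, 1, 0.85), ("male", False, 0, 0.15),
]
overall = roc_auc([r[2] for r in records], [r[3] for r in records])

# Interpretable patterns, e.g. "sex=female AND marital_status=married".
subgroups = {
    "sex=female AND married=True": lambda r: r[0] == "female" and r[1],
    "sex=male": lambda r: r[0] == "male",
}
results = {}
for name, member in subgroups.items():
    sub = [r for r in records if member(r)]
    auc = roc_auc([r[2] for r in sub], [r[3] for r in sub])
    results[name] = auc
    if auc is not None:
        print(f"{name}: AUC={auc:.2f} (overall {overall:.2f}, gap {auc - overall:+.2f})")
```

An exhaustive search would enumerate all attribute-value patterns up to some depth and rank them by such a gap, which is where efficient pruning and significance testing become essential.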
Related papers
- Diverse Prototypical Ensembles Improve Robustness to Subpopulation Shift [10.04893653044606]
Subpopulation shift can significantly degrade the performance of machine learning models. We propose using an ensemble of diverse classifiers to adaptively capture risk associated with subpopulations. Our method of Diverse Prototypical Ensembles (DPEs) often outperforms the prior state-of-the-art in worst-group accuracy.
arXiv Detail & Related papers (2025-05-29T03:12:56Z) - Boosting Test Performance with Importance Sampling--a Subpopulation Perspective [16.678910111353307]
In this paper, we identify importance sampling as a simple yet powerful tool for solving the subpopulation problem. We provide a new systematic formulation of the subpopulation problem and explicitly identify the assumptions that are not clearly stated in the existing works. On the application side, we demonstrate that a single estimator is enough to solve the subpopulation problem.
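The basic mechanism can be sketched with made-up numbers (the group losses and frequencies below are hypothetical, and this simple reweighted average only stands in for the paper's estimator): if the training set over-represents one subpopulation, importance weights w(g) = p_target(g) / p_train(g) reweight observed losses toward the target distribution.

```python
# Sketch: importance-weighted risk estimate under subpopulation shift.
# Hypothetical per-sample losses, tagged with their subpopulation.
train = [("A", 0.10), ("A", 0.20), ("A", 0.15), ("B", 0.80)]

p_train = {"A": 0.75, "B": 0.25}   # group frequencies in the training data
p_target = {"A": 0.50, "B": 0.50}  # assumed frequencies in the target population

# Importance weight for each group: target frequency over training frequency.
weights = {g: p_target[g] / p_train[g] for g in p_train}

naive = sum(loss for _, loss in train) / len(train)
weighted = sum(weights[g] * loss for g, loss in train) / len(train)
print(f"naive risk {naive:.4f} vs importance-weighted risk {weighted:.4f}")
```

Here the naive average understates the risk on the target population because the high-loss group "B" is under-sampled in training.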
arXiv Detail & Related papers (2024-12-17T15:25:24Z) - Task Groupings Regularization: Data-Free Meta-Learning with Heterogeneous Pre-trained Models [83.02797560769285]
Data-Free Meta-Learning (DFML) aims to derive knowledge from a collection of pre-trained models without accessing their original data. Current methods often overlook the heterogeneity among pre-trained models, which leads to performance degradation due to task conflicts.
arXiv Detail & Related papers (2024-05-26T13:11:55Z) - Minimax Regret Learning for Data with Heterogeneous Subgroups [12.253779655660571]
We develop a min-max-regret (MMR) learning framework for general supervised learning, which aims to minimize the worst-group regret.
We demonstrate the effectiveness of our method through extensive simulation studies and an application to kidney transplantation data from hundreds of transplant centers.
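The selection criterion can be illustrated on made-up numbers (two candidate models, two groups; the per-group losses and the oracle baselines below are hypothetical): average loss may favor one model while worst-group regret favors another.

```python
# Sketch: picking a model by worst-group regret rather than average loss.
# Hypothetical per-group losses for two candidate models.
group_loss = {
    "m1": {"g1": 0.10, "g2": 0.40},
    "m2": {"g1": 0.20, "g2": 0.35},
}
# Best achievable loss per group (oracle baseline, also hypothetical).
best = {"g1": 0.10, "g2": 0.20}

def worst_regret(model):
    """Largest excess loss over the group-wise oracle, across all groups."""
    return max(group_loss[model][g] - best[g] for g in best)

avg_loss = {m: sum(losses.values()) / len(losses) for m, losses in group_loss.items()}
erm_model = min(avg_loss, key=avg_loss.get)     # favors average performance
mmr_model = min(group_loss, key=worst_regret)   # favors the worst-off group
print(f"average loss picks {erm_model}, min-max regret picks {mmr_model}")
```

In this toy setting "m1" has the lower average loss, but "m2" wins under the MMR criterion because its worst-group regret is smaller.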
arXiv Detail & Related papers (2024-05-02T20:06:41Z) - How does promoting the minority fraction affect generalization? A theoretical study of the one-hidden-layer neural network on group imbalance [64.1656365676171]
Group imbalance has been a known problem in empirical risk minimization.
This paper quantifies the impact of individual groups on the sample complexity, the convergence rate, and the average and group-level testing performance.
arXiv Detail & Related papers (2024-03-12T04:38:05Z) - A structured regression approach for evaluating model performance across intersectional subgroups [53.91682617836498]
Disaggregated evaluation is a central task in AI fairness assessment, where the goal is to measure an AI system's performance across different subgroups.
We introduce a structured regression approach to disaggregated evaluation that we demonstrate can yield reliable system performance estimates even for very small subgroups.
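The underlying intuition, that small-subgroup estimates become reliable by borrowing strength from the rest of the data, can be sketched with a simple empirical-Bayes shrinkage stand-in (the counts and prior strength below are hypothetical; the paper's structured regression approach is more sophisticated):

```python
# Sketch: shrinking a tiny subgroup's accuracy estimate toward the overall rate.
n_sub, k_sub = 8, 5        # subgroup: 5 correct predictions out of 8
n_all, k_all = 1000, 830   # overall: 830 correct out of 1000

raw = k_sub / n_sub        # noisy raw estimate from only 8 samples
p_all = k_all / n_all      # stable overall accuracy

prior_strength = 20        # pseudo-counts borrowed from the overall rate
shrunk = (k_sub + prior_strength * p_all) / (n_sub + prior_strength)
print(f"raw {raw:.3f} -> shrunk {shrunk:.3f} (overall {p_all:.3f})")
```

The shrunk estimate lands between the noisy subgroup rate and the overall rate, with the prior strength controlling how far it moves.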
arXiv Detail & Related papers (2024-01-26T14:21:45Z) - Multi-dimensional domain generalization with low-rank structures [18.565189720128856]
In statistical and machine learning methods, it is typically assumed that the test data are identically distributed with the training data.
This assumption does not always hold, especially in applications where the target population is not well represented in the training data.
We present a novel approach to addressing this challenge in linear regression models.
arXiv Detail & Related papers (2023-09-18T08:07:58Z) - Where Does My Model Underperform? A Human Evaluation of Slice Discovery Algorithms [24.127380328812855]
New slice discovery algorithms aim to group together coherent and high-error subsets of data.
We show users 40 slices output by two state-of-the-art slice discovery algorithms and ask them to form hypotheses about an object detection model.
Our results provide positive evidence that these tools provide some benefit over a naive baseline, and also shed light on challenges faced by users during the hypothesis formation step.
arXiv Detail & Related papers (2023-06-13T22:44:53Z) - Feature Importance Disparities for Data Bias Investigations [2.184775414778289]
It is widely held that one cause of downstream bias in classifiers is bias present in the training data.
We present one such method that, given a dataset $X$ consisting of protected and unprotected features, outcomes $y$, and a regressor $h$ that predicts $y$ given $X$, identifies subgroups exhibiting large feature importance disparities (FID).
We show across $4$ datasets and $4$ common feature importance methods of broad interest to the machine learning community that we can efficiently find subgroups with large FID values even over exponentially large subgroup classes.
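To make the quantity concrete, here is a toy stand-in: measure a feature's "importance" as its absolute correlation with the model's error, and take the disparity to be the subgroup value minus the full-data value. Both the data and this importance proxy are hypothetical; the paper works with established feature importance methods.

```python
# Sketch: a feature importance disparity (FID) between a subgroup and all data.

def corr(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical rows: (in_subgroup, feature value, model's absolute error).
rows = [(1, 1, 1), (1, 2, 2), (1, 3, 3),
        (0, 1, 3), (0, 2, 2), (0, 3, 1)]
sub = [r for r in rows if r[0] == 1]

imp_sub = abs(corr([r[1] for r in sub], [r[2] for r in sub]))
imp_all = abs(corr([r[1] for r in rows], [r[2] for r in rows]))
fid = imp_sub - imp_all
print(f"subgroup importance {imp_sub:.2f}, overall {imp_all:.2f}, FID {fid:.2f}")
```

Here the feature strongly tracks the error inside the subgroup but not overall, so the disparity is maximal; searching for such subgroups over exponentially many patterns is the computational challenge the paper addresses.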
arXiv Detail & Related papers (2023-03-03T04:12:04Z) - Examining and Combating Spurious Features under Distribution Shift [94.31956965507085]
We define and analyze robust and spurious representations using the information-theoretic concept of minimal sufficient statistics.
We prove that even when there is only bias of the input distribution, models can still pick up spurious features from their training data.
Inspired by our analysis, we demonstrate that group DRO can fail when groups do not directly account for various spurious correlations.
arXiv Detail & Related papers (2021-06-14T05:39:09Z) - LOGAN: Local Group Bias Detection by Clustering [86.38331353310114]
We argue that evaluating bias at the corpus level is not enough for understanding how biases are embedded in a model.
We propose LOGAN, a new bias detection technique based on clustering.
Experiments on toxicity classification and object classification tasks show that LOGAN identifies bias in a local region.
arXiv Detail & Related papers (2020-10-06T16:42:51Z) - BREEDS: Benchmarks for Subpopulation Shift [98.90314444545204]
We develop a methodology for assessing the robustness of models to subpopulation shift.
We leverage the class structure underlying existing datasets to control the data subpopulations that comprise the training and test distributions.
Applying this methodology to the ImageNet dataset, we create a suite of subpopulation shift benchmarks of varying granularity.
arXiv Detail & Related papers (2020-08-11T17:04:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.