Related papers: Attributing AUC-ROC to Analyze Binary Classifier Performance

URL: http://arxiv.org/abs/2205.11781v1
Date: Tue, 24 May 2022 04:42:52 GMT
Title: Attributing AUC-ROC to Analyze Binary Classifier Performance
Authors: Arya Tafvizi, Besim Avci, Mukund Sundararajan
Abstract summary: We discuss techniques to segment the Area Under the Receiver Operating Characteristic Curve (AUC-ROC) along human-interpretable dimensions. AUC-ROC is not an additive/linear function over the data samples, so segmenting the overall AUC-ROC in this way is different from tabulating the AUC-ROC of data segments.
Score: 13.192005156790302
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Area Under the Receiver Operating Characteristic Curve (AUC-ROC) is a popular evaluation metric for binary classifiers. In this paper, we discuss techniques to segment the AUC-ROC along human-interpretable dimensions. AUC-ROC is not an additive/linear function over the data samples, so segmenting the overall AUC-ROC in this way is different from tabulating the AUC-ROC of data segments. To segment the overall AUC-ROC, we must first solve an attribution problem to identify credit for individual examples. We observe that AUC-ROC, though non-linear over examples, is linear over pairs of examples. This observation leads to a simple, efficient attribution technique for examples (example attributions), and for pairs of examples (pair attributions). We automatically slice these attributions using decision trees by making the tree predict the attributions; we use the notion of honest estimates along with a t-test to mitigate false discovery. Our experiments with the method show that an inferior model can outperform a superior model (trained to optimize a different training objective) on the inferior model's own training objective, a manifestation of Goodhart's Law. In contrast, AUC attributions enable a reasonable comparison. Example attributions can be used to slice this comparison. Pair attributions are used to categorize pairs of items -- one positively labeled and one negatively labeled -- that the model has trouble separating. These categories identify the decision boundary of the classifier and the headroom to improve AUC.
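The observation that AUC-ROC is linear over pairs of examples can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard pairwise form of AUC (the fraction of positive/negative pairs ranked correctly, with ties counted as one half), and it assumes that each pair's credit is split equally between its two examples; the abstract does not state the split rule. The function names pair_attributions, example_attributions, and segment_auc are hypothetical.

```python
import numpy as np


def pair_attributions(scores, labels):
    """Credit for every (positive, negative) pair; the entries sum to the AUC.

    Assumes the pairwise definition of AUC: a pair earns 1 if the positive
    example is scored above the negative one, 0.5 on a tie, 0 otherwise.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]  # shape (n_pos, n_neg)
    credit = (diff > 0).astype(float) + 0.5 * (diff == 0)
    return credit / (len(pos) * len(neg))


def example_attributions(scores, labels):
    """Per-example credit, assuming each pair's credit is split 50/50."""
    labels = np.asarray(labels, dtype=bool)
    pairs = pair_attributions(scores, labels)
    attr = np.zeros(len(labels))
    attr[labels] = 0.5 * pairs.sum(axis=1)   # share of each positive example
    attr[~labels] = 0.5 * pairs.sum(axis=0)  # share of each negative example
    return attr


def segment_auc(attr, segment_ids):
    """Sum example attributions per segment; segment totals add up to the AUC."""
    segment_ids = np.asarray(segment_ids)
    return {s.item(): attr[segment_ids == s].sum() for s in np.unique(segment_ids)}


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    scores = [0.9, 0.8, 0.3, 0.6, 0.2]
    labels = [1, 0, 1, 1, 0]
    attr = example_attributions(scores, labels)
    print("overall AUC:", attr.sum())
    print("per-segment:", segment_auc(attr, ["a", "a", "b", "b", "a"]))
```

Because the per-example credits are additive, summing them within a segment gives that segment's share of the overall AUC. This differs from computing a separate AUC inside each segment, which ignores pairs that span segments.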

Related papers

Leveraging Text-to-Image Generation for Handling Spurious Correlation [24.940576844328408]
Deep neural networks trained with Empirical Risk Minimization (ERM) perform well when both training and test data come from the same domain. ERM models may rely on spurious correlations that often exist between labels and irrelevant features of images, making predictions unreliable when those features do not exist. We propose a technique to generate training samples with text-to-image (T2I) diffusion models for addressing the spurious correlation problem.
arXiv Detail & Related papers (2025-03-21T15:28:22Z)
Multiclass ROC [6.941573057921458]
We provide an evaluation metric summarizing the pair-wise multi-class True Positive Rate (TPR) and False Positive Rate (FPR). An integration over those factorized vectors provides a binary AUC-equivalent summary of the performance. To support our findings, we conducted extensive simulation studies and compared our method to the pair-wise averaged AUC statistics on benchmark datasets.
arXiv Detail & Related papers (2024-04-19T19:25:10Z)
Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) that does not require prior knowledge of the number of samples per class. Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z)
A structured regression approach for evaluating model performance across intersectional subgroups [53.91682617836498]
Disaggregated evaluation is a central task in AI fairness assessment, where the goal is to measure an AI system's performance across different subgroups. We introduce a structured regression approach to disaggregated evaluation that we demonstrate can yield reliable system performance estimates even for very small subgroups.
arXiv Detail & Related papers (2024-01-26T14:21:45Z)
Theoretical Evaluation of Asymmetric Shapley Values for Root-Cause Analysis [0.0]
Asymmetric Shapley Values (ASV) is a variant of the popular SHAP additive local explanation method. We show how local contributions correspond to global contributions of variance reduction. We identify generalized additive models (GAM) as a restricted class for which ASV exhibits desirable properties.
arXiv Detail & Related papers (2023-10-15T21:40:16Z)
AdAUC: End-to-end Adversarial AUC Optimization Against Long-tail Problems [102.95119281306893]
We present an early trial to explore adversarial training methods to optimize AUC. We reformulate the AUC optimization problem as a saddle point problem, where the objective becomes an instance-wise function. Our analysis differs from the existing studies since the algorithm is asked to generate adversarial examples by calculating the gradient of a min-max problem.
arXiv Detail & Related papers (2022-06-24T09:13:39Z)
Large-Margin Representation Learning for Texture Classification [67.94823375350433]
This paper presents a novel approach combining convolutional layers (CLs) and large-margin metric learning for training supervised models on small datasets for texture classification. The experimental results on texture and histopathologic image datasets have shown that the proposed approach achieves competitive accuracy with lower computational cost and faster convergence when compared to equivalent CNNs.
arXiv Detail & Related papers (2022-06-17T04:07:45Z)
CARMS: Categorical-Antithetic-REINFORCE Multi-Sample Gradient Estimator [60.799183326613395]
We propose an unbiased estimator for categorical random variables based on multiple mutually negatively correlated (jointly antithetic) samples. CARMS combines REINFORCE with copula based sampling to avoid duplicate samples and reduce its variance, while keeping the estimator unbiased using importance sampling. We evaluate CARMS on several benchmark datasets on a generative modeling task, as well as a structured output prediction task, and find it to outperform competing methods including a strong self-control baseline.
arXiv Detail & Related papers (2021-10-26T20:14:30Z)
AttriMeter: An Attribute-guided Metric Interpreter for Person Re-Identification [100.3112429685558]
Person ReID systems only provide a distance or similarity when matching two persons. We propose an Attribute-guided Metric Interpreter, named AttriMeter, to semantically and quantitatively explain the results of CNN-based ReID models.
arXiv Detail & Related papers (2021-03-02T03:37:48Z)
Adaptive Name Entity Recognition under Highly Unbalanced Data [5.575448433529451]
We present our experiments on a neural architecture composed of a Conditional Random Field (CRF) layer stacked on top of a Bi-directional LSTM (Bi-LSTM) layer for solving NER tasks. We introduce an add-on classification model that splits sentences into two sets, Weak and Strong classes, and then design a pair of Bi-LSTM-CRF models to optimize performance on each set.
arXiv Detail & Related papers (2020-03-10T06:56:52Z)
A Distributionally Robust Area Under Curve Maximization Model [1.370633147306388]
We propose two new distributionally robust AUC models (DR-AUC). The DR-AUC models rely on the Kantorovich metric and approximate the AUC with the hinge loss function. Numerical experiments show that the proposed DR-AUC models perform better in general and in particular improve the worst-case out-of-sample performance.
arXiv Detail & Related papers (2020-02-18T02:50:45Z)

This list is automatically generated from the titles and abstracts of the papers on this site.