Understanding Prediction Discrepancies in Machine Learning Classifiers
- URL: http://arxiv.org/abs/2104.05467v2
- Date: Wed, 31 Jul 2024 10:26:55 GMT
- Title: Understanding Prediction Discrepancies in Machine Learning Classifiers
- Authors: Xavier Renard, Thibault Laugel, Marcin Detyniecki
- Abstract summary: This paper proposes to analyze the prediction discrepancies in a pool of best-performing models trained on the same data.
A model-agnostic algorithm, DIG, is proposed to capture and explain discrepancies locally.
- Score: 4.940323406667406
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A multitude of classifiers can be trained on the same data to achieve similar performance at test time, while having learned significantly different classification patterns. This phenomenon, which we call prediction discrepancies, is often associated with the blind selection of one model over another with similar performance. When making this choice, the machine learning practitioner has no understanding of the differences between the models, their limits, where they agree and where they disagree. Yet the choice has concrete consequences for instances falling in the discrepancy zone, since the final decision will be based on the selected classification pattern. Besides the arbitrary nature of the result, a bad choice could have further negative consequences such as loss of opportunity or lack of fairness. This paper proposes to address this question by analyzing the prediction discrepancies in a pool of best-performing models trained on the same data. A model-agnostic algorithm, DIG, is proposed to capture and explain discrepancies locally, to enable the practitioner to make the best educated decision when selecting a model by anticipating its potential undesired consequences. All the code to reproduce the experiments is available.
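The phenomenon the abstract describes can be illustrated with a minimal sketch (hypothetical synthetic data and two stand-in classifiers, not the paper's DIG algorithm): two models reach comparable test accuracy yet disagree on a fraction of individual instances, the "discrepancy zone".

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D binary dataset with a noisy linear boundary.
n = 400
X = rng.normal(size=(n, 2))
y = (X[:, 0] + X[:, 1] + 0.5 * rng.normal(size=n) > 0).astype(int)
X_train, X_test = X[:300], X[300:]
y_train, y_test = y[:300], y[300:]

def fit_logreg(X, y, lr=0.1, steps=500):
    """Plain logistic regression trained by batch gradient descent."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # add bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_logreg(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb @ w > 0).astype(int)

def predict_1nn(X_train, y_train, X):
    """1-nearest-neighbour classifier: copy the closest training label."""
    d = ((X[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    return y_train[d.argmin(axis=1)]

w = fit_logreg(X_train, y_train)
pred_a = predict_logreg(w, X_test)
pred_b = predict_1nn(X_train, y_train, X_test)

acc_a = (pred_a == y_test).mean()
acc_b = (pred_b == y_test).mean()
# Fraction of test instances where the two models disagree:
# these are the points whose outcome depends purely on which model is picked.
disagree = (pred_a != pred_b).mean()

print(f"accuracy A={acc_a:.2f}  accuracy B={acc_b:.2f}  disagreement={disagree:.2%}")
```

Both classifiers perform well overall, but a non-trivial slice of test points receives opposite labels depending on the model chosen; the paper's contribution is a method for locating and explaining exactly such regions.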
Related papers
- Bounding-Box Inference for Error-Aware Model-Based Reinforcement Learning [4.185571779339683]
In model-based reinforcement learning, simulated experiences are often treated as equivalent to experience from the real environment.
We show that best results require distribution insensitive inference to estimate the uncertainty over model-based updates.
We find that bounding-box inference can reliably support effective selective planning.
arXiv Detail & Related papers (2024-06-23T04:23:15Z)
- Deep Neural Network Benchmarks for Selective Classification [27.098996474946446]
Multiple selective classification frameworks exist, most of which rely on deep neural network architectures.
We evaluate these approaches using several criteria, including selective error rate, empirical coverage, the class distribution of rejected instances, and performance on out-of-distribution instances.
arXiv Detail & Related papers (2024-01-23T12:15:47Z)
- In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation [92.51773744318119]
This paper empirically investigates the strengths and weaknesses of different model selection criteria.
We highlight that there is a complex interplay between selection strategies, candidate estimators and the data used for comparing them.
arXiv Detail & Related papers (2023-02-06T16:55:37Z)
- How to select predictive models for causal inference? [0.0]
We show that classic machine-learning model selection does not select the best outcome models for causal inference.
We outline a good causal model-selection procedure: using the so-called $R$-risk, and using flexible estimators to compute the nuisance models on the train set.
arXiv Detail & Related papers (2023-02-01T10:58:55Z)
- Cross-model Fairness: Empirical Study of Fairness and Ethics Under Model Multiplicity [10.144058870887061]
We argue that individuals can be harmed when one predictor is chosen ad hoc from a group of equally well performing models.
Our findings suggest that such unfairness can be readily found in real life and it may be difficult to mitigate by technical means alone.
arXiv Detail & Related papers (2022-03-14T14:33:39Z)
- A Tale Of Two Long Tails [4.970364068620608]
We identify examples the model is uncertain about and characterize the source of said uncertainty.
We investigate whether the rate of learning in the presence of additional information differs between atypical and noisy examples.
Our results show that well-designed interventions over the course of training can be an effective way to characterize and distinguish between different sources of uncertainty.
arXiv Detail & Related papers (2021-07-27T22:49:59Z)
- Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
- Characterizing Fairness Over the Set of Good Models Under Selective Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance.
We provide tractable algorithms to compute the range of attainable group-level predictive disparities.
We extend our framework to address the empirically relevant challenge of selectively labelled data.
arXiv Detail & Related papers (2021-01-02T02:11:37Z)
- Online Active Model Selection for Pre-trained Classifiers [72.84853880948894]
We design an online selective sampling approach that actively selects informative examples to label and outputs the best model with high probability at any round.
Our algorithm can be used for online prediction tasks on both adversarial and stochastic streams.
arXiv Detail & Related papers (2020-10-19T19:53:15Z)
- Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set, which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z)
- Decision-Making with Auto-Encoding Variational Bayes [71.44735417472043]
We show that a posterior approximation distinct from the variational distribution should be used for making decisions.
Motivated by these theoretical results, we propose learning several approximate proposals for the best model.
In addition to toy examples, we present a full-fledged case study of single-cell RNA sequencing.
arXiv Detail & Related papers (2020-02-17T19:23:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.