On the Efficacy of Generalization Error Prediction Scoring Functions
- URL: http://arxiv.org/abs/2303.13589v2
- Date: Mon, 29 May 2023 16:23:44 GMT
- Title: On the Efficacy of Generalization Error Prediction Scoring Functions
- Authors: Puja Trivedi, Danai Koutra, Jayaraman J. Thiagarajan
- Abstract summary: Generalization error predictors (GEPs) aim to predict model performance on unseen distributions by deriving dataset-level error estimates from sample-level scores.
We rigorously study the effectiveness of popular scoring functions (confidence, local manifold smoothness, model agreement) independent of mechanism choice.
- Score: 33.24980750651318
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalization error predictors (GEPs) aim to predict model performance on
unseen distributions by deriving dataset-level error estimates from
sample-level scores. However, GEPs often utilize disparate mechanisms (e.g.,
regressors, thresholding functions, calibration datasets) to derive such
error estimates, which can obscure the benefits of a particular scoring
function. Therefore, in this work, we rigorously study the effectiveness of
popular scoring functions (confidence, local manifold smoothness, model
agreement), independent of mechanism choice. We find that, absent complex
mechanisms, state-of-the-art confidence- and smoothness-based scores fail
to outperform simple model-agreement scores when estimating error under
distribution shifts and corruptions. Furthermore, in realistic settings where
the training data has been compromised (e.g., label noise, measurement noise,
undersampling), we find that model-agreement scores continue to perform well
and that ensemble diversity is important for improving their performance.
Finally, to better understand the limitations of scoring functions, we
demonstrate that simplicity bias, or the propensity of deep neural networks to
rely upon simple but brittle features, can adversely affect GEP performance.
Overall, our work carefully studies the effectiveness of popular scoring
functions in realistic settings and helps to better understand their
limitations.
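As a rough illustration of the model-agreement scores discussed above (not code from the paper; all function names and the threshold value are illustrative), a sample-level agreement score can be taken as the fraction of ensemble members voting for the plurality class, and a dataset-level error estimate derived with a simple thresholding mechanism:

```python
import numpy as np

def agreement_scores(ensemble_preds):
    """Sample-level model-agreement scores.

    ensemble_preds: (n_models, n_samples) array of predicted class labels.
    Returns, per sample, the fraction of models voting for the plurality
    class -- higher means stronger agreement.
    """
    n_models, n_samples = ensemble_preds.shape
    scores = np.empty(n_samples)
    for i in range(n_samples):
        _, counts = np.unique(ensemble_preds[:, i], return_counts=True)
        scores[i] = counts.max() / n_models
    return scores

def predict_error(ensemble_preds, threshold=0.8):
    """Dataset-level error estimate: the fraction of samples whose
    agreement score falls below a threshold (one simple mechanism
    among those mentioned in the abstract)."""
    return float(np.mean(agreement_scores(ensemble_preds) < threshold))

# Toy example: 5 models, 4 samples.
preds = np.array([
    [0, 1, 2, 1],
    [0, 1, 2, 0],
    [0, 1, 0, 2],
    [0, 1, 2, 1],
    [0, 2, 2, 1],
])
print(agreement_scores(preds))  # per-sample agreement: 1.0, 0.8, 0.8, 0.6
print(predict_error(preds))     # 0.25
```

Per the abstract's finding on ensemble diversity, the usefulness of such a score depends on the ensemble members making meaningfully different errors.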
Related papers
- Controlling Learned Effects to Reduce Spurious Correlations in Text Classifiers [6.662800021628275]
We propose an algorithm that regularizes the learned effect of features on the model's prediction toward the estimated effect of each feature on the label.
On toxicity and IMDB review datasets, the proposed algorithm minimizes spurious correlations and improves minority-group performance.
arXiv Detail & Related papers (2023-05-26T12:15:54Z)
- Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
- Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919]
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores.
We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
arXiv Detail & Related papers (2023-02-23T18:57:14Z)
- Domain-Adjusted Regression or: ERM May Already Learn Features Sufficient for Out-of-Distribution Generalization [52.7137956951533]
We argue that devising simpler methods for learning predictors on existing features is a promising direction for future research.
We introduce Domain-Adjusted Regression (DARE), a convex objective for learning a linear predictor that is provably robust under a new model of distribution shift.
Under a natural model, we prove that the DARE solution is the minimax-optimal predictor for a constrained set of test distributions.
arXiv Detail & Related papers (2022-02-14T16:42:16Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
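The ATC idea above can be sketched in a few lines (an illustrative sketch, not the authors' implementation; the function names, and fitting the threshold as an error-rate quantile of source confidences, are assumptions):

```python
import numpy as np

def fit_atc_threshold(source_conf, source_correct):
    """Choose a confidence threshold t on labeled source data so that
    the fraction of source examples with confidence below t matches the
    observed source error rate (here via a quantile of confidences)."""
    error_rate = 1.0 - np.mean(source_correct)
    return np.quantile(source_conf, error_rate)

def atc_predict_accuracy(target_conf, threshold):
    """Predicted target accuracy: the fraction of unlabeled target
    examples whose confidence exceeds the learned threshold."""
    return float(np.mean(target_conf > threshold))

# Toy sanity check on synthetic, roughly calibrated confidences.
rng = np.random.default_rng(0)
source_conf = rng.uniform(0.5, 1.0, size=1000)
source_correct = rng.uniform(size=1000) < source_conf
t = fit_atc_threshold(source_conf, source_correct)
print(atc_predict_accuracy(source_conf, t))  # close to source accuracy by construction
```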
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Increasing Fairness in Predictions Using Bias Parity Score Based Loss Function Regularization [0.8594140167290099]
We introduce a family of fairness enhancing regularization components that we use in conjunction with the traditional binary-cross-entropy based accuracy loss.
We deploy them in the context of a recidivism prediction task as well as on a census-based adult income dataset.
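A loss of this shape can be sketched as binary cross-entropy plus a parity penalty on the gap in mean predicted positive rate between two groups (a generic demographic-parity-style surrogate; the paper's exact bias parity score may differ, and all names and the penalty form here are illustrative):

```python
import numpy as np

def bias_parity_regularized_loss(probs, labels, group, lam=1.0):
    """Binary cross-entropy plus lam times the absolute gap in mean
    predicted positive rate between group 0 and group 1."""
    eps = 1e-12  # numerical guard for log(0)
    bce = -np.mean(labels * np.log(probs + eps)
                   + (1 - labels) * np.log(1 - probs + eps))
    gap = abs(probs[group == 0].mean() - probs[group == 1].mean())
    return bce + lam * gap

# Toy example: well-separated predictions with equal positive rates
# across groups, so the penalty term is zero.
probs = np.array([0.9, 0.1, 0.8, 0.2])
labels = np.array([1, 0, 1, 0])
group = np.array([0, 0, 1, 1])
print(bias_parity_regularized_loss(probs, labels, group))
```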
arXiv Detail & Related papers (2021-11-05T17:42:33Z)
- Accurate and Robust Feature Importance Estimation under Distribution Shifts [49.58991359544005]
We propose PRoFILE, a novel feature importance estimation method, and show significant improvements over state-of-the-art approaches in both fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z)
- Estimating Structural Target Functions using Machine Learning and Influence Functions [103.47897241856603]
We propose a new framework for statistical machine learning of target functions arising as identifiable functionals from statistical models.
This framework is problem- and model-agnostic and can be used to estimate a broad variety of target parameters of interest in applied statistics.
We put particular focus on so-called coarsening at random/doubly robust problems with partially unobserved information.
arXiv Detail & Related papers (2020-08-14T16:48:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.