Related papers: MANO: Exploiting Matrix Norm for Unsupervised Accuracy Estimation Under Distribution Shifts

URL: http://arxiv.org/abs/2405.18979v2
Date: Mon, 24 Jun 2024 09:12:08 GMT
Title: MANO: Exploiting Matrix Norm for Unsupervised Accuracy Estimation Under Distribution Shifts
Authors: Renchunzi Xie, Ambroise Odonnat, Vasilii Feofanov, Weijian Deng, Jianfeng Zhang, Bo An
Abstract summary: Current logit-based methods are vulnerable to overconfidence, which leads to prediction bias, especially under natural shift. We propose MaNo, which applies a data-dependent normalization to the logits to reduce prediction bias and takes the $L_p$ norm of the matrix of normalized logits as the estimation score. MaNo achieves state-of-the-art performance across various architectures in the presence of synthetic, natural, or subpopulation shifts.
Score: 25.643876327918544
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Leveraging a model's outputs, specifically the logits, is a common approach to estimating the test accuracy of a pre-trained neural network on out-of-distribution (OOD) samples without requiring access to the corresponding ground-truth labels. Despite their ease of implementation and computational efficiency, current logit-based methods are vulnerable to overconfidence, which leads to prediction bias, especially under natural shift. In this work, we first study the relationship between logits and generalization performance from the perspective of the low-density separation assumption. Our findings motivate our proposed method, MaNo, which (1) applies a data-dependent normalization to the logits to reduce prediction bias, and (2) takes the $L_p$ norm of the matrix of normalized logits as the estimation score. Our theoretical analysis highlights the connection between this score and the model's uncertainty. We conduct an extensive empirical study on common unsupervised accuracy estimation benchmarks and demonstrate that MaNo achieves state-of-the-art performance across various architectures in the presence of synthetic, natural, or subpopulation shifts.
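
The two-step score described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the default p = 4, the division by $(NK)^{1/p}$, and the use of a plain softmax in place of MaNo's data-dependent normalization are all assumptions made for illustration. The score is the entrywise $L_p$ norm of the N x K matrix of normalized logits.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    # Row-wise softmax; subtracting the row max keeps exp() numerically stable.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def matrix_norm_score(logits, p: float = 4.0, normalize=softmax) -> float:
    """Entrywise L_p norm of the normalized (N, K) logit matrix, divided by (N*K)^(1/p).

    ASSUMPTION: `normalize` defaults to a plain softmax as a stand-in for
    MaNo's data-dependent normalization, which is not reproduced here.
    """
    probs = normalize(np.asarray(logits, dtype=float))
    # (1 / (N*K))^(1/p) * ||probs||_p is the same as (mean of |probs|^p)^(1/p).
    return float(np.mean(np.abs(probs) ** p) ** (1.0 / p))
```

Because each softmax row sums to one, this score grows as predictions become more peaked and shrinks toward $K^{-1}$ for uniform predictions; in an accuracy-estimation setting it would be compared across test sets rather than read directly as an accuracy.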

Related papers

Should Bias Always be Eliminated? A Principled Framework to Use Data Bias for OOD Generation [14.271988618123512]
We introduce a novel framework that strategically leverages bias to complement invariant representations during inference. We validate our approach through experiments on both synthetic datasets and standard domain generalization benchmarks.
arXiv Detail & Related papers (2025-07-22T20:17:48Z)
Principled Input-Output-Conditioned Post-Hoc Uncertainty Estimation for Regression Networks [1.4671424999873808]
Uncertainty is critical in safety-sensitive applications but is often omitted from off-the-shelf neural networks due to adverse effects on predictive performance. We propose a theoretically grounded framework for post-hoc uncertainty estimation in regression tasks by fitting an auxiliary model to both original inputs and frozen model outputs.
arXiv Detail & Related papers (2025-06-01T09:13:27Z)
Consistency-based Abductive Reasoning over Perceptual Errors of Multiple Pre-trained Models in Novel Environments [5.5855749614100825]
This paper addresses the hypothesis that leveraging multiple pre-trained models can mitigate this recall reduction. We formulate the challenge of identifying and managing conflicting predictions from various models as a consistency-based abduction problem. Our results validate the use of consistency-based abduction as an effective mechanism to robustly integrate knowledge from multiple imperfect models in challenging, novel scenarios.
arXiv Detail & Related papers (2025-05-25T23:17:47Z)
Are Domain Generalization Benchmarks with Accuracy on the Line Misspecified? [11.534630666670568]
Conventional wisdom suggests that models relying on spurious correlations will fail to generalize out-of-distribution. We show that many widely used benchmarks for evaluating robustness to spurious correlations are misspecified. We highlight the need to rethink how robustness to spurious correlations is assessed, identify well-specified benchmarks the field should prioritize, and enumerate strategies for designing future benchmarks that meaningfully reflect robustness under distribution shift.
arXiv Detail & Related papers (2025-03-31T19:50:04Z)
Ranking and Combining Latent Structured Predictive Scores without Labeled Data [2.5064967708371553]
This paper introduces a novel structured unsupervised ensemble learning model (SUEL). It exploits the dependency between a set of predictors with continuous predictive scores, ranks the predictors without labeled data, and combines them into a weighted ensemble score. The efficacy of the proposed methods is rigorously assessed through both simulation studies and a real-world application to risk gene discovery.
arXiv Detail & Related papers (2024-08-14T20:14:42Z)
ConjNorm: Tractable Density Estimation for Out-of-Distribution Detection [41.41164637577005]
Post-hoc out-of-distribution (OOD) detection has attracted considerable attention in reliable machine learning. We propose a novel theoretical framework grounded in Bregman divergence to provide a unified perspective on density-based score design. We show that our proposed ConjNorm establishes a new state of the art in a variety of OOD detection setups.
arXiv Detail & Related papers (2024-02-27T21:02:47Z)
Leveraging Gradients for Unsupervised Accuracy Estimation under Distribution Shift [24.49100064042827]
Estimating the test performance of a model without access to the ground-truth labels is a challenging problem. We use the norm of classification-layer gradients, backpropagated from the cross-entropy loss after only one gradient step over test data. Our intuition is that these gradients should be of higher magnitude when the model generalizes poorly.
arXiv Detail & Related papers (2024-01-17T01:33:23Z)
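
The gradient-norm idea summarized above can be sketched for a linear classification head in NumPy. This is an illustrative assumption, not the cited paper's procedure: pseudo-labels are taken to be the model's own argmax predictions, only the weight matrix is differentiated, and `last_layer_gradient_norm` is a hypothetical name. For mean cross-entropy over N samples, the gradient with respect to the K x D weight matrix is $(P - Y)^\top H / N$, where $P$ holds softmax probabilities, $Y$ the one-hot pseudo-labels, and $H$ the penultimate features.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    # Row-wise softmax with max-subtraction for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def last_layer_gradient_norm(features, weight, bias) -> float:
    """Frobenius norm of d(mean CE)/dW for a linear head, using argmax pseudo-labels.

    ASSUMPTIONS: pseudo-labels are the model's own argmax predictions and only
    the weight matrix (not the bias) is differentiated.
    """
    features = np.asarray(features, dtype=float)   # H, shape (N, D)
    logits = features @ weight.T + bias            # shape (N, K)
    probs = softmax(logits)                        # P, shape (N, K)
    num_classes = weight.shape[0]
    pseudo = np.eye(num_classes)[probs.argmax(axis=1)]  # Y, one-hot, shape (N, K)
    # Gradient of mean cross-entropy w.r.t. W: (P - Y)^T H / N, shape (K, D).
    grad_w = (probs - pseudo).T @ features / features.shape[0]
    return float(np.linalg.norm(grad_w))
```

Under these assumptions, less confident predictions leave a larger residual $P - Y$ and therefore a larger norm, which matches the stated intuition that poorly generalizing models produce larger gradients.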
Exploiting Observation Bias to Improve Matrix Completion [15.171759590760574]
We propose a natural model where the observation pattern and outcome of interest are driven by the same set of underlying latent (or unobserved) factors. We devise Mask Nearest Neighbor (MNN), a novel two-stage matrix completion algorithm. Our analysis reveals that MNN enjoys entry-wise finite-sample error rates that are competitive with corresponding supervised learning parametric rates.
arXiv Detail & Related papers (2023-06-07T20:48:35Z)
Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919]
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores. We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
arXiv Detail & Related papers (2023-02-23T18:57:14Z)
Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions. In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data. We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds that threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
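
The ATC thresholding rule can be written in a few lines of NumPy. This sketch assumes per-example confidence scores have already been computed (for instance, maximum softmax probabilities); the function names are hypothetical, and choosing the threshold as a quantile of the source scores is one simple option rather than the paper's exact code.

```python
import numpy as np


def fit_atc_threshold(source_conf, source_correct) -> float:
    """Pick t so the share of labeled source examples with confidence > t matches source accuracy."""
    source_conf = np.asarray(source_conf, dtype=float)
    acc = float(np.mean(source_correct))
    # The (1 - acc) quantile leaves roughly a fraction `acc` of the scores above it.
    return float(np.quantile(source_conf, 1.0 - acc))


def predict_atc_accuracy(target_conf, threshold: float) -> float:
    """Estimated target accuracy: fraction of unlabeled target scores above the threshold."""
    return float(np.mean(np.asarray(target_conf, dtype=float) > threshold))
```

For example, with ten source scores 0.1, 0.2, ..., 1.0 of which the seven highest are correct, the fitted threshold is about 0.37, and a target set with scores 0.2, 0.5, 0.9, 0.95 is predicted to have 75% accuracy.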
General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model in a manner analogous to gradient descent in function space. GGD can learn a more robust base model both with task-specific biased models built from prior knowledge and with a self-ensemble biased model that requires no prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z)
Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models. In this paper, we take a first step towards extending this result to nonlinear settings via deep learning with bias constraints. A second motivation for bias-constrained estimators (BCE) arises in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
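
For reference, the weighted least squares estimator mentioned above has the closed form $\hat\beta = (X^\top W X)^{-1} X^\top W y$. The sketch below assumes independent noise with known per-observation variances and uses their inverses as the diagonal of $W$, the setting in which WLS is the best linear unbiased estimator; the function name is illustrative.

```python
import numpy as np


def weighted_least_squares(X, y, noise_var) -> np.ndarray:
    """WLS estimate for y = X beta + e with independent noise of known variances.

    Weights are the inverse noise variances (W = diag(1 / noise_var)).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(noise_var, dtype=float)
    Xw = X * w[:, None]  # W X: each row of X scaled by its weight
    # Solve (X^T W X) beta = X^T W y instead of forming the inverse explicitly.
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)
```

Solving the normal equations with `np.linalg.solve` avoids an explicit matrix inverse, which is both cheaper and numerically safer.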
Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues. We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders. We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z)
BENN: Bias Estimation Using Deep Neural Network [37.70583323420925]
We present BENN, a novel bias estimation method that uses a pretrained unsupervised deep neural network. Given an ML model and data samples, BENN provides a bias estimation for every feature based on the model's predictions. We evaluated BENN using three benchmark datasets and one proprietary churn prediction model used by a European Telco.
arXiv Detail & Related papers (2020-12-23T08:25:35Z)
Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers. We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model. Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.