Agreement-on-the-Line: Predicting the Performance of Neural Networks
under Distribution Shift
- URL: http://arxiv.org/abs/2206.13089v2
- Date: Thu, 11 May 2023 00:39:23 GMT
- Title: Agreement-on-the-Line: Predicting the Performance of Neural Networks
under Distribution Shift
- Authors: Christina Baek, Yiding Jiang, Aditi Raghunathan, Zico Kolter
- Abstract summary: We show a similar but surprising phenomenon also holds for the agreement between pairs of neural network classifiers.
Our prediction algorithm outperforms previous methods both in shifts where agreement-on-the-line holds and, surprisingly, when accuracy is not on the line.
- Score: 18.760716606922482
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, Miller et al. showed that a model's in-distribution (ID) accuracy
has a strong linear correlation with its out-of-distribution (OOD) accuracy on
several OOD benchmarks -- a phenomenon they dubbed "accuracy-on-the-line".
While a useful tool for model selection (i.e., the model most likely to perform
the best OOD is the one with the highest ID accuracy), this fact does not help
estimate the actual OOD performance of models without access to a labeled OOD
validation set. In this paper, we show a similar but surprising phenomenon also
holds for the agreement between pairs of neural network classifiers: whenever
accuracy-on-the-line holds, we observe that the OOD agreement between the
predictions of any pair of neural networks (with potentially different
architectures) also exhibits a strong linear correlation with their ID
agreement. Furthermore, we observe that the slope and bias of OOD vs ID
agreement closely matches that of OOD vs ID accuracy. This phenomenon, which we
call agreement-on-the-line, has important practical applications: without any
labeled data, we can predict the OOD accuracy of classifiers, since OOD
agreement can be estimated with just unlabeled data. Our prediction algorithm
outperforms previous methods both in shifts where agreement-on-the-line holds
and, surprisingly, when accuracy is not on the line. This phenomenon also
provides new insights into deep neural networks: unlike accuracy-on-the-line,
agreement-on-the-line appears to only hold for neural network classifiers.
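As a rough illustration of the resulting prediction procedure, the minimal sketch below fits a line to pairwise ID-vs-OOD agreement and applies it to ID accuracies. The paper additionally applies a probit transform before fitting, which this sketch omits, and all names here are illustrative.

```python
import numpy as np

def agreement(preds_a, preds_b):
    """Fraction of inputs on which two classifiers predict the same label."""
    return float(np.mean(preds_a == preds_b))

def predict_ood_accuracy(id_accs, id_preds, ood_preds):
    """Minimal sketch of agreement-based OOD accuracy prediction.

    id_accs:   per-model accuracies on labeled ID data.
    id_preds:  list of per-model predicted labels on an ID validation set.
    ood_preds: list of per-model predicted labels on *unlabeled* OOD data.
    """
    n = len(id_preds)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    id_agree = np.array([agreement(id_preds[i], id_preds[j]) for i, j in pairs])
    ood_agree = np.array([agreement(ood_preds[i], ood_preds[j]) for i, j in pairs])
    # Agreement-on-the-line: the slope and bias of OOD-vs-ID agreement closely
    # match those of OOD-vs-ID accuracy, so the line fitted on agreements
    # (label-free) can map each model's ID accuracy to an estimated OOD accuracy.
    slope, bias = np.polyfit(id_agree, ood_agree, deg=1)
    return slope * np.asarray(id_accs) + bias
```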
Related papers
- How Does Unlabeled Data Provably Help Out-of-Distribution Detection? [63.41681272937562]
Using unlabeled in-the-wild data is non-trivial due to the heterogeneity of both in-distribution (ID) and out-of-distribution (OOD) data.
This paper introduces a new learning framework SAL (Separate And Learn) that offers both strong theoretical guarantees and empirical effectiveness.
arXiv Detail & Related papers (2024-02-05T20:36:33Z)
- EAT: Towards Long-Tailed Out-of-Distribution Detection [55.380390767978554]
This paper addresses the challenging task of long-tailed OOD detection.
The main difficulty lies in distinguishing OOD data from samples belonging to the tail classes.
We propose two simple ideas: (1) Expanding the in-distribution class space by introducing multiple abstention classes, and (2) Augmenting the context-limited tail classes by overlaying images onto the context-rich OOD data.
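As a rough illustration of idea (2), the sketch below alpha-blends a tail-class image onto a random region of a context-rich OOD image; the paper's exact overlay procedure may differ, and all names here are illustrative.

```python
import numpy as np

def overlay_tail_on_ood(tail_img, ood_img, alpha=0.7, rng=None):
    """Paste a tail-class image onto a random location of an OOD image,
    keeping the tail-class label for the augmented example.

    tail_img, ood_img: float arrays in [0, 1] of shape (H, W, C);
    tail_img is assumed to be no larger than ood_img.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w = tail_img.shape[:2]
    H, W = ood_img.shape[:2]
    top = rng.integers(0, H - h + 1)
    left = rng.integers(0, W - w + 1)
    out = ood_img.copy()
    background = out[top:top + h, left:left + w]
    # Alpha-blend the tail-class foreground over the OOD background context.
    out[top:top + h, left:left + w] = alpha * tail_img + (1 - alpha) * background
    return out
```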
arXiv Detail & Related papers (2023-12-14T13:47:13Z)
- Is Fine-tuning Needed? Pre-trained Language Models Are Near Perfect for Out-of-Domain Detection [28.810524375810736]
Out-of-distribution (OOD) detection is a critical task for reliable predictions over text.
Fine-tuning with pre-trained language models has been a de facto procedure to derive OOD detectors.
We show that using distance-based detection methods, pre-trained language models are near-perfect OOD detectors when the distribution shift involves a domain change.
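A minimal sketch of one common distance-based detector over frozen encoder features, using a single-Gaussian Mahalanobis score; the paper's exact variant (e.g. class-conditional distances) may differ, and names are illustrative.

```python
import numpy as np

def mahalanobis_ood_score(train_feats, test_feats):
    """Score test points by Mahalanobis distance to the ID feature
    distribution of a frozen pre-trained encoder; larger -> more likely OOD."""
    mu = train_feats.mean(axis=0)
    cov = np.cov(train_feats, rowvar=False)
    cov_inv = np.linalg.pinv(cov)   # pseudo-inverse guards against singular cov
    diff = test_feats - mu
    # Quadratic form diff^T cov_inv diff, computed per test point.
    return np.einsum('nd,dk,nk->n', diff, cov_inv, diff)
```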
arXiv Detail & Related papers (2023-05-22T17:42:44Z)
- Energy-based Out-of-Distribution Detection for Graph Neural Networks [76.0242218180483]
We propose a simple, powerful and efficient OOD detection model for GNN-based learning on graphs, which we call GNNSafe.
GNNSafe achieves up to 17.0% AUROC improvement over the state of the art and can serve as a simple yet strong baseline in this under-developed area.
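For context, the base node-level score in energy-based detection is the standard energy function sketched below; GNNSafe additionally propagates these energies along graph edges, which this minimal sketch omits.

```python
import numpy as np

def energy_score(logits):
    """Energy score E(x) = -log sum_k exp(f_k(x)) over classifier logits.
    Lower energy suggests in-distribution; thresholding the score flags OOD."""
    m = logits.max(axis=-1)   # subtract the max for numerical stability
    return -(m + np.log(np.exp(logits - m[..., None]).sum(axis=-1)))
```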
arXiv Detail & Related papers (2023-02-06T16:38:43Z)
- Calibrated ensembles can mitigate accuracy tradeoffs under distribution shift [108.30303219703845]
We find that ID-calibrated ensembles outperform the prior state of the art (based on self-training) on both ID and OOD accuracy.
We analyze this method in stylized settings, and identify two important conditions for ensembles to perform well both ID and OOD.
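A minimal sketch of an ID-calibrated ensemble, assuming temperature scaling as the calibration step (one standard choice; the per-model temperatures would be fit on labeled ID validation data beforehand):

```python
import numpy as np

def temperature_softmax(logits, T):
    """Softmax over temperature-scaled logits."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    p = np.exp(z)
    return p / p.sum(axis=-1, keepdims=True)

def id_calibrated_ensemble(all_logits, temperatures):
    """Average the probabilities of models after each is calibrated on ID data.
    all_logits: list of (n_examples, n_classes) logit arrays, one per model."""
    probs = [temperature_softmax(l, T) for l, T in zip(all_logits, temperatures)]
    return np.mean(probs, axis=0)
```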
arXiv Detail & Related papers (2022-07-18T23:14:44Z)
- Augmenting Softmax Information for Selective Classification with Out-of-Distribution Data [7.221206118679026]
We show that existing post-hoc methods perform quite differently when evaluated on selective classification with OOD data (SCOD) than when evaluated only on OOD detection.
We propose a novel method for SCOD, Softmax Information Retaining Combination (SIRC), that augments softmax-based confidence scores with feature-agnostic information.
Experiments on a wide variety of ImageNet-scale datasets and convolutional neural network architectures show that SIRC is able to consistently match or outperform the baseline for SCOD.
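The sketch below conveys only the general flavor of augmenting a softmax confidence score with a feature-based signal such as the feature norm; the combination form is illustrative, not the paper's SIRC formula, and the parameters are hypothetical.

```python
import numpy as np

def combined_confidence(logits, feats, a=0.0, b=1.0):
    """Illustrative combination of a softmax score with a feature-norm signal.
    Higher values -> accept the prediction; lower -> reject (possible OOD)."""
    z = logits - logits.max(axis=-1, keepdims=True)
    softmax_max = np.exp(z).max(axis=-1) / np.exp(z).sum(axis=-1)
    feat_norm = np.linalg.norm(feats, axis=-1)
    # Down-weight confidence when the feature norm looks atypical for ID data;
    # a and b are illustrative location/scale parameters fit on ID data.
    gate = 1.0 / (1.0 + np.exp(-b * (feat_norm - a)))
    return softmax_max * gate
```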
arXiv Detail & Related papers (2022-07-15T14:39:57Z)
- Provably Robust Detection of Out-of-distribution Data (almost) for free [124.14121487542613]
Deep neural networks are known to produce highly overconfident predictions on out-of-distribution (OOD) data.
In this paper, we propose a novel method that combines, from first principles, a certifiable OOD detector with a standard classifier into an OOD-aware classifier.
In this way we achieve the best of both worlds: certifiably adversarially robust OOD detection, even for OOD samples close to the in-distribution, without loss in prediction accuracy, and close to state-of-the-art OOD detection performance for non-manipulated OOD data.
arXiv Detail & Related papers (2021-06-08T11:40:49Z)
- Probing Predictions on OOD Images via Nearest Categories [97.055916832257]
We study out-of-distribution (OOD) prediction behavior of neural networks when they classify images from unseen classes or corrupted images.
We introduce a new measure, nearest category generalization (NCG), where we compute the fraction of OOD inputs that are classified with the same label as their nearest neighbor in the training set.
We find that robustly trained networks have consistently higher NCG accuracy than naturally trained ones, even when the OOD data is much farther away than the robustness radius.
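A minimal sketch of computing NCG accuracy, assuming distances in flattened input space and arrays small enough to hold the full pairwise distance matrix (names illustrative):

```python
import numpy as np

def ncg_accuracy(train_x, train_y, ood_x, ood_preds):
    """Fraction of OOD inputs classified with the same label as their
    nearest neighbor in the training set."""
    tr = train_x.reshape(len(train_x), -1)
    te = ood_x.reshape(len(ood_x), -1)
    # Pairwise squared Euclidean distances, shape (n_ood, n_train).
    d = ((te[:, None, :] - tr[None, :, :]) ** 2).sum(axis=-1)
    nearest_labels = train_y[d.argmin(axis=1)]
    return float(np.mean(nearest_labels == ood_preds))
```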
arXiv Detail & Related papers (2020-11-17T07:42:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.