Reliable Test-Time Adaptation via Agreement-on-the-Line
- URL: http://arxiv.org/abs/2310.04941v1
- Date: Sat, 7 Oct 2023 23:21:25 GMT
- Title: Reliable Test-Time Adaptation via Agreement-on-the-Line
- Authors: Eungyeup Kim, Mingjie Sun, Aditi Raghunathan, Zico Kolter
- Abstract summary: Test-time adaptation (TTA) methods aim to improve robustness to distribution shifts by adapting models using unlabeled data.
We make a notable and surprising observation that TTAed models strongly exhibit the agreement-on-the-line phenomenon.
We leverage these observations to make TTA methods more reliable from three perspectives.
- Score: 26.40837283545848
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Test-time adaptation (TTA) methods aim to improve robustness to distribution
shifts by adapting models using unlabeled data from the shifted test
distribution. However, there remain unresolved challenges that undermine the
reliability of TTA, which include difficulties in evaluating TTA performance,
miscalibration after TTA, and unreliable hyperparameter tuning for adaptation.
In this work, we make a notable and surprising observation that TTAed models
strongly show the agreement-on-the-line phenomenon (Baek et al., 2022) across a
wide range of distribution shifts. We find such linear trends occur
consistently in a wide range of models adapted with various hyperparameters,
and persist in distributions where the phenomenon fails to hold in vanilla
models (i.e., before adaptation). We leverage these observations to make TTA
methods more reliable from three perspectives: (i) estimating OOD accuracy
(without labeled data) to determine when TTA helps and when it hurts, (ii)
calibrating TTAed models without label information, and (iii) reliably
determining hyperparameters for TTA without any labeled validation data.
Through extensive experiments, we demonstrate that various TTA methods can be
precisely evaluated, both in terms of their improvements and degradations.
Moreover, our proposed methods for unsupervised calibration and hyperparameter
tuning for TTA achieve results close to those obtained assuming access to
ground-truth labels, in terms of both OOD accuracy and calibration error.
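The core estimator the abstract describes can be sketched in a few lines: across a collection of models, the agreement between model pairs (computable without labels) follows the same linear trend as accuracy, so a line fit on agreements can be reused to map a model's ID accuracy to an estimate of its OOD accuracy. The snippet below is a minimal illustration with synthetic numbers, working in raw accuracy space (the actual method applies a probit transform); all names and values are placeholders, not the paper's code.

```python
import numpy as np

def estimate_ood_accuracy(id_agreement, ood_agreement, id_accuracy):
    """Fit the ID-vs-OOD agreement line across model pairs, then apply
    its slope and intercept to a model's ID accuracy to estimate its
    OOD accuracy (agreement requires no labels, so the fit is label-free)."""
    slope, intercept = np.polyfit(id_agreement, ood_agreement, 1)
    return slope * id_accuracy + intercept

# Synthetic example: agreements of 5 model pairs lying on a clean line.
id_agree = np.array([0.60, 0.68, 0.75, 0.82, 0.90])
ood_agree = 0.7 * id_agree + 0.05   # hypothetical linear trend
print(estimate_ood_accuracy(id_agree, ood_agree, id_accuracy=0.85))
```

With an exactly linear trend, the fit recovers the slope and intercept, so the estimate is simply 0.7 * 0.85 + 0.05.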
Related papers
- Exploring Patterns Behind Sports [3.2838877620203935]
This paper presents a comprehensive framework for time series prediction using a hybrid model that combines ARIMA and LSTM.
The model incorporates feature engineering techniques, including embedding and PCA, to transform raw data into a lower-dimensional representation.
arXiv Detail & Related papers (2025-02-11T11:51:07Z) - Drift-Resilient TabPFN: In-Context Learning Temporal Distribution Shifts on Tabular Data [39.40116554523575]
We present Drift-Resilient TabPFN, a fresh approach based on In-Context Learning with a Prior-Data Fitted Network.
It learns to approximate Bayesian inference on synthetic datasets drawn from a prior.
It improves accuracy from 0.688 to 0.744 and ROC AUC from 0.786 to 0.832 while maintaining stronger calibration.
arXiv Detail & Related papers (2024-11-15T23:49:23Z) - Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve language model alignment across different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label-smoothing value during training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
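The per-sample label-smoothing idea above can be illustrated with a small sketch. Here normalized predictive entropy stands in as the uncertainty signal and `max_eps` is a hypothetical cap; the actual UAL formulation may differ.

```python
import numpy as np

def uncertainty_smoothed_targets(labels, probs, num_classes, max_eps=0.2):
    """Build soft training targets whose smoothing strength grows with
    per-sample uncertainty (normalized predictive entropy here)."""
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    eps = max_eps * entropy / np.log(num_classes)   # in [0, max_eps]
    onehot = np.eye(num_classes)[labels]
    return (1.0 - eps)[:, None] * onehot + (eps / num_classes)[:, None]

# A confident sample keeps a near-one-hot target; an uncertain one is smoothed.
probs = np.array([[0.98, 0.01, 0.01],   # confident
                  [0.34, 0.33, 0.33]])  # uncertain
targets = uncertainty_smoothed_targets(np.array([0, 0]), probs, num_classes=3)
```

Each target row still sums to one, so the result remains a valid distribution for cross-entropy training.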
arXiv Detail & Related papers (2024-06-07T11:37:45Z) - Quality In / Quality Out: Data quality more relevant than model choice in anomaly detection with the UGR'16 [0.29998889086656577]
We show that relatively minor modifications on a benchmark dataset cause significantly more impact on model performance than the specific ML technique considered.
We also show that the measured model performance is uncertain, as a result of labelling inaccuracies.
arXiv Detail & Related papers (2023-05-31T12:03:12Z) - CLIPood: Generalizing CLIP to Out-of-Distributions [73.86353105017076]
Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but the further adaptation of CLIP on downstream tasks undesirably degrades OOD performances.
We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on unseen test data.
Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
arXiv Detail & Related papers (2023-02-02T04:27:54Z) - How robust are pre-trained models to distribution shift? [82.08946007821184]
We show how spurious correlations affect the performance of popular self-supervised learning (SSL) and auto-encoder based models (AE).
We develop a novel evaluation scheme with the linear head trained on out-of-distribution (OOD) data, to isolate the performance of the pre-trained models from a potential bias of the linear head used for evaluation.
arXiv Detail & Related papers (2022-06-17T16:18:28Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting target accuracy as the fraction of unlabeled target examples whose confidence exceeds the threshold.
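ATC as summarized here can be sketched briefly: pick a confidence threshold on labeled source data so that the share of source examples above it matches the source accuracy, then report the share of unlabeled target examples above that threshold as the accuracy estimate. A minimal sketch on toy data (function names are illustrative, not the paper's code):

```python
import numpy as np

def atc_threshold(source_conf, source_correct):
    """Choose a threshold so the fraction of source confidences above it
    equals the source accuracy."""
    return np.quantile(source_conf, 1.0 - source_correct.mean())

def atc_estimate(target_conf, threshold):
    """Predicted target accuracy: fraction of unlabeled target examples
    whose confidence exceeds the threshold."""
    return (target_conf > threshold).mean()

source_conf = np.linspace(0.0, 1.0, 100)
source_correct = (source_conf > 0.4).astype(float)  # toy: confident => correct
t = atc_threshold(source_conf, source_correct)
print(atc_estimate(np.linspace(0.0, 1.0, 50), t))
```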
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - A Data-driven feature selection and machine-learning model benchmark for the prediction of longitudinal dispersion coefficient [29.58577229101903]
Accurate prediction of the longitudinal dispersion (LD) coefficient can yield a substantial performance improvement in related simulations.
In this study, a global optimal feature set was proposed through numerical comparison of the distilled local optimums in performance with representative ML models.
Results show that the support vector machine has significantly better performance than other models.
arXiv Detail & Related papers (2021-07-16T09:50:38Z) - Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We evaluate a method we call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
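Prediction-time batch normalization amounts to normalizing each test batch with its own statistics instead of the running statistics collected during training. A minimal NumPy sketch of one BN layer under this scheme (variable names are illustrative):

```python
import numpy as np

def batch_norm(x, gamma, beta, running_mean, running_var,
               use_batch_stats=False, eps=1e-5):
    """One batch-norm layer. With use_batch_stats=True (prediction-time
    BN), the test batch is normalized by its own mean/variance, which
    re-centers features displaced by covariate shift."""
    if use_batch_stats:
        mean, var = x.mean(axis=0), x.var(axis=0)
    else:
        mean, var = running_mean, running_var
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

# A shifted test batch: feature statistics drift away from training values.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(256, 4))
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4),
                 running_mean=np.zeros(4), running_var=np.ones(4),
                 use_batch_stats=True)
```

With batch statistics, the output is re-standardized regardless of the shift; with the stale running statistics, the same batch would come out badly off-center.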
This list is automatically generated from the titles and abstracts of the papers in this site.