Three New Validators and a Large-Scale Benchmark Ranking for
Unsupervised Domain Adaptation
- URL: http://arxiv.org/abs/2208.07360v4
- Date: Wed, 17 May 2023 23:24:06 GMT
- Title: Three New Validators and a Large-Scale Benchmark Ranking for
Unsupervised Domain Adaptation
- Authors: Kevin Musgrave, Serge Belongie, Ser-Nam Lim
- Abstract summary: We propose three new validators for unsupervised domain adaptation (UDA).
We compare and rank them against five other existing validators, on a large dataset of 1,000,000 checkpoints.
We find that two of our proposed validators achieve state-of-the-art performance in various settings.
- Score: 37.03614011735927
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Changes to hyperparameters can have a dramatic effect on model accuracy.
Thus, the tuning of hyperparameters plays an important role in optimizing
machine-learning models. An integral part of the hyperparameter-tuning process
is the evaluation of model checkpoints, which is done through the use of
"validators". In a supervised setting, these validators evaluate checkpoints by
computing accuracy on a validation set that has labels. In contrast, in an
unsupervised setting, the validation set has no such labels. Without any
labels, it is impossible to compute accuracy, so validators must estimate
accuracy instead. But what is the best approach to estimating accuracy? In this
paper, we consider this question in the context of unsupervised domain
adaptation (UDA). Specifically, we propose three new validators, and we compare
and rank them against five other existing validators, on a large dataset of
1,000,000 checkpoints. Extensive experimental results show that two of our
proposed validators achieve state-of-the-art performance in various settings.
Finally, we find that in many cases, the state-of-the-art is obtained by a
simple baseline method. To the best of our knowledge, this is the largest
empirical study of UDA validators to date. Code is available at
https://www.github.com/KevinMusgrave/powerful-benchmarker.
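To make concrete what a validator does, here is a minimal sketch, assuming NumPy arrays of model logits: a supervised validator computes accuracy from labels, while an unsupervised validator must score checkpoints from unlabeled data alone (shown here with a simple entropy-based score of the kind used as a baseline validator in the UDA literature). The function names and ranking helper are illustrative and are not the API of the powerful-benchmarker repository.
```python
# Illustrative sketch only; not the powerful-benchmarker API.
import numpy as np

def supervised_validator(logits, labels):
    # Accuracy on a labeled validation set: only possible when labels exist.
    return float((logits.argmax(axis=1) == labels).mean())

def entropy_validator(logits):
    # Unsupervised proxy: negative mean prediction entropy on unlabeled data.
    # More confident (lower-entropy) predictions yield a higher score.
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return float(-entropy.mean())

def rank_checkpoints(per_checkpoint_logits, validator):
    # Order checkpoints from best to worst according to the validator's score.
    scores = [validator(logits) for logits in per_checkpoint_logits]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Hypothetical usage: three checkpoints evaluated on 1,000 unlabeled target samples.
rng = np.random.default_rng(0)
checkpoints = [rng.normal(size=(1000, 10)) * scale for scale in (0.5, 1.0, 2.0)]
print(rank_checkpoints(checkpoints, entropy_validator))
```
Whether such label-free scores actually track target-domain accuracy is exactly what the paper's benchmark of 1,000,000 checkpoints is designed to measure.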
Related papers
- Test-Time Adaptation with Binary Feedback [50.20923012663613]
BiTTA is a novel dual-path optimization framework that balances binary feedback-guided adaptation on uncertain samples with agreement-based self-adaptation on confident predictions.
Experiments show that BiTTA achieves accuracy improvements of 13.3 percentage points over state-of-the-art baselines.
arXiv Detail & Related papers (2025-05-24T05:24:10Z)
- Crafting Distribution Shifts for Validation and Training in Single Source Domain Generalization [9.323285246024518]
Single-source domain generalization attempts to learn a model on a source domain and deploy it to unseen target domains.
The standard practice of validating on the training distribution does not accurately reflect a model's generalization ability.
We construct an independent validation set by transforming source domain images with a comprehensive list of augmentations.
arXiv Detail & Related papers (2024-09-29T20:52:50Z)
- SKADA-Bench: Benchmarking Unsupervised Domain Adaptation Methods with Realistic Validation [55.87169702896249]
Unsupervised Domain Adaptation (DA) consists of adapting a model trained on a labeled source domain to perform well on an unlabeled target domain with some data distribution shift.
We propose a framework to evaluate DA methods and present a fair evaluation of existing shallow algorithms, including reweighting, mapping, and subspace alignment.
Our benchmark highlights the importance of realistic validation and provides practical guidance for real-life applications.
arXiv Detail & Related papers (2024-07-16T12:52:29Z)
- Better Practices for Domain Adaptation [62.70267990659201]
Domain adaptation (DA) aims to provide frameworks for adapting models to deployment data without using labels.
The lack of a clear validation protocol for DA has led to bad practices in the literature.
We show challenges across all three branches of domain adaptation methodology.
arXiv Detail & Related papers (2023-09-07T17:44:18Z)
- A Study of Unsupervised Evaluation Metrics for Practical and Automatic Domain Adaptation [15.728090002818963]
Unsupervised domain adaptation (UDA) methods facilitate the transfer of models to target domains without labels.
In this paper, we aim to find an evaluation metric capable of assessing the quality of a transferred model without access to target validation labels.
arXiv Detail & Related papers (2023-08-01T05:01:05Z)
- Exploring validation metrics for offline model-based optimisation with diffusion models [50.404829846182764]
In model-based optimisation (MBO) we are interested in using machine learning to design candidates that maximise some measure of reward with respect to a black box function called the (ground truth) oracle.
While an approximation to the ground-truth oracle can be trained and used in its place during model validation to measure the mean reward over generated candidates, this evaluation is approximate and vulnerable to adversarial examples.
This is encapsulated under our proposed evaluation framework which is also designed to measure extrapolation.
arXiv Detail & Related papers (2022-11-19T16:57:37Z)
- Rethinking the Setting of Semi-supervised Learning on Graphs [29.5439965223]
We argue that the present setting of semi-supervised learning on graphs may result in unfair comparisons.
We propose ValidUtil, an approach to fully utilize the label information in the validation set.
arXiv Detail & Related papers (2022-05-28T11:31:19Z)
- On the Limits of Evaluating Embodied Agent Model Generalization Using Validation Sets [101.28658250723804]
This paper experiments with augmenting a transformer model with modules that effectively utilize a wider field of view and learn to choose whether the next step requires a navigation or manipulation action.
We observe that the proposed modules resulted in improved, and in fact state-of-the-art, performance on an unseen validation set of a popular benchmark dataset, ALFRED.
We highlight this result because we believe it may reflect a wider phenomenon in machine learning tasks that is primarily noticeable in benchmarks that limit evaluation on test splits.
arXiv Detail & Related papers (2022-05-18T23:52:21Z)
- Tune it the Right Way: Unsupervised Validation of Domain Adaptation via Soft Neighborhood Density [125.64297244986552]
We propose an unsupervised validation criterion that measures the density of soft neighborhoods by computing the entropy of the similarity distribution between points.
Our criterion is simpler than competing validation methods, yet more effective (a minimal sketch of this type of criterion appears after this list).
arXiv Detail & Related papers (2021-08-24T17:41:45Z)
- Robustness of Accuracy Metric and its Inspirations in Learning with Noisy Labels [51.66448070984615]
We show that maximizing training accuracy on sufficiently many noisy samples yields an approximately optimal classifier.
For validation, we prove that a noisy validation set is reliable, addressing a critical need in model selection.
Motivated by our theoretical results, we characterize models trained with noisy labels and verify the utility of a noisy validation set.
arXiv Detail & Related papers (2020-12-08T03:37:47Z)
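As a concrete example of an unsupervised validation criterion from the list above, the soft neighborhood density idea (the Tune it the Right Way entry) scores a checkpoint by the entropy of a softmax-normalized pairwise similarity distribution over target features. The sketch below is an illustrative reimplementation under assumed details (L2-normalized features, a temperature tau, self-similarity excluded); it is not the authors' released code.
```python
# Illustrative sketch of a soft-neighborhood-density style score; details
# (normalization, temperature, self-similarity handling) are assumptions.
import numpy as np

def soft_neighborhood_density(features, tau=0.05):
    # L2-normalize features so the dot product is cosine similarity.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T / tau                         # temperature-scaled similarities
    np.fill_diagonal(sim, -np.inf)              # exclude each point's self-similarity
    sim = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(sim) / np.exp(sim).sum(axis=1, keepdims=True)  # row-wise softmax
    entropy = -(p * np.log(p + 1e-12)).sum(axis=1)
    return float(entropy.mean())                # higher = denser soft neighborhoods

# Hypothetical usage on random target-domain features.
rng = np.random.default_rng(0)
print(soft_neighborhood_density(rng.normal(size=(500, 64))))
```
A higher score suggests that target features cluster tightly, which the cited paper uses as a label-free signal for selecting hyperparameters and checkpoints.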
This list is automatically generated from the titles and abstracts of the papers in this site.