A Study of Unsupervised Evaluation Metrics for Practical and Automatic
Domain Adaptation
- URL: http://arxiv.org/abs/2308.00287v2
- Date: Mon, 18 Sep 2023 11:19:20 GMT
- Title: A Study of Unsupervised Evaluation Metrics for Practical and Automatic
Domain Adaptation
- Authors: Minghao Chen, Zepeng Gao, Shuai Zhao, Qibo Qiu, Wenxiao Wang, Binbin
Lin, Xiaofei He
- Abstract summary: Unsupervised domain adaptation (UDA) methods facilitate the transfer of models to target domains without labels.
In this paper, we aim to find an evaluation metric capable of assessing the quality of a transferred model without access to target validation labels.
- Score: 15.728090002818963
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unsupervised domain adaptation (UDA) methods facilitate the transfer of
models to target domains without labels. However, these methods necessitate a
labeled target validation set for hyper-parameter tuning and model selection.
In this paper, we aim to find an evaluation metric capable of assessing the
quality of a transferred model without access to target validation labels. We
begin with the metric based on mutual information of the model prediction.
Through empirical analysis, we identify three prevalent issues with this
metric: 1) It does not account for the source structure. 2) It can be easily
attacked. 3) It fails to detect negative transfer caused by the over-alignment
of source and target features. To address the first two issues, we incorporate
source accuracy into the metric and employ a new MLP classifier that is held
out during training, significantly improving the result. To tackle the final
issue, we integrate this enhanced metric with data augmentation, resulting in a
novel unsupervised validation metric for UDA, the Augmentation Consistency Metric (ACM).
Additionally, we empirically demonstrate the shortcomings of previous
experiment settings and conduct large-scale experiments to validate the
effectiveness of our proposed metric. Furthermore, we employ our metric to
automatically search for the optimal hyper-parameter set, achieving superior
performance compared to manually tuned sets across four common benchmarks.
Code will be available soon.
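The sketch below illustrates, under loose assumptions, the two ingredients the abstract describes: a mutual-information score over target predictions (entropy of the mean prediction minus the mean prediction entropy) and a simple augmentation-consistency check. It is not the authors' released ACM implementation; model, target_loader, and augment are hypothetical placeholders, and the source-accuracy term, the held-out MLP classifier, and the exact combination of scores are omitted.

import torch
import torch.nn.functional as F

@torch.no_grad()
def mutual_information_score(model, target_loader, device="cpu"):
    # H(mean prediction) - mean(H(prediction)) over unlabeled target data.
    # Assumes target_loader yields (inputs, labels) pairs; labels are unused.
    model.eval()
    probs = []
    for x, _ in target_loader:
        probs.append(F.softmax(model(x.to(device)), dim=1))
    p = torch.cat(probs)                                  # (N, C) class probabilities
    marginal = p.mean(dim=0)                              # average prediction
    h_marginal = -(marginal * marginal.clamp_min(1e-12).log()).sum()
    h_conditional = -(p * p.clamp_min(1e-12).log()).sum(dim=1).mean()
    return (h_marginal - h_conditional).item()            # higher is assumed better

@torch.no_grad()
def augmentation_consistency(model, target_loader, augment, device="cpu"):
    # Fraction of target samples whose predicted class is unchanged under a
    # (hypothetical) batch-level augmentation function `augment`.
    model.eval()
    agree, total = 0, 0
    for x, _ in target_loader:
        x = x.to(device)
        pred_clean = model(x).argmax(dim=1)
        pred_aug = model(augment(x)).argmax(dim=1)
        agree += (pred_clean == pred_aug).sum().item()
        total += x.size(0)
    return agree / max(total, 1)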
Related papers
- Weak Supervision Performance Evaluation via Partial Identification [46.73061437177238]
Programmatic Weak Supervision (PWS) enables supervised model training without direct access to ground truth labels.
We present a novel method to address this challenge by framing model evaluation as a partial identification problem.
Our approach derives reliable bounds on key metrics without requiring labeled data, overcoming core limitations in current weak supervision evaluation techniques.
arXiv Detail & Related papers (2023-12-07T07:15:11Z)
- Better Practices for Domain Adaptation [62.70267990659201]
Domain adaptation (DA) aims to provide frameworks for adapting models to deployment data without using labels.
The lack of a clear validation protocol for DA has led to bad practices in the literature.
We show challenges across all three branches of domain adaptation methodology.
arXiv Detail & Related papers (2023-09-07T17:44:18Z)
- Exploring validation metrics for offline model-based optimisation with diffusion models [50.404829846182764]
In model-based optimisation (MBO) we are interested in using machine learning to design candidates that maximise some measure of reward with respect to a black box function called the (ground truth) oracle.
While an approximation to the ground truth oracle can be trained and used in its place during model validation to measure the mean reward over generated candidates, the evaluation is approximate and vulnerable to adversarial examples.
This is encapsulated under our proposed evaluation framework which is also designed to measure extrapolation.
arXiv Detail & Related papers (2022-11-19T16:57:37Z)
- Improving Test-Time Adaptation via Shift-agnostic Weight Regularization and Nearest Source Prototypes [18.140619966865955]
We propose a novel test-time adaptation strategy that adjusts the model pre-trained on the source domain using only unlabeled online data from the target domain.
We show that our method exhibits state-of-the-art performance on various standard benchmarks and even outperforms its supervised counterpart.
arXiv Detail & Related papers (2022-07-24T10:17:05Z)
- Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data [66.11139091362078]
We provide the first model selection results on large pretrained Transformers from Huggingface using generalization metrics.
Despite their niche status, we find that metrics derived from the heavy-tail (HT) perspective are particularly useful in NLP tasks.
arXiv Detail & Related papers (2022-02-06T20:07:35Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold (a hedged sketch appears after this list).
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Tune it the Right Way: Unsupervised Validation of Domain Adaptation via Soft Neighborhood Density [125.64297244986552]
We propose an unsupervised validation criterion that measures the density of soft neighborhoods by computing the entropy of the similarity distribution between points (see the sketch after this list).
Our criterion is simpler than competing validation methods, yet more effective.
arXiv Detail & Related papers (2021-08-24T17:41:45Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
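For the Average Thresholded Confidence (ATC) entry above, the following is a minimal sketch of the described recipe, assuming max-softmax probability as the confidence score; it is a reconstruction from the summary, not the reference implementation, and the variable names are hypothetical.

import numpy as np

def atc_fit_threshold(source_confidences, source_correct):
    # Choose a threshold t so that the fraction of source examples with
    # confidence above t matches the source validation accuracy, i.e. t is
    # roughly the (1 - accuracy) quantile of the source confidences.
    return np.quantile(source_confidences, 1.0 - source_correct.mean())

def atc_predict_accuracy(target_confidences, threshold):
    # Predicted target accuracy: the fraction of unlabeled target examples
    # whose confidence exceeds the learned threshold.
    return float((target_confidences > threshold).mean())

# Hypothetical usage: confidences are max-softmax probabilities (an assumption);
# source_correct is a 0/1 array of per-example correctness on a labeled source set.
# t = atc_fit_threshold(src_conf, src_correct)
# est_acc = atc_predict_accuracy(tgt_conf, t)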
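For the Soft Neighborhood Density entry above, the sketch below computes the entropy of a softmax-normalized similarity distribution over target features, as the summary describes; the feature source, L2 normalization, and temperature value are assumptions rather than the paper's exact settings.

import torch
import torch.nn.functional as F

@torch.no_grad()
def soft_neighborhood_density(features, temperature=0.05):
    # features: (N, D) target-domain features from the adapted model (assumed input).
    f = F.normalize(features, dim=1)            # L2-normalize so dot products are cosine similarities
    sim = f @ f.t()                             # (N, N) pairwise similarities
    sim.fill_diagonal_(float("-inf"))           # exclude self-similarity
    p = F.softmax(sim / temperature, dim=1)     # soft neighborhood distribution per point
    entropy = -(p * p.clamp_min(1e-12).log()).sum(dim=1)
    return entropy.mean().item()                # denser soft neighborhoods -> higher entropy, assumed better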