Extending the WILDS Benchmark for Unsupervised Adaptation
- URL: http://arxiv.org/abs/2112.05090v1
- Date: Thu, 9 Dec 2021 18:32:38 GMT
- Title: Extending the WILDS Benchmark for Unsupervised Adaptation
- Authors: Shiori Sagawa, Pang Wei Koh, Tony Lee, Irena Gao, Sang Michael Xie,
Kendrick Shen, Ananya Kumar, Weihua Hu, Michihiro Yasunaga, Henrik Marklund,
Sara Beery, Etienne David, Ian Stavness, Wei Guo, Jure Leskovec, Kate Saenko,
Tatsunori Hashimoto, Sergey Levine, Chelsea Finn, Percy Liang
- Abstract summary: We present the WILDS 2.0 update, which extends 8 of the 10 datasets in the WILDS benchmark of distribution shifts to include curated unlabeled data.
These datasets span a wide range of applications (from histology to wildlife conservation), tasks (classification, regression, and detection), and modalities.
We systematically benchmark state-of-the-art methods that leverage unlabeled data, including domain-invariant, self-training, and self-supervised methods.
- Score: 186.90399201508953
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning systems deployed in the wild are often trained on a source
distribution but deployed on a different target distribution. Unlabeled data
can be a powerful point of leverage for mitigating these distribution shifts,
as it is frequently much more available than labeled data. However, existing
distribution shift benchmarks for unlabeled data do not reflect the breadth of
scenarios that arise in real-world applications. In this work, we present the
WILDS 2.0 update, which extends 8 of the 10 datasets in the WILDS benchmark of
distribution shifts to include curated unlabeled data that would be
realistically obtainable in deployment. To maintain consistency, the labeled
training, validation, and test sets, as well as the evaluation metrics, are
exactly the same as in the original WILDS benchmark. These datasets span a wide
range of applications (from histology to wildlife conservation), tasks
(classification, regression, and detection), and modalities (photos, satellite
images, microscope slides, text, molecular graphs). We systematically benchmark
state-of-the-art methods that leverage unlabeled data, including
domain-invariant, self-training, and self-supervised methods, and show that
their success on WILDS 2.0 is limited. To facilitate method development and
evaluation, we provide an open-source package that automates data loading and
contains all of the model architectures and methods used in this paper. Code
and leaderboards are available at https://wilds.stanford.edu.
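The self-training baselines benchmarked here share a simple core: train on labeled source data, pseudo-label the unlabeled data, and retrain on the confident pseudo-labels. A minimal sketch of that loop, with toy data and a nearest-centroid classifier standing in for the real models (the confidence threshold and data are illustrative choices, not values from the paper):

```python
import numpy as np

def fit_centroids(X, y, num_classes):
    """Nearest-centroid 'model': one mean vector per class."""
    return np.stack([X[y == c].mean(axis=0) for c in range(num_classes)])

def predict_with_confidence(centroids, X):
    """Predict the nearest centroid; confidence via softmax over negative distances."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
    probs = np.exp(-dists) / np.exp(-dists).sum(axis=1, keepdims=True)
    return probs.argmax(axis=1), probs.max(axis=1)

def self_train(X_lab, y_lab, X_unlab, num_classes, threshold=0.6, rounds=3):
    """Pseudo-labeling loop: keep only confident predictions on unlabeled data."""
    X, y = X_lab, y_lab
    for _ in range(rounds):
        centroids = fit_centroids(X, y, num_classes)
        pseudo, conf = predict_with_confidence(centroids, X_unlab)
        keep = conf >= threshold
        if not keep.any():
            break
        X = np.concatenate([X_lab, X_unlab[keep]])
        y = np.concatenate([y_lab, pseudo[keep]])
    return fit_centroids(X, y, num_classes)

# Toy setup: labeled source blobs, unlabeled points from a shifted target.
rng = np.random.default_rng(0)
X_lab = np.concatenate([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y_lab = np.array([0] * 20 + [1] * 20)
X_unlab = np.concatenate([rng.normal(-1.5, 0.5, (30, 2)), rng.normal(2.5, 0.5, (30, 2))])
centroids = self_train(X_lab, y_lab, X_unlab, num_classes=2)
preds, _ = predict_with_confidence(centroids, np.array([[-1.5, -1.5], [2.5, 2.5]]))
```

After self-training, the centroids have drifted toward the shifted target blobs, which is exactly the adaptation effect the benchmark evaluates at scale.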
Related papers
- Continuous Contrastive Learning for Long-Tailed Semi-Supervised Recognition [50.61991746981703]
Current state-of-the-art LTSSL approaches rely on high-quality pseudo-labels for large-scale unlabeled data.
This paper introduces a novel probabilistic framework that unifies various recent proposals in long-tail learning.
We introduce a continuous contrastive learning method, CCL, extending our framework to unlabeled data using reliable and smoothed pseudo-labels.
arXiv Detail & Related papers (2024-10-08T15:06:10Z)
- Channel-Selective Normalization for Label-Shift Robust Test-Time Adaptation [16.657929958093824]
Test-time adaptation is an approach to adjust models to a new data distribution during inference.
Test-time batch normalization is a simple and popular method that achieved compelling performance on domain shift benchmarks.
We propose to tackle this challenge by selectively adapting only certain channels in a deep network, avoiding drastic adaptation that is sensitive to label shift.
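The underlying mechanism can be sketched in a few lines: replace the training normalization statistics with test-batch statistics, but only for channels whose statistics have actually drifted. The drift criterion below (mean shift measured in training standard deviations) is an illustrative stand-in, not the selection rule from the paper:

```python
import numpy as np

def selective_test_time_norm(x, train_mean, train_var, shift_threshold=1.0, eps=1e-5):
    """Normalize a test batch per channel, using test-batch statistics only for
    channels whose mean has drifted notably from the training statistics.

    x: (batch, channels) activations; train_mean/train_var: stored BN statistics.
    """
    test_mean = x.mean(axis=0)
    test_var = x.var(axis=0)
    # Per-channel drift of the test mean, in units of the training std.
    drift = np.abs(test_mean - train_mean) / np.sqrt(train_var + eps)
    adapt = drift > shift_threshold           # channels we choose to adapt
    mean = np.where(adapt, test_mean, train_mean)
    var = np.where(adapt, test_var, train_var)
    return (x - mean) / np.sqrt(var + eps), adapt

# Channel 0 is shifted at test time; channel 1 matches the training distribution.
rng = np.random.default_rng(1)
x = np.stack([rng.normal(5.0, 1.0, 256), rng.normal(0.0, 1.0, 256)], axis=1)
normed, adapt = selective_test_time_norm(x, train_mean=np.zeros(2), train_var=np.ones(2))
# adapt -> [True, False]: only the shifted channel uses test-batch statistics.
```

Adapting only the drifted channel avoids perturbing channels whose batch statistics are distorted by label shift rather than covariate shift.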
arXiv Detail & Related papers (2024-02-07T15:41:01Z)
- Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time [69.77704012415845]
Temporal shifts can considerably degrade the performance of machine learning models deployed in the real world.
We benchmark 13 prior approaches, including methods in domain generalization, continual learning, self-supervised learning, and ensemble learning.
Under both evaluation strategies, we observe an average performance drop of 20% from in-distribution to out-of-distribution data.
arXiv Detail & Related papers (2022-11-25T17:07:53Z)
- Dual-Curriculum Teacher for Domain-Inconsistent Object Detection in Autonomous Driving [43.573192013344055]
In autonomous driving, data are usually collected from different scenarios, such as different weather conditions or different times in a day.
It involves two kinds of distribution shifts among different domains, including (1) data distribution discrepancy, and (2) class distribution shifts.
We propose Dual-Curriculum Teacher (DucTeacher) to address this problem.
arXiv Detail & Related papers (2022-10-17T05:00:27Z)
- WILDS: A Benchmark of in-the-Wild Distribution Shifts [157.53410583509924]
Distribution shifts can substantially degrade the accuracy of machine learning systems deployed in the wild.
We present WILDS, a curated collection of 8 benchmark datasets that reflect a diverse range of distribution shifts.
We show that standard training results in substantially lower out-of-distribution performance than in-distribution performance.
arXiv Detail & Related papers (2020-12-14T11:14:56Z)
- Adversarial Knowledge Transfer from Unlabeled Data [62.97253639100014]
We present a novel Adversarial Knowledge Transfer framework for transferring knowledge from internet-scale unlabeled data to improve the performance of a classifier.
An important novel aspect of our method is that the unlabeled source data can be of different classes from those of the labeled target data, and there is no need to define a separate pretext task.
arXiv Detail & Related papers (2020-08-13T08:04:27Z)
- BREEDS: Benchmarks for Subpopulation Shift [98.90314444545204]
We develop a methodology for assessing the robustness of models to subpopulation shift.
We leverage the class structure underlying existing datasets to control the data subpopulations that comprise the training and test distributions.
Applying this methodology to the ImageNet dataset, we create a suite of subpopulation shift benchmarks of varying granularity.
arXiv Detail & Related papers (2020-08-11T17:04:47Z)
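The BREEDS recipe — use the class hierarchy to decide which subpopulations appear in training versus test — can be sketched on a toy hierarchy. The hierarchy and split below are invented for illustration; the real benchmarks are built from the ImageNet/WordNet hierarchy:

```python
import random

# Toy hierarchy: each superclass (the label the model predicts) groups several
# subclasses (the subpopulations). Invented for illustration only.
hierarchy = {
    "dog": ["terrier", "retriever", "husky", "poodle"],
    "cat": ["siamese", "persian", "tabby", "sphynx"],
}

def subpopulation_split(hierarchy, seed=0):
    """Hold out half of each superclass's subclasses for the test distribution,
    so train and test share labels but not subpopulations."""
    rng = random.Random(seed)
    train, test = {}, {}
    for superclass, subclasses in hierarchy.items():
        subs = subclasses[:]
        rng.shuffle(subs)
        half = len(subs) // 2
        train[superclass] = sorted(subs[:half])
        test[superclass] = sorted(subs[half:])
    return train, test

train_split, test_split = subpopulation_split(hierarchy)
# Every superclass appears in both splits, but no subclass appears in both.
for sc in hierarchy:
    assert set(train_split[sc]).isdisjoint(test_split[sc])
```

A model trained on `train_split` is then evaluated on `test_split`: the label set is unchanged, so any accuracy drop isolates sensitivity to the subpopulation shift.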
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.