WILDS: A Benchmark of in-the-Wild Distribution Shifts
- URL: http://arxiv.org/abs/2012.07421v2
- Date: Tue, 9 Mar 2021 07:49:42 GMT
- Title: WILDS: A Benchmark of in-the-Wild Distribution Shifts
- Authors: Pang Wei Koh, Shiori Sagawa, Henrik Marklund, Sang Michael Xie, Marvin
Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas
Phillips, Irena Gao, Tony Lee, Etienne David, Ian Stavness, Wei Guo, Berton
A. Earnshaw, Imran S. Haque, Sara Beery, Jure Leskovec, Anshul Kundaje, Emma
Pierson, Sergey Levine, Chelsea Finn, Percy Liang
- Abstract summary: Distribution shifts can substantially degrade the accuracy of machine learning systems deployed in the wild.
We present WILDS, a curated collection of 8 benchmark datasets that reflect a diverse range of distribution shifts.
We show that standard training results in substantially lower out-of-distribution than in-distribution performance.
- Score: 157.53410583509924
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distribution shifts -- where the training distribution differs from the test
distribution -- can substantially degrade the accuracy of machine learning (ML)
systems deployed in the wild. Despite their ubiquity, these real-world
distribution shifts are under-represented in the datasets widely used in the ML
community today. To address this gap, we present WILDS, a curated collection of
8 benchmark datasets that reflect a diverse range of distribution shifts which
naturally arise in real-world applications, such as shifts across hospitals for
tumor identification; across camera traps for wildlife monitoring; and across
time and location in satellite imaging and poverty mapping. On each dataset, we
show that standard training results in substantially lower out-of-distribution
than in-distribution performance, and that this gap remains even with models
trained by existing methods for handling distribution shifts. This underscores
the need for new training methods that produce models which are more robust to
the types of distribution shifts that arise in practice. To facilitate method
development, we provide an open-source package that automates dataset loading,
contains default model architectures and hyperparameters, and standardizes
evaluations. Code and leaderboards are available at https://wilds.stanford.edu.
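For orientation, the open-source package described in the abstract can be exercised in a few lines. Below is a minimal sketch based on the `wilds` package's documented Python interface; the dataset choice, transform, and placeholder predictions are illustrative, and exact signatures may vary across versions.

```python
# pip install wilds  (also requires torch and torchvision)
import torch
import torchvision.transforms as transforms
from wilds import get_dataset
from wilds.common.data_loaders import get_train_loader, get_eval_loader

# Download one benchmark dataset (tumor identification across hospitals)
# and wrap its official train / test splits.
dataset = get_dataset(dataset="camelyon17", download=True)
transform = transforms.ToTensor()
train_data = dataset.get_subset("train", transform=transform)
test_data = dataset.get_subset("test", transform=transform)

train_loader = get_train_loader("standard", train_data, batch_size=32)
test_loader = get_eval_loader("standard", test_data, batch_size=32)

# Each batch carries metadata (e.g. which hospital a patch came from);
# the standardized evaluation uses it to report per-domain performance.
all_pred, all_true, all_meta = [], [], []
for x, y_true, metadata in test_loader:
    y_pred = torch.zeros_like(y_true)  # placeholder for model(x).argmax(-1)
    all_pred.append(y_pred); all_true.append(y_true); all_meta.append(metadata)

results, results_str = dataset.eval(
    torch.cat(all_pred), torch.cat(all_true), torch.cat(all_meta))
print(results_str)
```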
Related papers
- Class Distribution Shifts in Zero-Shot Learning: Learning Robust Representations [3.8980564330208662]
We propose a model for the setting in which the attribute responsible for the shift is unknown in advance.
We show that our approach improves generalization on diverse class distributions in both simulations and real-world datasets.
arXiv Detail & Related papers (2023-11-30T14:14:31Z)
- Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time [69.77704012415845]
Temporal shifts can considerably degrade the performance of machine learning models deployed in the real world.
We benchmark 13 prior approaches, including methods in domain generalization, continual learning, self-supervised learning, and ensemble learning.
Under both evaluation strategies, we observe an average performance drop of 20% from in-distribution to out-of-distribution data.
arXiv Detail & Related papers (2022-11-25T17:07:53Z)
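The fixed-split evaluation behind the gap reported above reduces to comparing accuracy before and after a cutoff time (the Wild-Time paper calls its two strategies Eval-Fix and Eval-Stream). A toy sketch with made-up numbers, not the benchmark's actual interface:

```python
import numpy as np

def temporal_gap(timestamps, correct, split_time):
    """In-distribution vs. out-of-distribution accuracy for a model
    trained only on data observed before `split_time` (Eval-Fix style)."""
    in_dist = correct[timestamps < split_time].mean()
    out_dist = correct[timestamps >= split_time].mean()
    return in_dist, out_dist, in_dist - out_dist

# Illustrative numbers only; Wild-Time reports a ~20% average drop.
rng = np.random.default_rng(0)
ts = np.arange(1000)
correct = np.where(ts < 700, rng.random(1000) < 0.90, rng.random(1000) < 0.72)
print(temporal_gap(ts, correct.astype(float), split_time=700))
```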
- Robust Calibration with Multi-domain Temperature Scaling [86.07299013396059]
We develop a systematic calibration model to handle distribution shifts by leveraging data from multiple domains.
Our proposed method -- multi-domain temperature scaling -- uses the heterogeneity across domains to improve calibration robustness under distribution shift.
arXiv Detail & Related papers (2022-06-06T17:32:12Z)
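For context on the building block: standard temperature scaling fits one scalar T on held-out data, and the multi-domain variant above fits one per domain before combining them. A hedged sketch of the per-domain fit only (the dict-of-domains layout is an illustrative assumption; the paper's cross-domain combination step is not reproduced here):

```python
import torch

def fit_temperature(logits, labels, steps=200, lr=0.01):
    """Standard temperature scaling (Guo et al., 2017): find T > 0
    minimizing the NLL of softmax(logits / T) on held-out data."""
    log_t = torch.zeros(1, requires_grad=True)  # optimize log T to keep T positive
    opt = torch.optim.Adam([log_t], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        opt.step()
    return log_t.exp().item()

# Per-domain building block: one temperature per calibration domain.
# `domains` is assumed to be {domain_id: (logits, labels)} -- an
# illustrative data layout, not the paper's actual interface.
def fit_per_domain(domains):
    return {d: fit_temperature(lg, lb) for d, (lg, lb) in domains.items()}
```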
- Extending the WILDS Benchmark for Unsupervised Adaptation [186.90399201508953]
We present the WILDS 2.0 update, which extends 8 of the 10 datasets in the WILDS benchmark of distribution shifts to include curated unlabeled data.
These datasets span a wide range of applications (from histology to wildlife conservation), tasks (classification, regression, and detection), and modalities.
We systematically benchmark state-of-the-art methods that leverage unlabeled data, including domain-invariant, self-training, and self-supervised methods.
arXiv Detail & Related papers (2021-12-09T18:32:38Z)
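Of the method families benchmarked above, self-training is the simplest to sketch: pseudo-label unlabeled examples on which the model is confident, then retrain on them alongside the labeled data. A generic sketch (the confidence threshold and loader format are illustrative assumptions, not the benchmark's reference implementation):

```python
import torch

@torch.no_grad()
def pseudo_label(model, unlabeled_loader, threshold=0.9):
    """Core self-training step: keep (input, predicted label) pairs
    wherever the model's softmax confidence exceeds `threshold`."""
    xs, ys = [], []
    for x, *_ in unlabeled_loader:  # unlabeled batches may also carry metadata
        probs = torch.softmax(model(x), dim=-1)
        conf, pred = probs.max(dim=-1)
        keep = conf >= threshold
        xs.append(x[keep]); ys.append(pred[keep])
    return torch.cat(xs), torch.cat(ys)  # retrain on these plus labeled data
```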
- Evaluating Predictive Uncertainty and Robustness to Distributional Shift Using Real World Data [0.0]
We propose metrics for evaluating predictive uncertainty on general regression tasks, using the Shifts Weather Prediction dataset.
We also present an evaluation of the baseline methods using these metrics.
arXiv Detail & Related papers (2021-11-08T17:32:10Z)
- Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization [89.73665256847858]
We show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts.
Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 & ImageNet.
We also investigate cases where the correlation is weaker, for instance, some synthetic distribution shifts from CIFAR-10-C and the tissue classification dataset Camelyon17-WILDS.
arXiv Detail & Related papers (2021-07-09T19:48:23Z)
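The measurement behind this claim is simple: collect ID and OOD test accuracies for many models and correlate them, after the probit transform (inverse Gaussian CDF) the paper applies to both axes. A minimal sketch with placeholder accuracies:

```python
import numpy as np
from scipy.stats import norm, pearsonr

# Placeholder values: per-model accuracies on ID and OOD test sets.
id_acc = np.array([0.85, 0.90, 0.93, 0.95, 0.97])
ood_acc = np.array([0.60, 0.68, 0.74, 0.78, 0.83])

# Probit-transform both axes before measuring the linear trend,
# as the paper does when reporting its correlations.
r, _ = pearsonr(norm.ppf(id_acc), norm.ppf(ood_acc))
print(f"probit-scale correlation: r = {r:.3f}")
```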
- BREEDS: Benchmarks for Subpopulation Shift [98.90314444545204]
We develop a methodology for assessing the robustness of models to subpopulation shift.
We leverage the class structure underlying existing datasets to control the data subpopulations that comprise the training and test distributions.
Applying this methodology to the ImageNet dataset, we create a suite of subpopulation shift benchmarks of varying granularity.
arXiv Detail & Related papers (2020-08-11T17:04:47Z)
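The construction is easy to sketch: group subclasses under superclasses, then send disjoint subclass subsets to the train and test distributions, so the superclass task stays fixed while its subpopulations shift. A toy sketch (the hierarchy below is illustrative; BREEDS derives its hierarchy from ImageNet's class structure):

```python
import random

def subpopulation_split(hierarchy, seed=0):
    """For each superclass, assign half of its subclasses to the source
    (train) distribution and the other half to the target (test) one."""
    rng = random.Random(seed)
    source, target = {}, {}
    for superclass, subclasses in hierarchy.items():
        subs = list(subclasses)
        rng.shuffle(subs)
        mid = len(subs) // 2
        source[superclass] = subs[:mid]
        target[superclass] = subs[mid:]
    return source, target

# Illustrative hierarchy, not the actual BREEDS one.
hierarchy = {"dog": ["beagle", "husky", "collie", "pug"],
             "cat": ["tabby", "siamese", "persian", "sphynx"]}
print(subpopulation_split(hierarchy))
```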
This list is automatically generated from the titles and abstracts of the papers on this site.