The SVHN Dataset Is Deceptive for Probabilistic Generative Models Due to a Distribution Mismatch
- URL: http://arxiv.org/abs/2312.02168v2
- Date: Wed, 6 Dec 2023 05:16:37 GMT
- Title: The SVHN Dataset Is Deceptive for Probabilistic Generative Models Due to a Distribution Mismatch
- Authors: Tim Z. Xiao, Johannes Zenn, Robert Bamler
- Abstract summary: The Street View House Numbers dataset is a popular benchmark dataset in deep learning.
We warn that the official training set and test set of the SVHN dataset are not drawn from the same distribution.
We propose to mix and re-split the official training and test sets when SVHN is used for tasks other than classification.
- Score: 12.542073306638988
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Street View House Numbers (SVHN) dataset is a popular benchmark dataset
in deep learning. Originally designed for digit classification tasks, the SVHN
dataset has been widely used as a benchmark for various other tasks including
generative modeling. However, with this work, we aim to warn the community
about an issue of the SVHN dataset as a benchmark for generative modeling
tasks: we discover that the official training set and test set of the SVHN
dataset are not drawn from the same distribution. We empirically show
that this distribution mismatch has little impact on the classification task
(which may explain why this issue has not been detected before), but it
severely affects the evaluation of probabilistic generative models, such as
Variational Autoencoders and diffusion models. As a workaround, we propose to
mix and re-split the official training and test sets when SVHN is used for
tasks other than classification. We publish a new split and the indices we
used to create it at https://jzenn.github.io/svhn-remix/.
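The proposed workaround lends itself to a short illustration. Below is a minimal sketch, assuming torchvision's `SVHN` loader: it pools the official training and test images and draws a fresh random split of the same sizes. The seed and variable names are illustrative assumptions; the canonical re-split indices are the ones published at https://jzenn.github.io/svhn-remix/.

```python
# Minimal sketch (not the published SVHN-Remix indices): pool the official
# SVHN train and test splits and draw a fresh random split of the same sizes,
# so that both new subsets come from the same mixed distribution.
import numpy as np
from torchvision.datasets import SVHN

root = "./data"  # assumed download location
train = SVHN(root, split="train", download=True)  # 73,257 images
test = SVHN(root, split="test", download=True)    # 26,032 images

# Pool images and labels from both official splits.
images = np.concatenate([train.data, test.data], axis=0)     # shape (N, 3, 32, 32)
labels = np.concatenate([train.labels, test.labels], axis=0)

# Shuffle once with a fixed (illustrative) seed and cut at the original
# train-set size, so only the split changes, not the subset sizes.
rng = np.random.default_rng(0)
perm = rng.permutation(len(images))
n_train = len(train.data)
train_idx, test_idx = perm[:n_train], perm[n_train:]

remix_train = images[train_idx], labels[train_idx]
remix_test = images[test_idx], labels[test_idx]
```

Since the abstract reports that classification is largely insensitive to the mismatch, classifiers trained on such a re-split should behave much as before, while likelihood-based evaluation of VAEs or diffusion models no longer compares against a test set drawn from a different distribution.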
Related papers
- SOAK: Same/Other/All K-fold cross-validation for estimating similarity of patterns in data subsets [39.12222516332026]
We propose SOAK, Same/Other/All K-fold cross-validation, a new method for estimating the similarity of patterns across data subsets.
SOAK systematically compares models that are trained on different subsets of the data and then used for prediction on a fixed test subset, to estimate the similarity of learnable/predictable patterns in those subsets.
arXiv Detail & Related papers (2024-10-11T09:10:39Z)
- Adapted-MoE: Mixture of Experts with Test-Time Adaption for Anomaly Detection [10.12283550685127]
We propose an Adapted-MoE to handle multiple distributions of same-category samples by divide and conquer.
Specifically, we propose a routing network based on representation learning to route same-category samples into the subclasses feature space.
We propose the test-time adaption to eliminate the bias between the unseen test sample representation and the feature distribution learned by the expert model.
arXiv Detail & Related papers (2024-09-09T13:49:09Z)
- Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
arXiv Detail & Related papers (2024-03-11T13:44:49Z)
- Task-customized Masked AutoEncoder via Mixture of Cluster-conditional Experts [104.9871176044644]
Masked Autoencoder (MAE) is a prevailing self-supervised learning method that achieves promising results in model pre-training.
We propose a novel MAE-based pre-training paradigm, Mixture of Cluster-conditional Experts (MoCE).
MoCE trains each expert only with semantically relevant images by using cluster-conditional gates.
arXiv Detail & Related papers (2024-02-08T03:46:32Z)
- Dataset Interfaces: Diagnosing Model Failures Using Controllable Counterfactual Generation [85.13934713535527]
Distribution shift is a major source of failure for machine learning models.
We introduce the notion of a dataset interface: a framework that, given an input dataset and a user-specified shift, returns instances that exhibit the desired shift.
We demonstrate how applying this dataset interface to the ImageNet dataset enables studying model behavior across a diverse array of distribution shifts.
arXiv Detail & Related papers (2023-02-15T18:56:26Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- Model Rectification via Unknown Unknowns Extraction from Deployment Samples [8.0497115494227]
We propose a general algorithmic framework that aims to perform a post-training model rectification at deployment time in a supervised way.
RTSCV extracts unknown unknowns (u.u.s) from deployment samples.
We show that RTSCV consistently outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2021-02-08T11:46:19Z)
- WILDS: A Benchmark of in-the-Wild Distribution Shifts [157.53410583509924]
Distribution shifts can substantially degrade the accuracy of machine learning systems deployed in the wild.
We present WILDS, a curated collection of 8 benchmark datasets that reflect a diverse range of distribution shifts.
We show that standard training results in substantially lower out-of-distribution performance than in-distribution performance.
arXiv Detail & Related papers (2020-12-14T11:14:56Z)
- BREEDS: Benchmarks for Subpopulation Shift [98.90314444545204]
We develop a methodology for assessing the robustness of models to subpopulation shift.
We leverage the class structure underlying existing datasets to control the data subpopulations that comprise the training and test distributions.
Applying this methodology to the ImageNet dataset, we create a suite of subpopulation shift benchmarks of varying granularity.
arXiv Detail & Related papers (2020-08-11T17:04:47Z)