The SVHN Dataset Is Deceptive for Probabilistic Generative Models Due to a Distribution Mismatch
- URL: http://arxiv.org/abs/2312.02168v2
- Date: Wed, 6 Dec 2023 05:16:37 GMT
- Title: The SVHN Dataset Is Deceptive for Probabilistic Generative Models Due to a Distribution Mismatch
- Authors: Tim Z. Xiao, Johannes Zenn, Robert Bamler
- Abstract summary: The Street View House Numbers dataset is a popular benchmark dataset in deep learning.
We warn that the official training set and test set of the SVHN dataset are not drawn from the same distribution.
We propose to mix and re-split the official training and test sets when SVHN is used for tasks other than classification.
- Score: 12.542073306638988
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Street View House Numbers (SVHN) dataset is a popular benchmark dataset
in deep learning. Originally designed for digit classification tasks, the SVHN
dataset has been widely used as a benchmark for various other tasks including
generative modeling. However, with this work, we aim to warn the community
about an issue of the SVHN dataset as a benchmark for generative modeling
tasks: we discover that the official training set and test set of the SVHN
dataset are not drawn from the same distribution. We empirically show
that this distribution mismatch has little impact on the classification task
(which may explain why this issue has not been detected before), but it
severely affects the evaluation of probabilistic generative models, such as
Variational Autoencoders and diffusion models. As a workaround, we propose to
mix and re-split the official training and test sets when SVHN is used for
tasks other than classification. We publish a new split and the indices we
used to create it at https://jzenn.github.io/svhn-remix/.
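The proposed workaround lends itself to a short illustration. Below is a minimal sketch, assuming torchvision's `SVHN` loader: it pools the official training and test images and draws a fresh random split of the same sizes. The seed and variable names are illustrative assumptions; the canonical re-split indices are the ones published at https://jzenn.github.io/svhn-remix/.

```python
# Minimal sketch (not the published SVHN-Remix indices): pool the official
# SVHN train and test splits and draw a fresh random split of the same sizes,
# so that both new subsets come from the same mixed distribution.
import numpy as np
from torchvision.datasets import SVHN

root = "./data"  # assumed download location
train = SVHN(root, split="train", download=True)  # 73,257 images
test = SVHN(root, split="test", download=True)    # 26,032 images

# Pool images and labels from both official splits.
images = np.concatenate([train.data, test.data], axis=0)     # shape (N, 3, 32, 32)
labels = np.concatenate([train.labels, test.labels], axis=0)

# Shuffle once with a fixed (illustrative) seed and cut at the original
# train-set size, so only the split changes, not the subset sizes.
rng = np.random.default_rng(0)
perm = rng.permutation(len(images))
n_train = len(train.data)
train_idx, test_idx = perm[:n_train], perm[n_train:]

remix_train = images[train_idx], labels[train_idx]
remix_test = images[test_idx], labels[test_idx]
```

Since the abstract reports that classification is largely insensitive to the mismatch, classifiers trained on such a re-split should behave much as before, while likelihood-based evaluation of VAEs or diffusion models no longer compares against a test set drawn from a different distribution.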
Related papers
- SOAK: Same/Other/All K-fold cross-validation for estimating similarity of patterns in data subsets [39.12222516332026]
We propose SOAK, Same/Other/All K-fold cross-validation, a new method for estimating the similarity of patterns across data subsets.
SOAK systematically compares models that are trained on different subsets of the data and then used for prediction on a fixed test subset, to estimate the similarity of learnable/predictable patterns in those subsets.
arXiv Detail & Related papers (2024-10-11T09:10:39Z)
- Adapted-MoE: Mixture of Experts with Test-Time Adaption for Anomaly Detection [10.12283550685127]
We propose an Adapted-MoE to handle multiple distributions of same-category samples by divide and conquer.
Specifically, we propose a routing network based on representation learning to route same-category samples into the subclasses feature space.
We propose the test-time adaption to eliminate the bias between the unseen test sample representation and the feature distribution learned by the expert model.
arXiv Detail & Related papers (2024-09-09T13:49:09Z)
- Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
arXiv Detail & Related papers (2024-03-11T13:44:49Z)
- Task-customized Masked AutoEncoder via Mixture of Cluster-conditional Experts [104.9871176044644]
Masked Autoencoder (MAE) is a prevailing self-supervised learning method that achieves promising results in model pre-training.
We propose a novel MAE-based pre-training paradigm, Mixture of Cluster-conditional Experts (MoCE).
MoCE trains each expert only with semantically relevant images by using cluster-conditional gates.
arXiv Detail & Related papers (2024-02-08T03:46:32Z)
- Dataset Interfaces: Diagnosing Model Failures Using Controllable Counterfactual Generation [85.13934713535527]
Distribution shift is a major source of failure for machine learning models.
We introduce the notion of a dataset interface: a framework that, given an input dataset and a user-specified shift, returns instances that exhibit the desired shift.
We demonstrate how applying this dataset interface to the ImageNet dataset enables studying model behavior across a diverse array of distribution shifts.
arXiv Detail & Related papers (2023-02-15T18:56:26Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- Model Rectification via Unknown Unknowns Extraction from Deployment Samples [8.0497115494227]
We propose a general algorithmic framework that aims to perform a post-training model rectification at deployment time in a supervised way.
RTSCV extracts unknown unknowns (u.u.s) from deployment samples.
We show that RTSCV consistently outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2021-02-08T11:46:19Z)
- WILDS: A Benchmark of in-the-Wild Distribution Shifts [157.53410583509924]
Distribution shifts can substantially degrade the accuracy of machine learning systems deployed in the wild.
We present WILDS, a curated collection of 8 benchmark datasets that reflect a diverse range of distribution shifts.
We show that standard training results in substantially lower out-of-distribution performance than in-distribution performance.
arXiv Detail & Related papers (2020-12-14T11:14:56Z)
- BREEDS: Benchmarks for Subpopulation Shift [98.90314444545204]
We develop a methodology for assessing the robustness of models to subpopulation shift.
We leverage the class structure underlying existing datasets to control the data subpopulations that comprise the training and test distributions.
Applying this methodology to the ImageNet dataset, we create a suite of subpopulation shift benchmarks of varying granularity.
arXiv Detail & Related papers (2020-08-11T17:04:47Z)