Related papers: LAION-C: An Out-of-Distribution Benchmark for Web-Scale Vision Models

URL: http://arxiv.org/abs/2506.16950v1
Date: Fri, 20 Jun 2025 12:32:27 GMT
Title: LAION-C: An Out-of-Distribution Benchmark for Web-Scale Vision Models
Authors: Fanfei Li, Thomas Klein, Wieland Brendel, Robert Geirhos, Roland S. Zimmermann
Abstract summary: We introduce LAION-C as a benchmark alternative for ImageNet-C. In a comprehensive evaluation of state-of-the-art models, we find that the LAION-C dataset poses significant challenges to contemporary models. We observe a paradigm shift in OOD generalization: from humans outperforming models, to the best models now matching or outperforming the best human observers.
Score: 19.56756019309533
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Out-of-distribution (OOD) robustness is a desired property of computer vision models. Improving model robustness requires high-quality signals from robustness benchmarks to quantify progress. While various benchmark datasets such as ImageNet-C were proposed in the ImageNet era, most ImageNet-C corruption types are no longer OOD relative to today's large, web-scraped datasets, which already contain common corruptions such as blur or JPEG compression artifacts. Consequently, these benchmarks are no longer well-suited for evaluating OOD robustness in the era of web-scale datasets. Indeed, recent models show saturating scores on ImageNet-era OOD benchmarks, indicating that it is unclear whether models trained on web-scale datasets truly become better at OOD generalization or whether they have simply been exposed to the test distortions during training. To address this, we introduce LAION-C as a benchmark alternative for ImageNet-C. LAION-C consists of six novel distortion types specifically designed to be OOD, even for web-scale datasets such as LAION. In a comprehensive evaluation of state-of-the-art models, we find that the LAION-C dataset poses significant challenges to contemporary models, including MLLMs such as Gemini and GPT-4o. We additionally conducted a psychophysical experiment to evaluate the difficulty of our corruptions for human observers, enabling a comparison of models to lab-quality human robustness data. We observe a paradigm shift in OOD generalization: from humans outperforming models, to the best models now matching or outperforming the best human observers.
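The abstract does not spell out how LAION-C is scored. As a hedged sketch of the bookkeeping a corruption benchmark of this kind typically needs, assuming top-1 accuracy per distortion type, an unweighted mean over types, and a comparison against the best human observer, the Python below illustrates the idea. The distortion names, function names, and data are hypothetical placeholders, not LAION-C's actual distortion types, metric, or API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Each record is (distortion_name, true_label, predicted_label).
# Distortion names are hypothetical placeholders, not LAION-C's actual types.
Record = Tuple[str, int, int]


def per_distortion_accuracy(records: Iterable[Record]) -> Dict[str, float]:
    """Top-1 accuracy for each distortion type (assumed metric)."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for distortion, y_true, y_pred in records:
        total[distortion] += 1
        correct[distortion] += int(y_true == y_pred)
    return {d: correct[d] / total[d] for d in total}


def mean_accuracy(per_distortion: Dict[str, float]) -> float:
    """Unweighted mean over distortion types (assumed aggregation)."""
    return sum(per_distortion.values()) / len(per_distortion)


def gap_to_best_human(model_acc: float, human_accs: List[float]) -> float:
    """Positive if the model beats the best human observer, zero if it matches."""
    return model_acc - max(human_accs)


if __name__ == "__main__":
    records = [
        ("distortion_a", 0, 0), ("distortion_a", 1, 2),
        ("distortion_b", 3, 3), ("distortion_b", 4, 4),
    ]
    per = per_distortion_accuracy(records)
    print(per)  # {'distortion_a': 0.5, 'distortion_b': 1.0}
    model = mean_accuracy(per)  # 0.75
    print(gap_to_best_human(model, [0.6, 0.7]) > 0)  # True
```

Averaging per type (rather than over all images) keeps a distortion with many test images from dominating the summary score; whether LAION-C weights its distortion types this way is an assumption here.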

Related papers

CNS-Bench: Benchmarking Image Classifier Robustness Under Continuous Nuisance Shifts [67.48102304531734]
We introduce CNS-Bench, a Continuous Nuisance Shift Benchmark to quantify robustness of image classifiers for continuous and realistic nuisance shifts. We propose a filtering mechanism that outperforms previous methods, thereby enabling reliable benchmarking with generative models.
arXiv (2025-07-23T16:15:48Z)
Self-supervised Benchmark Lottery on ImageNet: Do Marginal Improvements Translate to Improvements on Similar Datasets? [1.3821203559674384]
We investigate whether models that seem to perform well on ImageNet may experience significant performance declines on similar datasets. Specifically, state-of-the-art frameworks such as DINO and SwAV, which are praised for their performance, exhibit substantial drops in performance. We argue that otherwise good and desirable properties of models remain hidden when benchmarking is only performed on the ImageNet validation set.
arXiv (2025-01-26T07:19:12Z)
In Search of Forgotten Domain Generalization [20.26519807919284]
Out-of-Domain (OOD) generalization is the ability of a model trained on one or more domains to generalize to unseen domains. In the ImageNet era of computer vision, evaluation sets for measuring a model's OOD performance were designed to be strictly OOD with respect to style. The emergence of foundation models and expansive web-scale datasets has obfuscated this evaluation process.
arXiv (2024-10-10T17:50:45Z)
Can OOD Object Detectors Learn from Foundation Models? [56.03404530594071]
Out-of-distribution (OOD) object detection is a challenging task due to the absence of open-set OOD data. Inspired by recent advancements in text-to-image generative models, we study the potential of generative models trained on large-scale open-set data to synthesize OOD samples. We introduce SyncOOD, a simple data curation method that capitalizes on the capabilities of large foundation models.
arXiv (2024-09-08T17:28:22Z)
SOOD-ImageNet: a Large-Scale Dataset for Semantic Out-Of-Distribution Image Classification and Semantic Segmentation [6.21476985578569]
Out-of-Distribution (OOD) detection in computer vision is a crucial research area. SOOD-ImageNet is a novel dataset comprising around 1.6M images across 56 classes. It is designed for common computer vision tasks such as image classification and semantic segmentation under OOD conditions.
arXiv (2024-09-02T09:37:39Z)
ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object [78.58860252442045]
We introduce generative models as a data source for hard images that benchmark deep models' robustness. We are able to generate images with more diversified backgrounds, textures, and materials than any prior work, yielding a benchmark we term ImageNet-D. Our work suggests that diffusion models can be an effective source to test vision models.
arXiv (2024-03-27T17:23:39Z)
Reliability in Semantic Segmentation: Can We Use Synthetic Data? [69.28268603137546]
We show for the first time how synthetic data can be specifically generated to comprehensively assess the real-world reliability of semantic segmentation models. This synthetic data is employed to evaluate the robustness of pretrained segmenters. We demonstrate how our approach can be utilized to enhance the calibration and OOD detection capabilities of segmenters.
arXiv (2023-12-14T18:56:07Z)
Uncertainty in AI: Evaluating Deep Neural Networks on Out-of-Distribution Images [0.0]
This paper investigates the uncertainty of various deep neural networks, including ResNet-50, VGG16, DenseNet121, AlexNet, and GoogLeNet, when dealing with perturbed data. While ResNet-50 was the most accurate single model for OOD images, the ensemble performed even better, correctly classifying all images.
arXiv (2023-09-04T22:46:59Z)
High-resolution semantically-consistent image-to-image translation [0.0]
This paper proposes an unsupervised domain adaptation model that preserves semantic consistency and per-pixel quality for the images during the style-transferring phase. The proposed model shows substantial performance gain compared to the SemI2I model and reaches similar results as the state-of-the-art CyCADA model.
arXiv (2022-09-13T19:08:30Z)
Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples. We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models. We also observe that the span selection task format, which is used for QA datasets such as QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.
arXiv (2021-06-01T22:33:53Z)
Contemplating real-world object classification [53.10151901863263]
We reanalyze the ObjectNet dataset, recently proposed by Barbu et al., which contains objects in daily-life situations. We find that applying deep models to the isolated objects, rather than to the entire scene as is done in the original paper, results in a performance improvement of around 20-30%.
arXiv (2021-03-08T23:29:59Z)
Assessing out-of-domain generalization for robust building damage detection [78.6363825307044]
Building damage detection can be automated by applying computer vision techniques to satellite imagery. Models must be robust to a shift in distribution between disaster imagery available for training and the images of the new event. We argue that future work should focus on the OOD regime instead.
arXiv (2020-11-20T10:30:43Z)