When does dough become a bagel? Analyzing the remaining mistakes on
ImageNet
- URL: http://arxiv.org/abs/2205.04596v1
- Date: Mon, 9 May 2022 23:25:45 GMT
- Title: When does dough become a bagel? Analyzing the remaining mistakes on
ImageNet
- Authors: Vijay Vasudevan, Benjamin Caine, Raphael Gontijo-Lopes, Sara
Fridovich-Keil, Rebecca Roelofs
- Abstract summary: We review and categorize every remaining mistake that a few top models make in order to provide insight into the long-tail of errors on one of the most benchmarked datasets in computer vision.
Our analysis reveals that nearly half of the supposed mistakes are not mistakes at all, and we uncover new valid multi-labels.
To calibrate future progress on ImageNet, we provide an updated multi-label evaluation set, and we curate ImageNet-Major: a 68-example "major error" slice of the obvious mistakes made by today's top models.
- Score: 13.36146792987668
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image classification accuracy on the ImageNet dataset has been a barometer
for progress in computer vision over the last decade. Several recent papers
have questioned the degree to which the benchmark remains useful to the
community, yet innovations continue to contribute gains to performance, with
today's largest models achieving 90%+ top-1 accuracy. To help contextualize
progress on ImageNet and provide a more meaningful evaluation for today's
state-of-the-art models, we manually review and categorize every remaining
mistake that a few top models make in order to provide insight into the
long-tail of errors on one of the most benchmarked datasets in computer vision.
We focus on the multi-label subset evaluation of ImageNet, where today's best
models achieve upwards of 97% top-1 accuracy. Our analysis reveals that nearly
half of the supposed mistakes are not mistakes at all, and we uncover new valid
multi-labels, demonstrating that, without careful review, we are significantly
underestimating the performance of these models. On the other hand, we also
find that today's best models still make a significant number of mistakes (40%)
that are obviously wrong to human reviewers. To calibrate future progress on
ImageNet, we provide an updated multi-label evaluation set, and we curate
ImageNet-Major: a 68-example "major error" slice of the obvious mistakes made
by today's top models -- a slice where models should achieve near perfection,
but today are far from doing so.
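The multi-label evaluation described in the abstract scores a prediction as correct if the model's top-1 class is any of the valid labels for that image, rather than the single original label. A minimal sketch of that metric (function and variable names are illustrative, not from the paper's code):

```python
def multilabel_top1_accuracy(predictions, valid_labels):
    """Multi-label top-1 accuracy: a prediction is correct if it falls
    in the set of labels judged valid for that image.

    predictions: list of predicted class ids (one per image)
    valid_labels: list of sets of acceptable class ids (one per image)
    """
    correct = sum(
        1 for pred, labels in zip(predictions, valid_labels) if pred in labels
    )
    return correct / len(predictions)
```

Under this metric, uncovering additional valid multi-labels (e.g. both "dough" and "bagel" for a borderline image) directly raises measured accuracy without changing the model.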
Related papers
- Automated Classification of Model Errors on ImageNet [7.455546102930913]
We propose an automated error classification framework to study how modeling choices affect error distributions.
We use our framework to comprehensively evaluate the error distribution of over 900 models.
In particular, we observe that the portion of severe errors drops significantly with increasing top-1 accuracy, indicating that, while top-1 accuracy underreports a model's true performance, it remains a valuable performance metric.
arXiv Detail & Related papers (2023-11-13T20:41:39Z) - ImagenHub: Standardizing the evaluation of conditional image generation
models [48.51117156168]
This paper proposes ImagenHub, which is a one-stop library to standardize the inference and evaluation of all conditional image generation models.
We design two human evaluation scores, i.e. Semantic Consistency and Perceptual Quality, along with comprehensive guidelines to evaluate generated images.
Our human evaluation achieves high inter-worker agreement: Krippendorff's alpha exceeds 0.4 on 76% of models.
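Krippendorff's alpha measures inter-rater agreement as 1 minus the ratio of observed to expected disagreement. A minimal sketch for nominal labels with complete data, built from the standard coincidence-matrix definition (this is a generic illustration, not the ImagenHub authors' code):

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    units: list of lists; each inner list holds the labels the raters
    assigned to one item. Items with fewer than two ratings are skipped.
    """
    # Build the coincidence matrix: each ordered pair of labels within an
    # item contributes 1/(m-1), where m is the number of ratings for it.
    coincidence = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        for c, k in permutations(labels, 2):
            coincidence[(c, k)] += 1.0 / (m - 1)

    # Marginal totals per label and overall number of pairable values.
    totals = Counter()
    for (c, _k), w in coincidence.items():
        totals[c] += w
    n = sum(totals.values())

    observed = sum(w for (c, k), w in coincidence.items() if c != k) / n
    expected = sum(
        totals[c] * totals[k] for c in totals for k in totals if c != k
    ) / (n * (n - 1))
    if expected == 0:
        return 1.0  # no label variation at all: treat as perfect agreement
    return 1.0 - observed / expected
```

Perfect agreement yields alpha = 1, while agreement at chance level yields alpha near 0; the 0.4 threshold cited above is a common cutoff for "moderate" agreement.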
arXiv Detail & Related papers (2023-10-02T19:41:42Z) - Intrinsic Self-Supervision for Data Quality Audits [35.69673085324971]
Benchmark datasets in computer vision often contain off-topic images, near duplicates, and label errors.
In this paper, we revisit the task of data cleaning and formalize it as either a ranking problem, or a scoring problem.
We find that a specific combination of context-aware self-supervised representation learning and distance-based indicators is effective in finding issues without annotation biases.
arXiv Detail & Related papers (2023-05-26T15:57:04Z) - Diverse, Difficult, and Odd Instances (D2O): A New Test Set for Object
Classification [47.64219291655723]
We introduce a new test set, called D2O, which is sufficiently different from existing test sets.
Our dataset contains 8,060 images spread across 36 categories, out of which 29 appear in ImageNet.
The best Top-1 accuracy on our dataset is around 60%, much lower than the 91% best Top-1 accuracy on ImageNet.
arXiv Detail & Related papers (2023-01-29T19:58:32Z) - ImageNet-X: Understanding Model Mistakes with Factor of Variation
Annotations [36.348968311668564]
We introduce ImageNet-X, a set of sixteen human annotations of factors such as pose, background, or lighting.
We investigate 2,200 current recognition models and study the types of mistakes as a function of model architecture.
We find models have consistent failure modes across ImageNet-X categories.
arXiv Detail & Related papers (2022-11-03T14:56:32Z) - DOMINO: Domain-aware Model Calibration in Medical Image Segmentation [51.346121016559024]
Modern deep neural networks are poorly calibrated, compromising trustworthiness and reliability.
We propose DOMINO, a domain-aware model calibration method that leverages the semantic confusability and hierarchical similarity between class labels.
Our results show that DOMINO-calibrated deep neural networks outperform non-calibrated models and state-of-the-art morphometric methods in head image segmentation.
arXiv Detail & Related papers (2022-09-13T15:31:52Z) - Contemplating real-world object classification [53.10151901863263]
We reanalyze the ObjectNet dataset recently proposed by Barbu et al. containing objects in daily life situations.
We find that applying deep models to the isolated objects, rather than the entire scene as is done in the original paper, results in around 20-30% performance improvement.
arXiv Detail & Related papers (2021-03-08T23:29:59Z) - High-Performance Large-Scale Image Recognition Without Normalization [34.58818094675353]
Batch normalization is a key component of most image classification models, but it has many undesirable properties.
We develop an adaptive gradient clipping technique which overcomes these instabilities, and design a significantly improved class of Normalizer-Free ResNets.
Our models attain significantly better performance than their batch-normalized counterparts when finetuning on ImageNet after large-scale pre-training.
arXiv Detail & Related papers (2021-02-11T18:23:20Z) - How Well Do Self-Supervised Models Transfer? [92.16372657233394]
We evaluate the transfer performance of 13 top self-supervised models on 40 downstream tasks.
We find ImageNet Top-1 accuracy to be highly correlated with transfer to many-shot recognition.
No single self-supervised method dominates overall, suggesting that universal pre-training is still unsolved.
arXiv Detail & Related papers (2020-11-26T16:38:39Z) - Are we done with ImageNet? [86.01120671361844]
We develop a more robust procedure for collecting human annotations of the ImageNet validation set.
We reassess the accuracy of recently proposed ImageNet classifiers, and find their gains to be substantially smaller than those reported on the original labels.
The original ImageNet labels are found to no longer be the best predictors of this independently collected label set, indicating that their usefulness in evaluating vision models may be nearing an end.
arXiv Detail & Related papers (2020-06-12T13:17:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.