Diverse, Difficult, and Odd Instances (D2O): A New Test Set for Object
Classification
- URL: http://arxiv.org/abs/2301.12527v1
- Date: Sun, 29 Jan 2023 19:58:32 GMT
- Title: Diverse, Difficult, and Odd Instances (D2O): A New Test Set for Object
Classification
- Authors: Ali Borji
- Abstract summary: We introduce a new test set, called D2O, which is sufficiently different from existing test sets.
Our dataset contains 8,060 images spread across 36 categories, out of which 29 appear in ImageNet.
The best Top-1 accuracy on our dataset is around 60%, which is much lower than the best Top-1 accuracy of 91% on ImageNet.
- Score: 47.64219291655723
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Test sets are an integral part of evaluating models and gauging progress in
object recognition, and more broadly in computer vision and AI. Existing test
sets for object recognition, however, suffer from shortcomings such as bias
towards ImageNet's characteristics and idiosyncrasies (e.g., ImageNet-V2),
being limited to certain types of stimuli (e.g., indoor scenes in ObjectNet),
and underestimating model performance (e.g., ImageNet-A). To mitigate these
problems, we introduce a new test set, called D2O, which is sufficiently
different from existing test sets. The images are a mix of generated images and
images crawled from the web. They are diverse, unmodified, and representative of
real-world scenarios, and they cause state-of-the-art models to misclassify them
with high confidence. To emphasize generalization, our dataset
by design does not come paired with a training set. It contains 8,060 images
spread across 36 categories, out of which 29 appear in ImageNet. The best Top-1
accuracy on our dataset is around 60%, which is much lower than the best Top-1
accuracy of 91% on ImageNet. We find that popular vision APIs perform very poorly
at detecting objects in D2O categories such as "faces", "cars", and "cats". Our
dataset also comes with a "miscellaneous" category, on which we test image
tagging models. Overall, our investigations demonstrate that the D2O test set
contains a mix of images with varied levels of difficulty and
is predictive of the average-case performance of models. It can challenge
object recognition models for years to come and can spur more research in this
fundamental area.
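Because D2O ships without a training split, evaluation amounts to running off-the-shelf ImageNet classifiers directly on the test images. As a rough illustration of what such an evaluation involves (not the authors' code), the sketch below measures Top-1 accuracy of a pretrained ResNet-50 over a D2O-style image folder; the "d2o" directory layout, the d2o_to_imagenet category mapping, and the choice of ResNet-50 are assumptions made for illustration.

```python
# Minimal sketch: Top-1 accuracy of a pretrained ImageNet classifier on a
# D2O-style image folder. Paths, layout, and the category mapping below are
# illustrative assumptions, not the authors' evaluation protocol.
import torch
from torch.utils.data import DataLoader
from torchvision import models, transforms
from torchvision.datasets import ImageFolder

# Standard ImageNet preprocessing.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Assumed layout: d2o/<category_name>/<image>.jpg
dataset = ImageFolder("d2o", transform=preprocess)
loader = DataLoader(dataset, batch_size=64, shuffle=False, num_workers=4)

# Hypothetical mapping from D2O folder names to ImageNet-1k class indices;
# the real correspondence for the 29 shared categories would come from the
# dataset release.
d2o_to_imagenet = {"tabby_cat": 281, "sports_car": 817}
folder_to_imagenet = {
    dataset.class_to_idx[name]: idx
    for name, idx in d2o_to_imagenet.items()
    if name in dataset.class_to_idx
}

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2).eval()

correct, total = 0, 0
with torch.no_grad():
    for images, folder_labels in loader:
        preds = model(images).argmax(dim=1)
        for pred, label in zip(preds.tolist(), folder_labels.tolist()):
            if label not in folder_to_imagenet:
                continue  # skip categories without an ImageNet counterpart
            total += 1
            correct += int(pred == folder_to_imagenet[label])

print(f"Top-1 accuracy on mapped categories: {correct / max(total, 1):.1%}")
```

The same loop extends to Top-5 accuracy by replacing argmax with topk(5) and checking membership, and the seven non-ImageNet categories would instead call for a tagging or open-vocabulary model rather than a fixed 1,000-way classifier.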
Related papers
- ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object [78.58860252442045]
We introduce generative models as a data source for hard images that benchmark deep models' robustness.
We are able to generate images with more diversified backgrounds, textures, and materials than any prior work, where we term this benchmark as ImageNet-D.
Our work suggests that diffusion models can be an effective source to test vision models.
arXiv Detail & Related papers (2024-03-27T17:23:39Z)
- ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing [45.14977000707886]
Higher accuracy on ImageNet usually leads to better robustness against different corruptions.
We create a toolkit for object editing with controls of backgrounds, sizes, positions, and directions.
We evaluate the performance of current deep learning models, including both convolutional neural networks and vision transformers.
arXiv Detail & Related papers (2023-03-30T02:02:32Z)
- ImageNet-X: Understanding Model Mistakes with Factor of Variation Annotations [36.348968311668564]
We introduce ImageNet-X, a set of sixteen human annotations of factors such as pose, background, or lighting.
We investigate 2,200 current recognition models and study the types of mistakes as a function of model architecture.
We find models have consistent failure modes across ImageNet-X categories.
arXiv Detail & Related papers (2022-11-03T14:56:32Z)
- How good are deep models in understanding the generated images? [47.64219291655723]
Two sets of generated images are collected for object recognition and visual question answering tasks.
On object recognition, the best of 10 state-of-the-art object recognition models achieves about 60% top-1 and 80% top-5 accuracy.
On VQA, the OFA model scores 77.3% on answering 241 binary questions across 50 images.
arXiv Detail & Related papers (2022-08-23T06:44:43Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
- Contemplating real-world object classification [53.10151901863263]
We reanalyze the ObjectNet dataset recently proposed by Barbu et al., which contains objects in daily-life situations.
We find that applying deep models to the isolated objects, rather than the entire scene as is done in the original paper, results in around 20-30% performance improvement.
arXiv Detail & Related papers (2021-03-08T23:29:59Z)
- Rethinking Natural Adversarial Examples for Classification Models [43.87819913022369]
ImageNet-A is a famous dataset of natural adversarial examples.
We validated the hypothesis that background clutter contributes to these failures by reducing the background influence in ImageNet-A examples with object detection techniques.
Experiments showed that the object detection models with various classification models as backbones obtained much higher accuracy than their corresponding classification models.
arXiv Detail & Related papers (2021-02-23T14:46:48Z)