Uncertainty in AI: Evaluating Deep Neural Networks on
Out-of-Distribution Images
- URL: http://arxiv.org/abs/2309.01850v1
- Date: Mon, 4 Sep 2023 22:46:59 GMT
- Title: Uncertainty in AI: Evaluating Deep Neural Networks on
Out-of-Distribution Images
- Authors: Jamiu Idowu and Ahmed Almasoud
- Abstract summary: This paper investigates the uncertainty of various deep neural networks, including ResNet-50, VGG16, DenseNet121, AlexNet, and GoogleNet, when dealing with perturbed data.
While ResNet-50 was the most accurate single model for OOD images, the ensemble performed even better, correctly classifying all images.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As AI models are increasingly deployed in critical applications, ensuring that
models perform consistently when exposed to unusual situations, such as
out-of-distribution (OOD) or perturbed data, is important. Therefore, this
paper investigates the uncertainty of various deep neural networks, including
ResNet-50, VGG16, DenseNet121, AlexNet, and GoogleNet, when dealing with such
data. Our approach includes three experiments. First, we used the pretrained
models to classify OOD images generated via DALL-E to assess their performance.
Second, we built an ensemble from the models' predictions using probabilistic
averaging for consensus due to its advantages over plurality or majority
voting. The ensemble's uncertainty was quantified using average probabilities,
variance, and entropy metrics. Our results showed that while ResNet-50 was the
most accurate single model for OOD images, the ensemble performed even better,
correctly classifying all images. Third, we tested model robustness by adding
perturbations (filters, rotations, etc.) to new epistemic images from DALL-E or
real-world captures. ResNet-50 was chosen for this experiment as it was the
best-performing model. While it classified 4 out of 5 unperturbed images correctly, it
misclassified all of them post-perturbation, indicating a significant
vulnerability. These misclassifications, which are clear to human observers,
highlight AI models' limitations. Using saliency maps, we identified regions of
the images that the model considered important for its decisions.
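
The abstract's second experiment builds an ensemble by probabilistic averaging and quantifies uncertainty with average probabilities, variance, and entropy. Below is a minimal sketch of that idea, not the authors' released code: it assumes the standard torchvision pretrained weights, standard ImageNet preprocessing, and a hypothetical image path, any of which may differ from the paper's actual setup.

```python
"""Sketch: probabilistic-averaging ensemble over pretrained ImageNet classifiers
with average-probability, variance, and entropy uncertainty metrics.
Model weights, preprocessing, and the image path are illustrative assumptions."""
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

# The five architectures named in the abstract, loaded with standard
# torchvision ImageNet checkpoints (which may differ from the authors' versions).
model_fns = [models.resnet50, models.vgg16, models.densenet121,
             models.alexnet, models.googlenet]
nets = [fn(weights="IMAGENET1K_V1").eval() for fn in model_fns]

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def ensemble_predict(image_path: str):
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        # Softmax probabilities from each model: shape (n_models, n_classes)
        probs = torch.stack([F.softmax(net(x), dim=1).squeeze(0) for net in nets])

    avg_probs = probs.mean(dim=0)   # probabilistic averaging (consensus)
    variance = probs.var(dim=0)     # per-class disagreement across models
    entropy = -(avg_probs * avg_probs.clamp_min(1e-12).log()).sum()  # predictive entropy

    pred_class = int(avg_probs.argmax())
    return pred_class, avg_probs[pred_class].item(), variance[pred_class].item(), entropy.item()

# Example usage on a hypothetical DALL-E-generated OOD image:
# cls, conf, var, ent = ensemble_predict("dalle_ood_sample.png")
# print(f"class={cls} confidence={conf:.3f} variance={var:.4f} entropy={ent:.3f}")
```

Averaging the softmax outputs preserves per-class confidence information that plurality or majority voting discards, which is what makes the variance and entropy readings informative as uncertainty signals.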
Related papers
- A Comparative Analysis of CNN-based Deep Learning Models for Landslide Detection [0.0]
Recent landslides in northern parts of India and Nepal have caused significant disruption, damaging infrastructure and posing threats to local communities.
CNNs, a type of deep learning technique, have shown remarkable success in image processing.
arXiv Detail & Related papers (2024-08-03T07:20:10Z) - ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object [78.58860252442045]
We introduce generative models as a data source for hard images that benchmark deep models' robustness.
We are able to generate images with more diversified backgrounds, textures, and materials than any prior work, and we term this benchmark ImageNet-D.
Our work suggests that diffusion models can be an effective source to test vision models.
arXiv Detail & Related papers (2024-03-27T17:23:39Z) - ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing [45.14977000707886]
Higher accuracy on ImageNet usually leads to better robustness against different corruptions.
We create a toolkit for object editing with controls of backgrounds, sizes, positions, and directions.
We evaluate the performance of current deep learning models, including both convolutional neural networks and vision transformers.
arXiv Detail & Related papers (2023-03-30T02:02:32Z) - Effective Robustness against Natural Distribution Shifts for Models with
Different Training Data [113.21868839569]
"Effective robustness" measures the extra out-of-distribution robustness beyond what can be predicted from the in-distribution (ID) performance.
We propose a new evaluation metric to evaluate and compare the effective robustness of models trained on different data.
arXiv Detail & Related papers (2023-02-02T19:28:41Z) - MDN-VO: Estimating Visual Odometry with Confidence [34.8860186009308]
Visual Odometry (VO) is used in many applications including robotics and autonomous systems.
We propose a deep learning-based VO model to estimate 6-DoF poses, as well as a confidence model for these estimates.
Our experiments show that the proposed model exceeds state-of-the-art performance in addition to detecting failure cases.
arXiv Detail & Related papers (2021-12-23T19:26:04Z) - MEMO: Test Time Robustness via Adaptation and Augmentation [131.28104376280197]
We study the problem of test time robustification, i.e., using the test input to improve model robustness.
Recent prior works have proposed methods for test-time adaptation; however, they each introduce additional assumptions.
We propose a simple approach that can be used in any test setting where the model is probabilistic and adaptable.
arXiv Detail & Related papers (2021-10-18T17:55:11Z) - Probabilistic Modeling for Human Mesh Recovery [73.11532990173441]
This paper focuses on the problem of 3D human reconstruction from 2D evidence.
We recast the problem as learning a mapping from the input to a distribution of plausible 3D poses.
arXiv Detail & Related papers (2021-08-26T17:55:11Z) - Contemplating real-world object classification [53.10151901863263]
We reanalyze the ObjectNet dataset recently proposed by Barbu et al. containing objects in daily life situations.
We find that applying deep models to the isolated objects, rather than the entire scene as is done in the original paper, results in around 20-30% performance improvement.
arXiv Detail & Related papers (2021-03-08T23:29:59Z) - Deep Bingham Networks: Dealing with Uncertainty and Ambiguity in Pose
Estimation [74.76155168705975]
Deep Bingham Networks (DBN) can handle pose-related uncertainties and ambiguities arising in almost all real life applications concerning 3D data.
DBN extends state-of-the-art direct pose regression networks by (i) a multi-hypotheses prediction head which can yield different distribution modes.
We propose new training strategies so as to avoid mode or posterior collapse during training and to improve numerical stability.
arXiv Detail & Related papers (2020-12-20T19:20:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.