Uncertainty in AI: Evaluating Deep Neural Networks on
Out-of-Distribution Images
- URL: http://arxiv.org/abs/2309.01850v1
- Date: Mon, 4 Sep 2023 22:46:59 GMT
- Title: Uncertainty in AI: Evaluating Deep Neural Networks on
Out-of-Distribution Images
- Authors: Jamiu Idowu and Ahmed Almasoud
- Abstract summary: This paper investigates the uncertainty of various deep neural networks, including ResNet-50, VGG16, DenseNet121, AlexNet, and GoogleNet, when dealing with perturbed data.
While ResNet-50 was the most accurate single model for OOD images, the ensemble performed even better, correctly classifying all images.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As AI models are increasingly deployed in critical applications, ensuring that
models perform consistently when exposed to unusual situations, such as
out-of-distribution (OOD) or perturbed data, is important. Therefore, this
paper investigates the uncertainty of various deep neural networks, including
ResNet-50, VGG16, DenseNet121, AlexNet, and GoogleNet, when dealing with such
data. Our approach includes three experiments. First, we used the pretrained
models to classify OOD images generated via DALL-E to assess their performance.
Second, we built an ensemble from the models' predictions using probabilistic
averaging for consensus due to its advantages over plurality or majority
voting. The ensemble's uncertainty was quantified using average probabilities,
variance, and entropy metrics. Our results showed that while ResNet-50 was the
most accurate single model for OOD images, the ensemble performed even better,
correctly classifying all images. Third, we tested model robustness by adding
perturbations (filters, rotations, etc.) to new epistemic images from DALL-E or
real-world captures. ResNet-50 was chosen for this experiment as it was the
best-performing model. While it classified 4 out of 5 unperturbed images correctly, it
misclassified all of them post-perturbation, indicating a significant
vulnerability. These misclassifications, which are clear to human observers,
highlight AI models' limitations. Using saliency maps, we identified regions of
the images that the model considered important for its decisions.
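
The abstract's second experiment builds an ensemble by probabilistic averaging and quantifies uncertainty with average probabilities, variance, and entropy. Below is a minimal sketch of that idea, not the authors' released code: it assumes the standard torchvision pretrained weights, standard ImageNet preprocessing, and a hypothetical image path, any of which may differ from the paper's actual setup.

```python
"""Sketch: probabilistic-averaging ensemble over pretrained ImageNet classifiers
with average-probability, variance, and entropy uncertainty metrics.
Model weights, preprocessing, and the image path are illustrative assumptions."""
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

# The five architectures named in the abstract, loaded with standard
# torchvision ImageNet checkpoints (which may differ from the authors' versions).
model_fns = [models.resnet50, models.vgg16, models.densenet121,
             models.alexnet, models.googlenet]
nets = [fn(weights="IMAGENET1K_V1").eval() for fn in model_fns]

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def ensemble_predict(image_path: str):
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        # Softmax probabilities from each model: shape (n_models, n_classes)
        probs = torch.stack([F.softmax(net(x), dim=1).squeeze(0) for net in nets])

    avg_probs = probs.mean(dim=0)   # probabilistic averaging (consensus)
    variance = probs.var(dim=0)     # per-class disagreement across models
    entropy = -(avg_probs * avg_probs.clamp_min(1e-12).log()).sum()  # predictive entropy

    pred_class = int(avg_probs.argmax())
    return pred_class, avg_probs[pred_class].item(), variance[pred_class].item(), entropy.item()

# Example usage on a hypothetical DALL-E-generated OOD image:
# cls, conf, var, ent = ensemble_predict("dalle_ood_sample.png")
# print(f"class={cls} confidence={conf:.3f} variance={var:.4f} entropy={ent:.3f}")
```

Averaging the softmax outputs preserves per-class confidence information that plurality or majority voting discards, which is what makes the variance and entropy readings informative as uncertainty signals.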
Related papers
- A Comparative Analysis of CNN-based Deep Learning Models for Landslide Detection [0.0]
Recent landslides in northern parts of India and Nepal have caused significant disruption, damaging infrastructure and posing threats to local communities.
CNNs, a type of deep learning technique, have shown remarkable success in image processing.
arXiv Detail & Related papers (2024-08-03T07:20:10Z) - ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object [78.58860252442045]
We introduce generative models as a data source for hard images that benchmark deep models' robustness.
We are able to generate images with more diversified backgrounds, textures, and materials than any prior work, and we term this benchmark ImageNet-D.
Our work suggests that diffusion models can be an effective source to test vision models.
arXiv Detail & Related papers (2024-03-27T17:23:39Z) - ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing [45.14977000707886]
Higher accuracy on ImageNet usually leads to better robustness against different corruptions.
We create a toolkit for object editing with controls of backgrounds, sizes, positions, and directions.
We evaluate the performance of current deep learning models, including both convolutional neural networks and vision transformers.
arXiv Detail & Related papers (2023-03-30T02:02:32Z) - Effective Robustness against Natural Distribution Shifts for Models with
Different Training Data [113.21868839569]
"Effective robustness" measures the extra out-of-distribution robustness beyond what can be predicted from the in-distribution (ID) performance.
We propose a new evaluation metric to evaluate and compare the effective robustness of models trained on different data.
arXiv Detail & Related papers (2023-02-02T19:28:41Z) - MDN-VO: Estimating Visual Odometry with Confidence [34.8860186009308]
Visual Odometry (VO) is used in many applications including robotics and autonomous systems.
We propose a deep learning-based VO model to estimate 6-DoF poses, as well as a confidence model for these estimates.
Our experiments show that the proposed model exceeds state-of-the-art performance in addition to detecting failure cases.
arXiv Detail & Related papers (2021-12-23T19:26:04Z) - MEMO: Test Time Robustness via Adaptation and Augmentation [131.28104376280197]
We study the problem of test time robustification, i.e., using the test input to improve model robustness.
Recent prior works have proposed methods for test-time adaptation; however, they each introduce additional assumptions.
We propose a simple approach that can be used in any test setting where the model is probabilistic and adaptable.
arXiv Detail & Related papers (2021-10-18T17:55:11Z) - Probabilistic Modeling for Human Mesh Recovery [73.11532990173441]
This paper focuses on the problem of 3D human reconstruction from 2D evidence.
We recast the problem as learning a mapping from the input to a distribution of plausible 3D poses.
arXiv Detail & Related papers (2021-08-26T17:55:11Z) - Contemplating real-world object classification [53.10151901863263]
We reanalyze the ObjectNet dataset recently proposed by Barbu et al. containing objects in daily life situations.
We find that applying deep models to the isolated objects, rather than the entire scene as is done in the original paper, results in around 20-30% performance improvement.
arXiv Detail & Related papers (2021-03-08T23:29:59Z) - Deep Bingham Networks: Dealing with Uncertainty and Ambiguity in Pose
Estimation [74.76155168705975]
Deep Bingham Networks (DBN) can handle pose-related uncertainties and ambiguities arising in almost all real life applications concerning 3D data.
DBN extends state-of-the-art direct pose regression networks by (i) a multi-hypotheses prediction head which can yield different distribution modes.
We propose new training strategies so as to avoid mode or posterior collapse during training and to improve numerical stability.
arXiv Detail & Related papers (2020-12-20T19:20:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.