Uncertainty in AI: Evaluating Deep Neural Networks on
  Out-of-Distribution Images
- URL: http://arxiv.org/abs/2309.01850v1
- Date: Mon, 4 Sep 2023 22:46:59 GMT
- Title: Uncertainty in AI: Evaluating Deep Neural Networks on
  Out-of-Distribution Images
- Authors: Jamiu Idowu and Ahmed Almasoud
- Abstract summary: This paper investigates the uncertainty of various deep neural networks, including ResNet-50, VGG16, DenseNet121, AlexNet, and GoogleNet, when dealing with perturbed data.
While ResNet-50 was the most accurate single model for OOD images, the ensemble performed even better, correctly classifying all images.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As AI models are increasingly deployed in critical applications, ensuring
consistent model performance in unusual situations, such as
out-of-distribution (OOD) or perturbed data, is important. Therefore, this
paper investigates the uncertainty of various deep neural networks, including
ResNet-50, VGG16, DenseNet121, AlexNet, and GoogleNet, when dealing with such
data. Our approach includes three experiments. First, we used the pretrained
models to classify OOD images generated via DALL-E to assess their performance.
Second, we built an ensemble from the models' predictions using probabilistic
averaging for consensus due to its advantages over plurality or majority
voting. The ensemble's uncertainty was quantified using average probabilities,
variance, and entropy metrics. Our results showed that while ResNet-50 was the
most accurate single model for OOD images, the ensemble performed even better,
correctly classifying all images. Third, we tested model robustness by adding
perturbations (filters, rotations, etc.) to new epistemic images from DALL-E or
real-world captures. ResNet-50 was chosen for this experiment because it was the
best-performing model. While it classified 4 out of 5 unperturbed images correctly, it
misclassified all of them post-perturbation, indicating a significant
vulnerability. These misclassifications, which are clear to human observers,
highlight AI models' limitations. Using saliency maps, we identified regions of
the images that the model considered important for their decisions.
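
The ensemble step described in the abstract (probabilistic averaging, with variance and entropy as uncertainty measures) can be illustrated with a minimal sketch. This is not the authors' code: the function names and the probability values below are hypothetical, and the sketch assumes each model has already produced a softmax probability vector over the same set of classes.

```python
import math


def ensemble_average(prob_vectors):
    """Average the per-model class probabilities (soft voting)."""
    n_models = len(prob_vectors)
    n_classes = len(prob_vectors[0])
    return [sum(p[c] for p in prob_vectors) / n_models for c in range(n_classes)]


def class_variance(prob_vectors):
    """Population variance across models of each class probability."""
    mean = ensemble_average(prob_vectors)
    n_models = len(prob_vectors)
    return [
        sum((p[c] - mean[c]) ** 2 for p in prob_vectors) / n_models
        for c in range(len(mean))
    ]


def entropy(probs):
    """Shannon entropy in nats; terms with zero probability contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical softmax outputs from three models over three classes.
model_probs = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
]

avg = ensemble_average(model_probs)                     # consensus distribution
var = class_variance(model_probs)                       # disagreement between models
predicted = max(range(len(avg)), key=avg.__getitem__)   # index of the top class
uncertainty = entropy(avg)                              # spread of the consensus
```

Unlike plurality or majority voting, averaging keeps each model's confidence: a model that is only marginally sure of its top class contributes less to the consensus than one that is strongly sure. High variance for a class signals disagreement between models, and high entropy of the averaged distribution signals that the ensemble as a whole is undecided.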

Related papers
- LAION-C: An Out-of-Distribution Benchmark for Web-Scale Vision Models [19.56756019309533]
 We introduce LAION-C as a benchmark alternative for ImageNet-C.
In a comprehensive evaluation of state-of-the-art models, we find that the LAION-C dataset poses significant challenges to contemporary models.
We observe a paradigm shift in OOD generalization: from humans outperforming models, to the best models now matching or outperforming the best human observers.
 arXiv  Detail & Related papers  (2025-06-20T12:32:27Z)
- Appeal prediction for AI up-scaled Images [45.61706071739717]
 We describe our developed dataset, which uses 136 base images and five different up-scaling methods.
We evaluate the appeal of the different methods, and the results indicate that Real-ESRGAN and BSRGAN are the best.
In addition, we evaluate state-of-the-art image appeal and quality models; none of them showed high prediction performance.
 arXiv  Detail & Related papers  (2025-02-19T13:45:24Z)
- Re-assessing ImageNet: How aligned is its single-label assumption with its multi-label nature? [1.4828022319975973]
 We analyze the effectiveness of pre-trained state-of-the-art deep neural network (DNN) models on ImageNet and one of its variants, ImageNetV2.
Our findings show that these reported declines are largely attributable to a characteristic of the dataset that has not received sufficient attention.
Our findings highlight the importance of considering the multi-label nature of the ImageNet dataset during benchmarking.
 arXiv  Detail & Related papers  (2024-12-24T12:55:31Z)
- A Comparative Analysis of CNN-based Deep Learning Models for Landslide Detection [0.0]
 Recent landslides in northern parts of India and Nepal have caused significant disruption, damaging infrastructure and posing threats to local communities.
CNNs, a type of deep learning technique, have shown remarkable success in image processing.
 arXiv  Detail & Related papers  (2024-08-03T07:20:10Z)
- ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object [78.58860252442045]
 We introduce generative models as a data source for hard images that benchmark deep models' robustness.
We are able to generate images with more diversified backgrounds, textures, and materials than any prior work, and we term this benchmark ImageNet-D.
Our work suggests that diffusion models can be an effective source to test vision models.
 arXiv  Detail & Related papers  (2024-03-27T17:23:39Z)
- ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing [45.14977000707886]
 Higher accuracy on ImageNet usually leads to better robustness against different corruptions.
We create a toolkit for object editing with controls of backgrounds, sizes, positions, and directions.
We evaluate the performance of current deep learning models, including both convolutional neural networks and vision transformers.
 arXiv  Detail & Related papers  (2023-03-30T02:02:32Z)
- Effective Robustness against Natural Distribution Shifts for Models with
  Different Training Data [113.21868839569]
 "Effective robustness" measures the extra out-of-distribution robustness beyond what can be predicted from the in-distribution (ID) performance.
We propose a new evaluation metric to evaluate and compare the effective robustness of models trained on different data.
 arXiv  Detail & Related papers  (2023-02-02T19:28:41Z)
- MDN-VO: Estimating Visual Odometry with Confidence [34.8860186009308]
 Visual Odometry (VO) is used in many applications including robotics and autonomous systems.
We propose a deep learning-based VO model to estimate 6-DoF poses, as well as a confidence model for these estimates.
Our experiments show that the proposed model exceeds state-of-the-art performance in addition to detecting failure cases.
 arXiv  Detail & Related papers  (2021-12-23T19:26:04Z)
- MEMO: Test Time Robustness via Adaptation and Augmentation [131.28104376280197]
 We study the problem of test time robustification, i.e., using the test input to improve model robustness.
Recent prior works have proposed methods for test time adaptation; however, they each introduce additional assumptions.
We propose a simple approach that can be used in any test setting where the model is probabilistic and adaptable.
 arXiv  Detail & Related papers  (2021-10-18T17:55:11Z)
- Probabilistic Modeling for Human Mesh Recovery [73.11532990173441]
 This paper focuses on the problem of 3D human reconstruction from 2D evidence.
We recast the problem as learning a mapping from the input to a distribution of plausible 3D poses.
 arXiv  Detail & Related papers  (2021-08-26T17:55:11Z)
- Contemplating real-world object classification [53.10151901863263]
 We reanalyze the ObjectNet dataset recently proposed by Barbu et al. containing objects in daily life situations.
We find that applying deep models to the isolated objects, rather than the entire scene as is done in the original paper, results in around 20-30% performance improvement.
 arXiv  Detail & Related papers  (2021-03-08T23:29:59Z)
- Deep Bingham Networks: Dealing with Uncertainty and Ambiguity in Pose
  Estimation [74.76155168705975]
 Deep Bingham Networks (DBN) can handle pose-related uncertainties and ambiguities arising in almost all real life applications concerning 3D data.
DBN extends the state of the art direct pose regression networks by (i) a multi-hypotheses prediction head which can yield different distribution modes.
We propose new training strategies so as to avoid mode or posterior collapse during training and to improve numerical stability.
 arXiv  Detail & Related papers  (2020-12-20T19:20:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.