A Comprehensive Study of Image Classification Model Sensitivity to
Foregrounds, Backgrounds, and Visual Attributes
- URL: http://arxiv.org/abs/2201.10766v1
- Date: Wed, 26 Jan 2022 06:31:28 GMT
- Title: A Comprehensive Study of Image Classification Model Sensitivity to
Foregrounds, Backgrounds, and Visual Attributes
- Authors: Mazda Moayeri, Phillip Pope, Yogesh Balaji, Soheil Feizi
- Abstract summary: We call this dataset RIVAL10, consisting of roughly $26k$ instances over $10$ classes.
We evaluate the sensitivity of a broad set of models to noise corruptions in foregrounds, backgrounds and attributes.
In our analysis, we consider diverse state-of-the-art architectures (ResNets, Transformers) and training procedures (CLIP, SimCLR, DeiT, Adversarial Training).
- Score: 58.633364000258645
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While datasets with single-label supervision have propelled rapid advances in
image classification, additional annotations are necessary in order to
quantitatively assess how models make predictions. To this end, for a subset of
ImageNet samples, we collect segmentation masks for the entire object and $18$
informative attributes. We call this dataset RIVAL10 (RIch Visual Attributes
with Localization), consisting of roughly $26k$ instances over $10$ classes.
Using RIVAL10, we evaluate the sensitivity of a broad set of models to noise
corruptions in foregrounds, backgrounds and attributes. In our analysis, we
consider diverse state-of-the-art architectures (ResNets, Transformers) and
training procedures (CLIP, SimCLR, DeiT, Adversarial Training). We find that,
somewhat surprisingly, adversarial training makes ResNets more sensitive to
the background, relative to the foreground, than standard training does.
Similarly, contrastively trained models, both Transformers and ResNets, also
show lower relative foreground sensitivity. Lastly, we observe intriguing
adaptive abilities of transformers to increase relative foreground sensitivity
as corruption level increases. Using saliency methods, we automatically
discover spurious features that drive the background sensitivity of models and
assess alignment of saliency maps with foregrounds. Finally, we quantitatively
study the attribution problem for neural features by comparing feature saliency
with ground-truth localization of semantic attributes.
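As a concrete illustration of the evaluation the abstract describes, the sketch below adds Gaussian noise to only the foreground or only the background of each image using RIVAL10-style segmentation masks, measures the resulting accuracy drop, and scores how much saliency mass falls inside the foreground. This is a minimal sketch, not the paper's exact protocol: the loader yielding (image, mask, label) batches, the single noise level, and the specific accuracy-drop and mass-fraction metrics are assumptions made for illustration.

```python
import torch

def masked_gaussian_noise(images, masks, sigma, region="foreground"):
    """Corrupt only one region: images (N,3,H,W) in [0,1], masks (N,1,H,W) binary."""
    keep = masks if region == "foreground" else 1.0 - masks
    noisy = images + sigma * torch.randn_like(images) * keep
    return noisy.clamp(0.0, 1.0)

@torch.no_grad()
def region_sensitivity(model, loader, sigma, region):
    """Accuracy drop when only the given region is corrupted."""
    clean_correct = noisy_correct = total = 0
    for images, masks, labels in loader:  # assumed (image, mask, label) batches
        clean_correct += (model(images).argmax(1) == labels).sum().item()
        noisy = masked_gaussian_noise(images, masks, sigma, region)
        noisy_correct += (model(noisy).argmax(1) == labels).sum().item()
        total += labels.numel()
    return (clean_correct - noisy_correct) / total

def saliency_foreground_alignment(saliency, masks):
    """Fraction of total saliency mass falling inside the foreground mask."""
    s = saliency.abs()
    return ((s * masks).sum() / s.sum().clamp_min(1e-8)).item()

# A model is relatively more foreground-sensitive when corrupting the
# foreground hurts accuracy more than corrupting the background:
# rel_fg = region_sensitivity(model, loader, 0.25, "foreground") \
#        - region_sensitivity(model, loader, 0.25, "background")
```

Sweeping sigma over several values would expose the corruption-level behavior the abstract notes for Transformers.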
Related papers
- Sensitivity-Informed Augmentation for Robust Segmentation [21.609070498399863]
Internal noise sources such as variations in camera quality or lens distortion can affect the performance of segmentation models.
We present an efficient, adaptable, and gradient-free method to enhance the robustness of learning-based segmentation models across training.
arXiv Detail & Related papers (2024-06-03T15:25:45Z)
- Visual Context-Aware Person Fall Detection [52.49277799455569]
Background objects such as beds, chairs, or wheelchairs can challenge fall detection systems, leading to false positive alarms.
We present a segmentation pipeline to semi-automatically separate individuals and objects in images.
We demonstrate that object-specific contextual transformations during training effectively mitigate this challenge.
arXiv Detail & Related papers (2024-04-11T19:06:36Z)
- ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing [45.14977000707886]
Higher accuracy on ImageNet usually leads to better robustness against different corruptions.
We create a toolkit for object editing with control over backgrounds, sizes, positions, and directions.
We evaluate the performance of current deep learning models, including both convolutional neural networks and vision transformers.
arXiv Detail & Related papers (2023-03-30T02:02:32Z)
- Self-Supervised In-Domain Representation Learning for Remote Sensing Image Scene Classification [1.0152838128195465]
Transferring ImageNet pre-trained weights to various remote sensing tasks has produced acceptable results.
Recent research has demonstrated that self-supervised learning methods capture visual features that are more discriminative and transferable.
We are motivated by these facts to pre-train the in-domain representations of remote sensing imagery using contrastive self-supervised learning.
arXiv Detail & Related papers (2023-02-03T15:03:07Z)
- Robustifying Deep Vision Models Through Shape Sensitization [19.118696557797957]
We propose a simple, lightweight adversarial augmentation technique that explicitly incentivizes the network to learn holistic shapes.
Our augmentations superpose edge maps from one image onto another image with shuffled patches, using a randomly determined mixing proportion (see the sketch after this list).
We show that our augmentations significantly improve classification accuracy and robustness measures on a range of datasets and neural architectures.
arXiv Detail & Related papers (2022-11-14T11:17:46Z)
- Core Risk Minimization using Salient ImageNet [53.616101711801484]
We introduce the Salient ImageNet dataset with more than 1 million soft masks localizing core and spurious features for all 1000 ImageNet classes.
Using this dataset, we first evaluate the reliance of several ImageNet pretrained models (42 total) on spurious features.
Next, we introduce a new learning paradigm called Core Risk Minimization (CoRM) whose objective ensures that the model predicts a class using its core features.
arXiv Detail & Related papers (2022-03-28T01:53:34Z)
- Image Quality Assessment using Contrastive Learning [50.265638572116984]
We train a deep Convolutional Neural Network (CNN) using a contrastive pairwise objective to solve an auxiliary problem.
We show through extensive experiments that CONTRIQUE achieves competitive performance when compared to state-of-the-art no-reference (NR) image quality models.
Our results suggest that powerful quality representations with perceptual relevance can be obtained without requiring large labeled subjective image quality datasets.
arXiv Detail & Related papers (2021-10-25T21:01:00Z)
- Self-supervised Audiovisual Representation Learning for Remote Sensing Data [96.23611272637943]
We propose a self-supervised approach for pre-training deep neural networks in remote sensing.
This is done in a completely label-free manner by exploiting the correspondence between geo-tagged audio recordings and remote sensing imagery.
We show that our approach outperforms existing pre-training strategies for remote sensing imagery.
arXiv Detail & Related papers (2021-08-02T07:50:50Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
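The edge-map superposition augmentation summarized in the Shape Sensitization entry above can be sketched as follows. This is a hedged reconstruction from the one-sentence summary, not the authors' implementation: Sobel edges, a 16-pixel patch grid, a permutation shared across the batch, and a uniform random mixing proportion are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def sobel_edges(images):
    """Per-channel Sobel gradient magnitude as a simple edge map."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=images.device, dtype=images.dtype)
    ky = kx.t()
    c = images.shape[1]
    kx = kx.view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    ky = ky.view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    gx = F.conv2d(images, kx, padding=1, groups=c)  # depthwise horizontal gradient
    gy = F.conv2d(images, ky, padding=1, groups=c)  # depthwise vertical gradient
    return torch.sqrt(gx ** 2 + gy ** 2)

def shuffle_patches(images, patch=16):
    """Randomly permute non-overlapping square patches (one permutation per batch)."""
    n, c, h, w = images.shape
    gh, gw = h // patch, w // patch
    x = images[:, :, :gh * patch, :gw * patch]
    x = x.reshape(n, c, gh, patch, gw, patch)
    x = x.permute(0, 2, 4, 1, 3, 5).reshape(n, gh * gw, c, patch, patch)
    x = x[:, torch.randperm(gh * gw)]
    x = x.reshape(n, gh, gw, c, patch, patch).permute(0, 3, 1, 4, 2, 5)
    return x.reshape(n, c, gh * patch, gw * patch)

def edge_superposition_augment(images_a, images_b, patch=16):
    """Superpose normalized edge maps of one batch onto a patch-shuffled batch."""
    lam = torch.rand(images_a.size(0), 1, 1, 1, device=images_a.device)
    edges = sobel_edges(images_a)
    edges = edges / edges.amax(dim=(1, 2, 3), keepdim=True).clamp_min(1e-8)
    return lam * edges + (1.0 - lam) * shuffle_patches(images_b, patch)
```

The sketch covers only the image transformation; how the mixed images are labeled during training is specified in the paper itself.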