ViewFool: Evaluating the Robustness of Visual Recognition to Adversarial
Viewpoints
- URL: http://arxiv.org/abs/2210.03895v1
- Date: Sat, 8 Oct 2022 03:06:49 GMT
- Title: ViewFool: Evaluating the Robustness of Visual Recognition to Adversarial
Viewpoints
- Authors: Yinpeng Dong, Shouwei Ruan, Hang Su, Caixin Kang, Xingxing Wei, Jun
Zhu
- Abstract summary: We propose a novel method called ViewFool to find adversarial viewpoints that mislead visual recognition models.
By encoding real-world objects as neural radiance fields (NeRF), ViewFool characterizes a distribution of diverse adversarial viewpoints.
- Score: 42.64942578228025
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent studies have demonstrated that visual recognition models lack
robustness to distribution shift. However, current work mainly considers model
robustness to 2D image transformations, leaving viewpoint changes in the 3D
world less explored. In general, viewpoint changes are prevalent in various
real-world applications (e.g., autonomous driving), making it imperative to
evaluate viewpoint robustness. In this paper, we propose a novel method called
ViewFool to find adversarial viewpoints that mislead visual recognition models.
By encoding real-world objects as neural radiance fields (NeRF), ViewFool
characterizes a distribution of diverse adversarial viewpoints under an
entropic regularizer, which helps to handle the fluctuations of the real camera
pose and mitigate the reality gap between the real objects and their neural
representations. Experiments validate that the common image classifiers are
extremely vulnerable to the generated adversarial viewpoints, which also
exhibit high cross-model transferability. Based on ViewFool, we introduce
ImageNet-V, a new out-of-distribution dataset for benchmarking viewpoint
robustness of image classifiers. Evaluation results on 40 classifiers with
diverse architectures, objective functions, and data augmentations reveal a
significant drop in model performance when tested on ImageNet-V, which provides
a possibility to leverage ViewFool as an effective data augmentation strategy
to improve viewpoint robustness.
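The idea of learning a distribution of adversarial viewpoints under an entropic regularizer can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the two-dimensional viewpoint parameterization, the hypothetical `toy_loss` standing in for "render the object with NeRF, then compute the classifier loss", the diagonal Gaussian over viewpoints, the score-function gradient estimator, and all hyperparameters. The sketch maximizes the expected loss plus `lam` times the entropy of the distribution, so the learned region of harmful viewpoints does not collapse to a single point.

```python
import numpy as np

TARGET = np.array([1.0, -0.5])  # hypothetical "most adversarial" viewpoint

def toy_loss(v):
    """Hypothetical stand-in for the classifier loss on a rendered view.

    A real pipeline would render the object from viewpoint v and evaluate
    the classifier; here the loss simply peaks at TARGET.
    """
    return np.exp(-np.sum((v - TARGET) ** 2, axis=-1))

def gaussian_entropy(log_sigma):
    """Entropy of a diagonal Gaussian with the given log standard deviations."""
    d = log_sigma.shape[0]
    return np.sum(log_sigma) + 0.5 * d * np.log(2.0 * np.pi * np.e)

def optimize_viewpoint_distribution(steps=500, samples=64, lr=0.1, lam=0.05, seed=0):
    """Gradient ascent on E[loss(v)] + lam * entropy over (mu, log_sigma)."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(2)
    log_sigma = np.zeros(2)
    for _ in range(steps):
        sigma = np.exp(log_sigma)
        eps = rng.standard_normal((samples, 2))
        v = mu + sigma * eps                  # sampled viewpoints
        f = toy_loss(v)
        f = f - f.mean()                      # baseline to reduce variance
        # Score-function gradients of the expected loss.
        grad_mu = (f[:, None] * eps / sigma).mean(axis=0)
        grad_log_sigma = (f[:, None] * (eps ** 2 - 1.0)).mean(axis=0)
        # The entropy gradient w.r.t. each log_sigma entry is exactly 1.
        grad_log_sigma = grad_log_sigma + lam
        mu = mu + lr * grad_mu
        log_sigma = log_sigma + lr * grad_log_sigma
    return mu, np.exp(log_sigma)

if __name__ == "__main__":
    mu, sigma = optimize_viewpoint_distribution()
    print("mean viewpoint:", mu)
    print("std per axis:", sigma)
```

In this toy setting, a larger `lam` keeps the distribution wider, which loosely mirrors the abstract's motivation of tolerating fluctuations in the real camera pose.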
Related papers
- Freeview Sketching: View-Aware Fine-Grained Sketch-Based Image Retrieval [85.73149096516543]
We address the choice of viewpoint during sketch creation in Fine-Grained Sketch-Based Image Retrieval (FG-SBIR).
A pilot study highlights the system's struggle when query-sketches differ in viewpoint from target instances.
To reconcile this, we advocate for a view-aware system, seamlessly accommodating both view-agnostic and view-specific tasks.
arXiv Detail & Related papers (2024-07-01T21:20:44Z)
- Neural Clustering based Visual Representation Learning [61.72646814537163]
Clustering is one of the most classic approaches in machine learning and data analysis.
We propose feature extraction with clustering (FEC), which views feature extraction as a process of selecting representatives from data.
FEC alternates between grouping pixels into individual clusters to abstract representatives and updating the deep features of pixels with current representatives.
arXiv Detail & Related papers (2024-03-26T06:04:50Z)
- Möbius Transform for Mitigating Perspective Distortions in Representation Learning [43.86985901138407]
Perspective distortion (PD) causes unprecedented changes in shape, size, orientation, angles, and other spatial relationships in images.
We propose mitigating perspective distortion (MPD) by employing fine-grained parameter control on a specific family of Möbius transforms.
We present a dedicated perspectively distorted benchmark dataset, ImageNet-PD, to benchmark the robustness of deep learning models against this new dataset.
arXiv Detail & Related papers (2024-03-07T15:39:00Z)
- DiG-IN: Diffusion Guidance for Investigating Networks -- Uncovering Classifier Differences Neuron Visualisations and Visual Counterfactual Explanations [35.458709912618176]
Deep learning has led to huge progress in complex image classification tasks like ImageNet, but unexpected failure modes, e.g. via spurious features, remain.
For safety-critical tasks the black-box nature of their decisions is problematic, and explanations or at least methods which make decisions plausible are needed urgently.
We address these problems by generating images that optimize a classifier-derived objective using a framework for guided image generation.
arXiv Detail & Related papers (2023-11-29T17:35:29Z)
- Improving Viewpoint Robustness for Visual Recognition via Adversarial Training [26.824940629150362]
We propose Viewpoint-Invariant Adversarial Training (VIAT) to improve the viewpoint robustness of image classifiers.
We show that VIAT significantly improves the viewpoint robustness of various image classifiers based on the diversity of adversarial viewpoints generated by GMVFool.
arXiv Detail & Related papers (2023-07-21T12:18:35Z)
- Towards Viewpoint-Invariant Visual Recognition via Adversarial Training [28.424131496622497]
We propose Viewpoint-Invariant Adversarial Training (VIAT) to improve the viewpoint robustness of common image classifiers.
VIAT is formulated as a minimax optimization problem, where the inner maximization characterizes diverse adversarial viewpoints.
To further improve the generalization performance, a distribution sharing strategy is introduced.
arXiv Detail & Related papers (2023-07-16T07:55:42Z)
- Sparse Visual Counterfactual Explanations in Image Space [50.768119964318494]
We present a novel model for visual counterfactual explanations in image space.
We show that it can be used to detect undesired behavior of ImageNet classifiers due to spurious features in the ImageNet dataset.
arXiv Detail & Related papers (2022-05-16T20:23:11Z)
- ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond [76.35955924137986]
We propose a Vision Transformer Advanced by Exploring intrinsic inductive bias (IB) from convolutions, i.e., ViTAE.
ViTAE has several spatial pyramid reduction modules to downsample and embed the input image into tokens with rich multi-scale context.
We obtain the state-of-the-art classification performance, i.e., 88.5% Top-1 classification accuracy on ImageNet validation set and the best 91.2% Top-1 accuracy on ImageNet real validation set.
arXiv Detail & Related papers (2022-02-21T10:40:05Z)
- Unsupervised View-Invariant Human Posture Representation [28.840986167408037]
We present a novel unsupervised approach that learns to extract view-invariant 3D human pose representation from a 2D image.
Our model is trained by exploiting the intrinsic view-invariant properties of human pose between simultaneous frames.
We show improvements on the state-of-the-art unsupervised cross-view action classification accuracy on RGB and depth images.
arXiv Detail & Related papers (2021-09-17T19:23:31Z)
- Contemplating real-world object classification [53.10151901863263]
We reanalyze the ObjectNet dataset recently proposed by Barbu et al. containing objects in daily life situations.
We find that applying deep models to the isolated objects, rather than the entire scene as is done in the original paper, results in around 20-30% performance improvement.
arXiv Detail & Related papers (2021-03-08T23:29:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.