Improving Viewpoint Robustness for Visual Recognition via Adversarial
Training
- URL: http://arxiv.org/abs/2307.11528v1
- Date: Fri, 21 Jul 2023 12:18:35 GMT
- Title: Improving Viewpoint Robustness for Visual Recognition via Adversarial
Training
- Authors: Shouwei Ruan, Yinpeng Dong, Hang Su, Jianteng Peng, Ning Chen, and
Xingxing Wei
- Abstract summary: We propose Viewpoint-Invariant Adversarial Training (VIAT) to improve the viewpoint robustness of image classifiers.
We show that VIAT significantly improves the viewpoint robustness of various image classifiers based on the diversity of adversarial viewpoints generated by GMVFool.
- Score: 26.824940629150362
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Viewpoint invariance remains challenging for visual recognition in the 3D
world, as altering the viewing directions can significantly impact predictions
for the same object. While substantial efforts have been dedicated to making
neural networks invariant to 2D image translations and rotations, viewpoint
invariance is rarely investigated. Motivated by the success of adversarial
training in enhancing model robustness, we propose Viewpoint-Invariant
Adversarial Training (VIAT) to improve the viewpoint robustness of image
classifiers. Regarding viewpoint transformation as an attack, we formulate VIAT
as a minimax optimization problem, where the inner maximization characterizes
diverse adversarial viewpoints by learning a Gaussian mixture distribution
based on the proposed attack method GMVFool. The outer minimization obtains a
viewpoint-invariant classifier by minimizing the expected loss over the
worst-case viewpoint distributions, which can be shared across different
objects within the same category. Based on GMVFool, we contribute a large-scale
dataset called ImageNet-V+ to benchmark viewpoint robustness. Experimental
results show that VIAT significantly improves the viewpoint robustness of
various image classifiers based on the diversity of adversarial viewpoints
generated by GMVFool. Furthermore, we propose ViewRS, a certified viewpoint
robustness method that provides a certified radius and accuracy to demonstrate
the effectiveness of VIAT from the theoretical perspective.
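For concreteness, the minimax objective described above can be sketched as follows (the notation is reconstructed from the abstract rather than copied from the paper, and the entropy regularizer is an assumption motivated by the stated goal of viewpoint diversity):

```latex
\min_{\theta}\;
\mathbb{E}_{(o,\,y)\sim\mathcal{D}}\,
\max_{\psi}\;
\mathbb{E}_{v\sim p_{\psi}(v)}
\Big[\mathcal{L}\big(f_{\theta}(\mathcal{R}(o,v)),\,y\big)\Big]
\;+\;\lambda\,\mathcal{H}(p_{\psi})
```

Here $f_{\theta}$ is the classifier, $\mathcal{R}(o,v)$ renders object $o$ from viewpoint $v$ (e.g., from a NeRF encoding, as in ViewFool), $p_{\psi}$ is the Gaussian mixture over viewpoint parameters fitted by the inner maximization (GMVFool), and $\mathcal{H}$ encourages the mixture to cover diverse adversarial viewpoints.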
Related papers
- Appearance Debiased Gaze Estimation via Stochastic Subject-Wise
Adversarial Learning [33.55397868171977]
Appearance-based gaze estimation has been attracting attention in computer vision, and remarkable improvements have been achieved using various deep learning techniques.
We propose a novel framework: subject-wise gaZE learning (SAZE), which trains a network to generalize the appearance of subjects.
Our experimental results verify the robustness of the method, yielding state-of-the-art mean angular errors of 3.89 and 4.42 degrees on the MPIIGaze and EyeDiap datasets, respectively.
arXiv Detail & Related papers (2024-01-25T00:23:21Z)
- RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z)
- Fine-grained Recognition with Learnable Semantic Data Augmentation [68.48892326854494]
Fine-grained image recognition is a longstanding computer vision challenge.
We propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem.
Our method significantly improves the generalization performance on several popular classification networks.
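As a rough illustration of what feature-level diversification can look like, here is an ISDA-style sketch that perturbs deep features with class-conditional Gaussian noise; the cited paper learns the augmentation distribution, so everything below (names, fixed covariances) is an assumption:

```python
import torch

def semantic_augment(features, labels, class_cov, strength=0.5, n_aug=4):
    """Illustrative feature-level augmentation: perturb each deep
    feature along class-conditional semantic directions instead of
    editing pixels. `class_cov[c]` is a (d, d) covariance estimate
    for class c; the paper *learns* this distribution, which this
    sketch does not."""
    d = features.size(1)
    aug = []
    for _ in range(n_aug):
        noise = torch.stack([
            torch.distributions.MultivariateNormal(
                torch.zeros(d), covariance_matrix=strength * class_cov[y]
            ).sample()
            for y in labels.tolist()
        ])
        aug.append(features + noise)
    # Remember to tile `labels` n_aug times to match the augmented batch.
    return torch.cat(aug, dim=0)
```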
arXiv Detail & Related papers (2023-09-01T11:15:50Z)
- DealMVC: Dual Contrastive Calibration for Multi-view Clustering [78.54355167448614]
We propose a novel Dual Contrastive Calibration network for Multi-View Clustering (DealMVC).
We first design a fusion mechanism to obtain a global cross-view feature. Then, a global contrastive calibration loss is proposed by aligning the view feature similarity graph and the high-confidence pseudo-label graph.
During the training procedure, the interacted cross-view feature is jointly optimized at both local and global levels.
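As a rough sketch of the "align the similarity graph with the pseudo-label graph" idea (the loss form and names below are assumptions, not DealMVC's exact formulation):

```python
import torch
import torch.nn.functional as F

def calibration_loss(fused_features, pseudo_labels):
    """Push the cosine-similarity graph of fused cross-view features
    toward the agreement graph of high-confidence pseudo-labels."""
    z = F.normalize(fused_features, dim=1)
    sim = z @ z.t()  # feature similarity graph, entries in [-1, 1]
    # 1 where two samples share a pseudo-label, else 0.
    target = (pseudo_labels[:, None] == pseudo_labels[None, :]).float()
    # A temperature on `sim` would typically sharpen the logits.
    return F.binary_cross_entropy_with_logits(sim, target)
```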
arXiv Detail & Related papers (2023-08-17T14:14:28Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, using digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- Towards Viewpoint-Invariant Visual Recognition via Adversarial Training [28.424131496622497]
We propose Viewpoint-Invariant Adversarial Training (VIAT) to improve the viewpoint robustness of common image classifiers.
VIAT is formulated as a minimax optimization problem, where the inner maximization characterizes diverse adversarial viewpoints.
To further improve the generalization performance, a distribution sharing strategy is introduced.
arXiv Detail & Related papers (2023-07-16T07:55:42Z)
- VIBR: Learning View-Invariant Value Functions for Robust Visual Control [3.2307366446033945]
VIBR (View-Invariant Bellman Residuals) is a method that combines multi-view training and invariant prediction to reduce the out-of-distribution gap for RL-based visuomotor control.
We show that VIBR outperforms existing methods on complex visuomotor control environments with high visual perturbation.
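A toy sketch of one way "view-invariant Bellman residuals" could be imposed: compute the TD residual of the same transition under two views and penalize their disagreement. This is a guess at the mechanism from the method's name, not the paper's algorithm:

```python
import torch

def view_invariant_td_loss(q_net, view1, view2, action, reward,
                           next_value, gamma=0.99):
    """`q_net(obs, action)` is an assumed signature. Both views render
    the same underlying transition, so their residuals should match."""
    td_target = reward + gamma * next_value
    r1 = q_net(view1, action) - td_target  # Bellman residual, view 1
    r2 = q_net(view2, action) - td_target  # Bellman residual, view 2
    td_loss = (r1.pow(2) + r2.pow(2)).mean()   # fit both views
    invariance = (r1 - r2).pow(2).mean()       # views must agree
    return td_loss + invariance
```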
arXiv Detail & Related papers (2023-06-14T14:37:34Z)
- ViewFool: Evaluating the Robustness of Visual Recognition to Adversarial Viewpoints [42.64942578228025]
We propose a novel method called ViewFool to find adversarial viewpoints that mislead visual recognition models.
By encoding real-world objects as neural radiance fields (NeRF), ViewFool characterizes a distribution of diverse adversarial viewpoints.
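A compact sketch of the inner search this implies: sample viewpoints from a Gaussian mixture, render with the object's NeRF, and score by classification loss. `render` and the parameterization are placeholders, and ViewFool's actual optimizer (a gradient estimator over the distribution parameters) is not reproduced here:

```python
import torch
import torch.nn.functional as F

def expected_attack_loss(mu, log_sigma, mix_logits, render, classifier,
                         label, n=32):
    """Monte Carlo estimate of the expected classification loss under a
    Gaussian mixture over viewpoint parameters. mu, log_sigma: (k, d);
    mix_logits: (k,). Maximizing this w.r.t. the mixture parameters
    yields diverse adversarial viewpoints (component sampling is not
    differentiable here; the real method needs a gradient estimator)."""
    comp = torch.distributions.Categorical(logits=mix_logits).sample((n,))
    eps = torch.randn(n, mu.size(1))
    v = mu[comp] + log_sigma.exp()[comp] * eps  # reparameterized samples
    losses = torch.stack([
        F.cross_entropy(classifier(render(vi)), label) for vi in v
    ])
    return losses.mean()
```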
arXiv Detail & Related papers (2022-10-08T03:06:49Z)
- Unsupervised View-Invariant Human Posture Representation [28.840986167408037]
We present a novel unsupervised approach that learns to extract a view-invariant 3D human pose representation from a 2D image.
Our model is trained by exploiting the intrinsic view-invariant properties of human pose between simultaneous frames.
We show improvements over the state-of-the-art unsupervised cross-view action classification accuracy on RGB and depth images.
arXiv Detail & Related papers (2021-09-17T19:23:31Z)
- Deep Semantic Matching with Foreground Detection and Cycle-Consistency [103.22976097225457]
We address weakly supervised semantic matching based on a deep network.
We explicitly estimate the foreground regions to suppress the effect of background clutter.
We develop cycle-consistent losses to enforce the predicted transformations across multiple images to be geometrically plausible and consistent.
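For intuition, a cycle-consistency penalty on predicted transformations can be sketched as follows: composing the predicted transforms around a cycle A -> B -> C -> A should return to the identity. The matrices below are assumed 3x3 (e.g., affine/homography); the paper's exact losses may differ:

```python
import torch

def cycle_consistency_loss(T_ab, T_bc, T_ca):
    """T_xy maps image x to image y as a (batch of) 3x3 matrices.
    Penalize the deviation of the composed cycle from the identity."""
    cycle = T_ca @ T_bc @ T_ab  # net transform around the cycle
    eye = torch.eye(3, device=cycle.device).expand_as(cycle)
    return torch.mean((cycle - eye) ** 2)
```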
arXiv Detail & Related papers (2020-03-31T22:38:09Z)