Understanding Robust Learning through the Lens of Representation Similarities
- URL: http://arxiv.org/abs/2206.09868v1
- Date: Mon, 20 Jun 2022 16:06:20 GMT
- Title: Understanding Robust Learning through the Lens of Representation Similarities
- Authors: Christian Cianfarani, Arjun Nitin Bhagoji, Vikash Sehwag, Ben Zhao,
Prateek Mittal
- Abstract summary: Robustness to adversarial examples has emerged as a desirable property for deep neural networks (DNNs).
In this paper, we aim to understand how the properties of representations learned by robust training differ from those obtained from standard, non-robust training.
- Score: 37.66877172364004
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Representation learning, i.e. the generation of representations useful for
downstream applications, is a task of fundamental importance that underlies
much of the success of deep neural networks (DNNs). Recently, robustness to
adversarial examples has emerged as a desirable property for DNNs, spurring the
development of robust training methods that account for adversarial examples.
In this paper, we aim to understand how the properties of representations
learned by robust training differ from those obtained from standard, non-robust
training. This is critical to diagnosing numerous salient pitfalls in robust
networks, such as degradation of performance on benign inputs, poor
generalization of robustness, and increased overfitting. We utilize a
powerful set of tools known as representation similarity metrics, across three
vision datasets, to obtain layer-wise comparisons between robust and non-robust
DNNs with different architectures, training procedures and adversarial
constraints. Our experiments highlight hitherto unseen properties of robust
representations that we posit underlie the behavioral differences of robust
networks. We discover a lack of specialization in robust networks'
representations along with a disappearance of 'block structure'. We also find
overfitting during robust training largely impacts deeper layers. These, along
with other findings, suggest ways forward for the design and training of better
robust networks.
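
The layer-wise comparisons described in the abstract rely on representation similarity metrics. A widely used choice in this line of work is linear Centered Kernel Alignment (CKA); the sketch below is a minimal, illustrative implementation under that assumption, not the authors' code, and the function and variable names are hypothetical.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two activation matrices.

    X: (n_examples, d1) activations from one layer/model.
    Y: (n_examples, d2) activations from another layer/model.
    Returns a similarity score in [0, 1]; higher means more similar representations.
    """
    # Center each feature dimension.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)

    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    numerator = np.linalg.norm(Y.T @ X, 'fro') ** 2
    denominator = np.linalg.norm(X.T @ X, 'fro') * np.linalg.norm(Y.T @ Y, 'fro')
    return numerator / denominator
```

Computing this score for every pair of layers in a robust and a non-robust network yields the kind of layer-by-layer similarity heatmap in which the 'block structure' mentioned in the abstract appears or disappears.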
Related papers
- Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data [38.44734564565478]
We provide a theoretical understanding of adversarial examples and adversarial training algorithms from the perspective of feature learning theory.
We show that the adversarial training method can provably strengthen the robust feature learning and suppress the non-robust feature learning.
arXiv Detail & Related papers (2024-10-11T03:59:49Z) - MOREL: Enhancing Adversarial Robustness through Multi-Objective Representation Learning [1.534667887016089]
Deep neural networks (DNNs) are vulnerable to slight adversarial perturbations.
We show that strong feature representation learning during training can significantly enhance the original model's robustness.
We propose MOREL, a multi-objective feature representation learning approach, encouraging classification models to produce similar features for inputs within the same class, despite perturbations.
arXiv Detail & Related papers (2024-10-02T16:05:03Z) - Improving Network Interpretability via Explanation Consistency Evaluation [56.14036428778861]
We propose a framework that acquires more explainable activation heatmaps and simultaneously increases model performance.
Specifically, our framework introduces a new metric, i.e., explanation consistency, to reweight the training samples adaptively in model learning.
Our framework then promotes the model learning by paying closer attention to those training samples with a high difference in explanations.
arXiv Detail & Related papers (2024-08-08T17:20:08Z) - Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly-robust instance reweighted adversarial framework.
Our importance weights are obtained by optimizing the KL-divergence regularized loss function.
Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
arXiv Detail & Related papers (2023-08-01T06:16:18Z) - A Theoretical Perspective on Subnetwork Contributions to Adversarial
Robustness [2.064612766965483]
This paper investigates how the adversarial robustness of a subnetwork contributes to the robustness of the entire network.
Experiments show the ability of a robust subnetwork to promote full-network robustness, and investigate the layer-wise dependencies required for this full-network robustness to be achieved.
arXiv Detail & Related papers (2023-07-07T19:16:59Z) - A Comprehensive Study on Robustness of Image Classification Models:
Benchmarking and Rethinking [54.89987482509155]
The robustness of deep neural networks is usually lacking under adversarial examples, common corruptions, and distribution shifts.
We establish a comprehensive robustness benchmark called ARES-Bench on the image classification task.
By designing the training settings accordingly, we achieve the new state-of-the-art adversarial robustness.
arXiv Detail & Related papers (2023-02-28T04:26:20Z) - Rethinking Robust Contrastive Learning from the Adversarial Perspective [2.3333090554192615]
We find significant disparities between adversarial and clean representations in standard-trained networks.
Adversarial training mitigates these disparities and fosters the convergence of representations toward a universal set.
arXiv Detail & Related papers (2023-02-05T22:43:50Z) - Exploring Architectural Ingredients of Adversarially Robust Deep Neural
Networks [98.21130211336964]
Deep neural networks (DNNs) are known to be vulnerable to adversarial attacks.
In this paper, we investigate the impact of network width and depth on the robustness of adversarially trained DNNs.
arXiv Detail & Related papers (2021-10-07T23:13:33Z) - Rethinking Clustering for Robustness [56.14672993686335]
ClusTR is a clustering-based and adversary-free training framework to learn robust models.
ClusTR outperforms adversarially trained networks by up to 4% under strong PGD attacks (a minimal PGD sketch follows this list).
arXiv Detail & Related papers (2020-06-13T16:55:51Z)
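
Several of the related papers above train with or evaluate against PGD-based adversarial examples. As a reference point, here is a minimal L_inf PGD attack sketch in PyTorch; the hyperparameters (eps, alpha, steps) are illustrative defaults, not values taken from any of the papers.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Generate L_inf-bounded adversarial examples with Projected Gradient Descent.

    x: clean input batch in [0, 1]; y: true labels.
    eps: perturbation budget; alpha: per-step size; steps: number of iterations.
    """
    # Random start inside the eps-ball around the clean input.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()

    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss, then project back into the eps-ball and the valid pixel range.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)

    return x_adv.detach()
```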