Clustering Effect of (Linearized) Adversarial Robust Models
- URL: http://arxiv.org/abs/2111.12922v1
- Date: Thu, 25 Nov 2021 05:51:03 GMT
- Title: Clustering Effect of (Linearized) Adversarial Robust Models
- Authors: Yang Bai, Xin Yan, Yong Jiang, Shu-Tao Xia, Yisen Wang
- Abstract summary: We propose a novel understanding of adversarial robustness and apply it to more tasks, including domain adaptation and robustness boosting.
Experimental evaluations demonstrate the rationality and superiority of our proposed clustering strategy.
- Score: 60.25668525218051
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial robustness has received increasing attention along with the study
of adversarial examples. So far, existing works show that robust models not
only obtain robustness against various adversarial attacks but also boost the
performance in some downstream tasks. However, the underlying mechanism of
adversarial robustness is still not clear. In this paper, we interpret
adversarial robustness from the perspective of linear components, and find that
there exist some statistical properties for comprehensively robust models.
Specifically, robust models show an obvious hierarchical clustering effect on
their linearized sub-networks, obtained by removing or replacing all non-linear
components (e.g., batch normalization, maximum pooling, or activation layers).
Based on these observations, we propose a novel understanding of adversarial
robustness and apply it to more tasks, including domain adaptation and robustness
boosting. Experimental evaluations demonstrate the rationality and superiority
of our proposed clustering strategy.
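As a concrete illustration of the linearization described above, the following sketch (ours, not the authors' code) treats every non-linear layer of a small PyTorch MLP as the identity, folds the remaining linear layers into a single end-to-end matrix, and hierarchically clusters its class rows with SciPy; bias terms and BatchNorm parameters are ignored for simplicity.

```python
# Minimal sketch: linearize by skipping non-linear layers, fold the remaining
# linear layers into one matrix, and cluster its class rows hierarchically.
import torch.nn as nn
from scipy.cluster.hierarchy import linkage  # dendrogram(Z) would visualize it

model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.BatchNorm1d(128), nn.ReLU(),
    nn.Linear(128, 10),
)

def linearize(seq: nn.Sequential):
    """Fold all nn.Linear weights into a single matrix W with f(x) ~= W @ x,
    treating every non-linear component as the identity (i.e. removing it)."""
    W = None
    for layer in seq:
        if isinstance(layer, nn.Linear):
            W = layer.weight if W is None else layer.weight @ W
    return W  # shape: (num_classes, input_dim)

W = linearize(model).detach().numpy()  # (10, 784) effective linear map
Z = linkage(W, method="ward")          # clustering hierarchy over the 10 class rows
```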
Related papers
- Improving Network Interpretability via Explanation Consistency Evaluation [56.14036428778861]
We propose a framework that acquires more explainable activation heatmaps and simultaneously increases model performance.
Specifically, our framework introduces a new metric, i.e., explanation consistency, to reweight the training samples adaptively in model learning.
Our framework then promotes the model learning by paying closer attention to those training samples with a high difference in explanations.
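As an illustration of the reweighting idea (gradient saliency and all names below are our assumptions, not the paper's implementation), a sketch might compare a sample's explanation before and after a small input perturbation and upweight samples whose explanations disagree:

```python
# Hypothetical sketch of explanation-consistency reweighting.
import torch
import torch.nn.functional as F

def saliency(model, x, y):
    """Input-gradient explanation of the batch loss."""
    x = x.clone().requires_grad_(True)
    grad, = torch.autograd.grad(F.cross_entropy(model(x), y), x)
    return grad.flatten(1)

def consistency_reweighted_loss(model, x, y, eps=0.01):
    s_clean = saliency(model, x, y)
    s_pert = saliency(model, x + eps * torch.randn_like(x), y)
    agree = F.cosine_similarity(s_clean, s_pert, dim=1)  # explanation consistency
    weights = (1.0 - agree).detach()                     # inconsistent -> heavier
    per_sample = F.cross_entropy(model(x), y, reduction="none")
    return (weights * per_sample).mean()
```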
arXiv Detail & Related papers (2024-08-08T17:20:08Z)
- Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study [61.65123150513683]
Multimodal foundation models, such as CLIP, produce state-of-the-art zero-shot results.
It is reported that these models close the robustness gap by matching the performance of supervised models trained on ImageNet.
We show that CLIP leads to a significant robustness drop compared to supervised ImageNet models on our benchmark.
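For context, the zero-shot protocol such a benchmark builds on looks roughly like the sketch below (assuming OpenAI's `clip` package; the class list and prompt template are illustrative):

```python
# Sketch of zero-shot CLIP classification.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

classes = ["dog", "cat", "car"]
text = clip.tokenize([f"a photo of a {c}" for c in classes]).to(device)

@torch.no_grad()
def zero_shot_predict(image):  # `image`: a preprocessed 3xHxW tensor
    img = model.encode_image(image.unsqueeze(0).to(device))
    txt = model.encode_text(text)
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    return classes[(img @ txt.T).argmax().item()]  # most similar text prompt wins
```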
arXiv Detail & Related papers (2024-03-15T17:33:49Z)
- Sparse and Transferable Universal Singular Vectors Attack [5.498495800909073]
We propose a novel sparse universal white-box adversarial attack.
Our approach is based on truncated power iteration, which provides sparsity to the $(p,q)$-singular vectors of the Jacobian matrices of the hidden layers.
Our findings demonstrate the vulnerability of state-of-the-art models to sparse attacks and highlight the importance of developing robust machine learning systems.
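A hedged sketch of the core iteration (our reconstruction in the spirit of generalized power methods for $(p,q)$-singular vectors; the paper's exact procedure may differ):

```python
# Truncated power iteration for a sparse (p, q)-singular vector of a Jacobian J;
# psi_r is the standard dual map, and the top-k truncation enforces sparsity.
import numpy as np

def psi(z, r):
    return np.sign(z) * np.abs(z) ** (r - 1)

def truncated_power_iteration(J, p=np.inf, q=10.0, k=50, iters=30, seed=0):
    v = np.random.default_rng(seed).standard_normal(J.shape[1])
    v /= np.linalg.norm(v, ord=p)
    p_dual = 1.0 if np.isinf(p) else p / (p - 1.0)
    for _ in range(iters):
        v = psi(J.T @ psi(J @ v, q), p_dual)
        v[np.argsort(np.abs(v))[:-k]] = 0.0  # keep only the k largest entries
        v /= np.linalg.norm(v, ord=p)
    return v  # sparse universal perturbation direction (up to scaling)
```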
arXiv Detail & Related papers (2024-01-25T09:21:29Z)
- Doubly Robust Instance-Reweighted Adversarial Training [107.40683655362285]
We propose a novel doubly-robust instance reweighted adversarial framework.
Our importance weights are obtained by optimizing the KL-divergence regularized loss function.
Our proposed approach outperforms related state-of-the-art baseline methods in terms of average robust performance.
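One reading of the KL-regularized reweighting (our sketch, not the authors' released code) uses the closed-form solution of the inner maximization, which is a softmax over per-sample losses:

```python
# max_w sum_i w_i * l_i - lam * KL(w || uniform) over the simplex is solved in
# closed form by w_i ∝ exp(l_i / lam), so harder examples get larger weights.
import torch.nn.functional as F

def reweighted_adv_loss(model, x_adv, y, lam=1.0):
    losses = F.cross_entropy(model(x_adv), y, reduction="none")
    w = F.softmax(losses.detach() / lam, dim=0)  # lam -> inf gives uniform weights
    return (w * losses).sum()
```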
arXiv Detail & Related papers (2023-08-01T06:16:18Z)
- Feature Separation and Recalibration for Adversarial Robustness [18.975320671203132]
We propose a novel, easy-to-plugin approach named Feature Separation and Recalibration (FSR).
It separates out the malicious, non-robust activations and recalibrates them to produce more robust feature maps.
It improves the robustness of existing adversarial training methods by up to 8.57% with small computational overhead.
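A hypothetical separation-and-recalibration block (the gating and recalibration layers below are our design, not the paper's released architecture) could be sketched as:

```python
# A learned gate splits activations into "robust" and "non-robust" parts; a
# small layer recalibrates the non-robust part before the two are recombined.
import torch.nn as nn

class SeparateRecalibrate(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.recalib = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, f):
        m = self.gate(f)                          # per-activation robustness score
        robust, non_robust = m * f, (1 - m) * f   # separation step
        return robust + self.recalib(non_robust)  # recalibrate, then recombine
```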
arXiv Detail & Related papers (2023-03-24T07:43:57Z)
- Fairness Increases Adversarial Vulnerability [50.90773979394264]
This paper shows the existence of a dichotomy between fairness and robustness, and analyzes when achieving fairness decreases the model robustness to adversarial samples.
Experiments on non-linear models and different architectures validate the theoretical findings in multiple vision domains.
The paper proposes a simple, yet effective, solution to construct models achieving good tradeoffs between fairness and robustness.
arXiv Detail & Related papers (2022-11-21T19:55:35Z)
- Unifying Model Explainability and Robustness for Joint Text Classification and Rationale Extraction [11.878012909876713]
We propose a joint classification and rationale extraction model named AT-BMC.
It includes two key mechanisms: mixed Adversarial Training (AT) is designed to use various perturbations in discrete and embedding space to improve the model's robustness, and Boundary Match Constraint (BMC) helps to locate rationales more precisely with the guidance of boundary information.
Performances on benchmark datasets demonstrate that the proposed AT-BMC outperforms baselines on both classification and rationale extraction by a large margin.
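As a rough sketch of how such a joint objective could be composed (the loss weights and tensor names are illustrative, not taken from the paper):

```python
# Illustrative composition of an AT-BMC-style objective.
import torch.nn.functional as F

def joint_loss(cls_logits, adv_cls_logits, start_logits, end_logits,
               labels, start_pos, end_pos, alpha=1.0, beta=1.0):
    l_cls = F.cross_entropy(cls_logits, labels)        # clean classification
    l_adv = F.cross_entropy(adv_cls_logits, labels)    # adversarial training term
    l_bmc = (F.cross_entropy(start_logits, start_pos)  # boundary match: rationale
             + F.cross_entropy(end_logits, end_pos))   # start/end positions
    return l_cls + alpha * l_adv + beta * l_bmc
```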
arXiv Detail & Related papers (2021-12-20T09:48:32Z)
- RobustBench: a standardized adversarial robustness benchmark [84.50044645539305]
A key challenge in benchmarking robustness is that its evaluation is often error-prone, leading to robustness overestimation.
We evaluate adversarial robustness with AutoAttack, an ensemble of white- and black-box attacks.
We analyze the impact of robustness on the performance on distribution shifts, calibration, out-of-distribution detection, fairness, privacy leakage, smoothness, and transferability.
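A minimal usage sketch with the `robustbench` and `autoattack` packages (the model name below is one real leaderboard entry):

```python
# Load a leaderboard model and run the standardized AutoAttack evaluation.
from robustbench.data import load_cifar10
from robustbench.utils import load_model
from autoattack import AutoAttack

x_test, y_test = load_cifar10(n_examples=100)
model = load_model(model_name="Carmon2019Unlabeled",
                   dataset="cifar10", threat_model="Linf")

# AutoAttack is a parameter-free ensemble of white-box (APGD-CE, APGD-T, FAB-T)
# and black-box (Square) attacks.
adversary = AutoAttack(model, norm="Linf", eps=8 / 255)
x_adv = adversary.run_standard_evaluation(x_test, y_test)
```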
arXiv Detail & Related papers (2020-10-19T17:06:18Z)
- TREND: Transferability based Robust ENsemble Design [6.663641564969944]
We study the effect of network architecture, input, weight and activation quantization on transferability of adversarial samples.
We show that transferability is significantly hampered by input quantization between source and target.
We propose a new state-of-the-art ensemble attack to combat this.
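Transferability itself is straightforward to measure; a minimal sketch (FGSM stands in for the attack here; the paper's setup differs):

```python
# Craft adversarial examples on a source model, read the fooling rate on a target.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    x = x.clone().requires_grad_(True)
    grad, = torch.autograd.grad(F.cross_entropy(model(x), y), x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def transfer_rate(source, target, x, y, eps=8 / 255):
    x_adv = fgsm(source, x, y, eps)            # crafted on the source model
    with torch.no_grad():
        fooled = target(x_adv).argmax(1) != y  # evaluated on the target model
    return fooled.float().mean().item()
```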
arXiv Detail & Related papers (2020-08-04T13:38:14Z)
- Robustness from Simple Classifiers [31.50446148110293]
We investigate the connection between robustness and simplicity.
We find that simpler classifiers, formed by reducing the number of output classes, are less susceptible to adversarial perturbations.
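A toy sketch of class reduction (the grouping below is illustrative): pooling fine-grained logits into superclass logits yields the "simpler" classifier studied here.

```python
# Reduce output classes by pooling fine-grained logits into superclass logits.
import torch

def to_superclass_logits(logits, groups):
    # e.g. groups = [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]] turns a 10-way
    # classifier into a 2-way one
    return torch.stack([logits[:, g].logsumexp(dim=1) for g in groups], dim=1)
```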
arXiv Detail & Related papers (2020-02-21T17:13:37Z)