Better Representations via Adversarial Training in Pre-Training: A
Theoretical Perspective
- URL: http://arxiv.org/abs/2401.15248v1
- Date: Fri, 26 Jan 2024 23:52:20 GMT
- Title: Better Representations via Adversarial Training in Pre-Training: A
Theoretical Perspective
- Authors: Yue Xing, Xiaofeng Lin, Qifan Song, Yi Xu, Belinda Zeng, Guang Cheng
- Abstract summary: We show that feature purification plays an important role in connecting the adversarial robustness of the pre-trained model and the downstream tasks.
With purified nodes, it turns out that clean training is enough to achieve adversarial robustness in downstream tasks.
- Score: 30.871769067624836
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Pre-training is known to generate universal representations for downstream
tasks in large-scale deep learning such as large language models. Existing
literature, e.g., \cite{kim2020adversarial}, empirically observe that the
downstream tasks can inherit the adversarial robustness of the pre-trained
model. We provide theoretical justifications for this robustness inheritance
phenomenon. Our theoretical results reveal that feature purification plays an
important role in connecting the adversarial robustness of the pre-trained
model and the downstream tasks in two-layer neural networks. Specifically, we
show that (i) with adversarial training, each hidden node tends to pick only
one (or a few) feature; (ii) without adversarial training, the hidden nodes can
be vulnerable to attacks. This observation is valid for both supervised
pre-training and contrastive learning. With purified nodes, it turns out that
clean training is enough to achieve adversarial robustness in downstream tasks.
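To make the pipeline described above concrete, here is a minimal sketch (not the authors' code): a two-layer ReLU network is adversarially pre-trained with a single-step FGSM-style attack, and the downstream task then clean-trains only a linear head on the frozen representation. The dimensions, synthetic data, and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLayerNet(nn.Module):
    """Two-layer ReLU network: hidden representation plus linear output."""
    def __init__(self, d_in=100, d_hidden=256, d_out=2):
        super().__init__()
        self.hidden = nn.Linear(d_in, d_hidden)
        self.head = nn.Linear(d_hidden, d_out)

    def features(self, x):
        return F.relu(self.hidden(x))

    def forward(self, x):
        return self.head(self.features(x))

def fgsm(model, x, y, eps):
    """Single-step l_inf perturbation used during adversarial pre-training."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).detach()

# ---- adversarial pre-training (illustrative synthetic data) ----
x, y = torch.randn(512, 100), torch.randint(0, 2, (512,))
model = TwoLayerNet()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for _ in range(100):
    x_adv = fgsm(model, x, y, eps=0.1)
    opt.zero_grad()
    F.cross_entropy(model(x_adv), y).backward()
    opt.step()

# ---- downstream task: clean training of a new head on frozen features ----
x_dn, y_dn = torch.randn(512, 100), torch.randint(0, 2, (512,))
head = nn.Linear(256, 2)
opt_dn = torch.optim.SGD(head.parameters(), lr=0.1)
for _ in range(100):
    with torch.no_grad():
        z = model.features(x_dn)   # purified representation, kept frozen
    opt_dn.zero_grad()
    F.cross_entropy(head(z), y_dn).backward()
    opt_dn.step()
```

The point of the sketch is the division of labor the paper studies: all adversarial optimization happens in pre-training, while the downstream head is trained only on clean data.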
Related papers
- Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data [38.44734564565478]
We provide a theoretical understanding of adversarial examples and adversarial training algorithms from the perspective of feature learning theory.
We show that the adversarial training method can provably strengthen the robust feature learning and suppress the non-robust feature learning.
arXiv Detail & Related papers (2024-10-11T03:59:49Z)
- Robust Transferable Feature Extractors: Learning to Defend Pre-Trained Networks Against White Box Adversaries [69.53730499849023]
We show that adversarial examples can be successfully transferred to another independently trained model to induce prediction errors.
We propose a deep learning-based pre-processing mechanism, which we refer to as a robust transferable feature extractor (RTFE)
arXiv Detail & Related papers (2022-09-14T21:09:34Z)
- Enhancing Adversarial Training with Feature Separability [52.39305978984573]
We introduce the concept of an adversarial training graph (ATG), with which the proposed adversarial training with feature separability (ATFS) boosts intra-class feature similarity and increases inter-class feature variance.
Through comprehensive experiments, we demonstrate that the proposed ATFS framework significantly improves both clean and robust performance.
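The ATG construction and the exact ATFS objective are not spelled out in this summary; the snippet below is only a generic feature-separability regularizer in the spirit described (pull same-class features toward their class mean, push class means apart), meant to be added to the usual training loss. The function name and the form of the penalty are assumptions, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def feature_separability_loss(feats, labels):
    """Generic sketch: low intra-class scatter, high inter-class spread.

    feats: (N, D) feature vectors; labels: (N,) integer class labels.
    Returns a scalar regularizer to add to the main (adversarial) training loss.
    """
    feats = F.normalize(feats, dim=1)
    means, intra = [], []
    for c in labels.unique():
        fc = feats[labels == c]
        mu = fc.mean(dim=0)
        means.append(mu)
        intra.append(((fc - mu) ** 2).sum(dim=1).mean())   # within-class scatter
    means = torch.stack(means)
    # mean squared distance between class means (between-class spread)
    inter = ((means.unsqueeze(0) - means.unsqueeze(1)) ** 2).sum(dim=2).mean()
    return torch.stack(intra).mean() - inter   # minimize intra, maximize inter
```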
arXiv Detail & Related papers (2022-05-02T04:04:23Z)
- Finding Dynamics Preserving Adversarial Winning Tickets [11.05616199881368]
Pruning methods have been considered in the adversarial context to reduce model capacity and improve adversarial robustness simultaneously during training.
Existing adversarial pruning methods generally mimic classical pruning methods for natural training, which follow the three-stage 'training-pruning-fine-tuning' pipeline.
We show empirical evidence that AWT preserves the dynamics of adversarial training and achieves performance on par with dense adversarial training.
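AWT's ticket-finding procedure is not described in this summary; for context, the sketch below shows the classical three-stage 'training-pruning-fine-tuning' pipeline it contrasts with, adapted to adversarial training using PyTorch's built-in magnitude pruning. The model, synthetic data, attack, and pruning ratio are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
import torch.nn.utils.prune as prune

def adv_step(model, opt, x, y, eps=0.1):
    """One adversarial-training step with a single-step l_inf attack."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    x_adv = (x + eps * x.grad.sign()).detach()
    opt.zero_grad()
    F.cross_entropy(model(x_adv), y).backward()
    opt.step()

# Illustrative model and data; a real pipeline would use a proper dataset.
model = torch.nn.Sequential(torch.nn.Linear(100, 256), torch.nn.ReLU(),
                            torch.nn.Linear(256, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(256, 100), torch.randint(0, 10, (256,))

# Stage 1: adversarial training of the dense model.
for _ in range(50):
    adv_step(model, opt, x, y)

# Stage 2: magnitude pruning (mask the 80% smallest weights per linear layer).
for m in model:
    if isinstance(m, torch.nn.Linear):
        prune.l1_unstructured(m, name="weight", amount=0.8)

# Stage 3: adversarial fine-tuning of the pruned (masked) model.
for _ in range(50):
    adv_step(model, opt, x, y)
```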
arXiv Detail & Related papers (2022-02-14T05:34:24Z)
- How does unlabeled data improve generalization in self-training? A one-hidden-layer theoretical analysis [93.37576644429578]
This work establishes the first theoretical analysis of the iterative self-training paradigm.
We prove the benefits of unlabeled data in both training convergence and generalization ability.
Experiments ranging from shallow to deep neural networks are also provided to justify our theoretical insights on self-training.
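As a rough picture of the iterative self-training paradigm analyzed here (a sketch under assumed data and thresholds, not the paper's experimental setup), the loop below pseudo-labels the unlabeled set with the current one-hidden-layer model, keeps only confident pseudo-labels, and retrains on the enlarged set.

```python
import torch
import torch.nn.functional as F

# One-hidden-layer network, matching the setting the analysis considers.
model = torch.nn.Sequential(torch.nn.Linear(20, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x_lab, y_lab = torch.randn(100, 20), torch.randint(0, 2, (100,))  # small labeled set
x_unlab = torch.randn(1000, 20)                                   # large unlabeled set

for _ in range(5):                          # self-training rounds
    # (1) pseudo-label unlabeled data with the current model
    with torch.no_grad():
        conf, pseudo = F.softmax(model(x_unlab), dim=1).max(dim=1)
        keep = conf > 0.9                   # keep only confident pseudo-labels
    x_all = torch.cat([x_lab, x_unlab[keep]])
    y_all = torch.cat([y_lab, pseudo[keep]])
    # (2) retrain on labeled + pseudo-labeled data
    for _ in range(100):
        opt.zero_grad()
        F.cross_entropy(model(x_all), y_all).backward()
        opt.step()
```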
arXiv Detail & Related papers (2022-01-21T02:16:52Z)
- Understanding the Logit Distributions of Adversarially-Trained Deep Neural Networks [6.439477789066243]
Adversarial defenses train deep neural networks to be invariant to the input perturbations from adversarial attacks.
Although adversarial training is successful at mitigating adversarial attacks, the behavioral differences between adversarially-trained (AT) models and standard models are still poorly understood.
We identify three logit characteristics essential to learning adversarial robustness.
arXiv Detail & Related papers (2021-08-26T19:09:15Z)
- Robust Pre-Training by Adversarial Contrastive Learning [120.33706897927391]
Recent work has shown that, when integrated with adversarial training, self-supervised pre-training can lead to state-of-the-art robustness.
We improve robustness-aware self-supervised pre-training by learning representations consistent under both data augmentations and adversarial perturbations.
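A minimal sketch of that recipe (not the paper's exact algorithm): perturb one augmented view adversarially so as to increase a simplified InfoNCE loss, then update the encoder to keep the two views' representations consistent. The encoder, augmentation stand-in, temperature, and attack budget are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

encoder = torch.nn.Sequential(torch.nn.Linear(100, 256), torch.nn.ReLU(),
                              torch.nn.Linear(256, 64))
opt = torch.optim.SGD(encoder.parameters(), lr=0.1)

def augment(x):
    return x + 0.05 * torch.randn_like(x)   # stand-in for real data augmentation

def info_nce(z1, z2, tau=0.5):
    """Simplified InfoNCE: (z1[i], z2[i]) is the positive pair, the rest are negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau
    return F.cross_entropy(logits, torch.arange(z1.size(0)))

x = torch.randn(128, 100)                    # unlabeled pre-training batch
for _ in range(100):
    v1, v2 = augment(x), augment(x)
    # adversarially perturb the second view to increase the contrastive loss
    v2 = v2.clone().detach().requires_grad_(True)
    info_nce(encoder(v1), encoder(v2)).backward()
    v2_adv = (v2 + 0.1 * v2.grad.sign()).detach()
    # minimize the loss so representations agree across augmentation AND attack
    opt.zero_grad()
    info_nce(encoder(v1), encoder(v2_adv)).backward()
    opt.step()
```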
arXiv Detail & Related papers (2020-10-26T04:44:43Z)
- Feature Purification: How Adversarial Training Performs Robust Deep Learning [66.05472746340142]
We present a principle that we call Feature Purification: one cause of the existence of adversarial examples is the accumulation of certain small dense mixtures in the hidden weights during the training of a neural network.
We present both experiments on the CIFAR-10 dataset to illustrate this principle and a theoretical result proving that, for certain natural classification tasks, training a two-layer neural network with ReLU activation using randomly initialized gradient descent indeed satisfies this principle.
arXiv Detail & Related papers (2020-05-20T16:56:08Z)
- Towards Achieving Adversarial Robustness by Enforcing Feature Consistency Across Bit Planes [51.31334977346847]
We train networks to form coarse impressions based on the information in higher bit planes, and use the lower bit planes only to refine their prediction.
We demonstrate that, by imposing consistency on the representations learned across differently quantized images, the adversarial robustness of networks improves significantly.
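A hedged sketch of the general idea (not the paper's exact regularizer): quantize each input so only its higher bit planes remain, and penalize disagreement between the predictions on the full and coarse images while training on both. The model, synthetic data, number of retained bit planes, and choice of consistency penalty are assumptions.

```python
import torch
import torch.nn.functional as F

def keep_high_bit_planes(img_uint8, n_planes=4):
    """Zero out the lowest (8 - n_planes) bit planes of an 8-bit image."""
    shift = 8 - n_planes
    return (img_uint8 >> shift) << shift

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 256),
                            torch.nn.ReLU(), torch.nn.Linear(256, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

imgs = torch.randint(0, 256, (64, 1, 28, 28), dtype=torch.uint8)  # stand-in images
labels = torch.randint(0, 10, (64,))

for _ in range(100):
    x_full = imgs.float() / 255.0
    x_coarse = keep_high_bit_planes(imgs).float() / 255.0          # higher bit planes only
    logits_full, logits_coarse = model(x_full), model(x_coarse)
    # classify both versions and keep their predictions consistent
    loss = (F.cross_entropy(logits_full, labels)
            + F.cross_entropy(logits_coarse, labels)
            + F.mse_loss(F.softmax(logits_full, dim=1), F.softmax(logits_coarse, dim=1)))
    opt.zero_grad()
    loss.backward()
    opt.step()
```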
arXiv Detail & Related papers (2020-04-01T09:31:10Z)