Learn from the Past: A Proxy Guided Adversarial Defense Framework with
Self Distillation Regularization
- URL: http://arxiv.org/abs/2310.12713v2
- Date: Sun, 10 Mar 2024 16:17:08 GMT
- Title: Learn from the Past: A Proxy Guided Adversarial Defense Framework with
Self Distillation Regularization
- Authors: Yaohua Liu, Jiaxin Gao, Xianghao Jiao, Zhu Liu, Xin Fan, Risheng Liu
- Abstract summary: Adversarial Training (AT) is pivotal in fortifying the robustness of deep learning models.
AT methods that rely on direct iterative updates to defend the target model frequently encounter obstacles such as unstable training and catastrophic overfitting.
We present a general proxy-guided defense framework, `LAST' (Learn from the Past).
- Score: 53.04697800214848
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial Training (AT), pivotal in fortifying the robustness of deep
learning models, is extensively adopted in practical applications. However,
prevailing AT methods, relying on direct iterative updates for the target
model's defense, frequently encounter obstacles such as unstable training and
catastrophic overfitting. In this context, our work illuminates the potential
of leveraging the target model's historical states as a proxy to provide
effective initialization and defense prior, which results in a general proxy
guided defense framework, `LAST' ({\bf L}earn from the P{\bf ast}).
Specifically, LAST derives the response of the proxy model as dynamically
learned fast weights, which continuously correct the update direction of the target
model. Besides, we introduce a self-distillation regularized defense objective,
ingeniously designed to steer the proxy model's update trajectory without
resorting to external teacher models, thereby ameliorating the impact of
catastrophic overfitting on performance. Extensive experiments and ablation
studies showcase the framework's efficacy in markedly improving model
robustness (e.g., up to 9.2\% and 20.3\% enhancement in robust accuracy on
CIFAR10 and CIFAR100 datasets, respectively) and training stability. These
improvements are consistently observed across various model architectures,
larger datasets, perturbation sizes, and attack modalities, affirming LAST's
ability to consistently refine both single-step and multi-step AT strategies.
The code will be available at~\url{https://github.com/callous-youth/LAST}.
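As a rough illustration of the idea described in the abstract (not the authors' implementation), a proxy model holding an exponential moving average of the target's historical weights can supply a "fast weight" correction to each gradient step. The blending rule, hyperparameters, and the toy quadratic loss below are all illustrative assumptions:

```python
import numpy as np

# Illustrative sketch only: the proxy tracks the target's historical
# states (EMA), and the disagreement between target and proxy corrects
# each update. The toy least-squares loss stands in for the AT loss.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=64)

def loss(w):
    return float(np.mean((X @ w - y) ** 2))

def grad(w):
    return 2 * X.T @ (X @ w - y) / len(y)

def proxy_guided_step(w, proxy, lr=0.05, alpha=0.1, beta=0.9):
    g = grad(w)                            # raw update direction
    correction = w - proxy                 # disagreement with historical states
    w = w - lr * g - alpha * correction    # corrected target update
    proxy = beta * proxy + (1 - beta) * w  # proxy follows the history (EMA)
    return w, proxy

w = np.zeros(3)
proxy = np.zeros(3)
start = loss(w)
for _ in range(200):
    w, proxy = proxy_guided_step(w, proxy)
print(loss(w) < start)
```

The proxy's pull shrinks as training stabilizes (the EMA catches up to the target), so the correction mainly damps abrupt changes in the update direction, which is the behavior the paper attributes to unstable training.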
Related papers
- Robustness-Congruent Adversarial Training for Secure Machine Learning
Model Updates [13.911586916369108]
We show that misclassifications in machine-learning models can affect robustness to adversarial examples.
We propose a technique, named robustness-congruent adversarial training, to address this issue.
We show that our algorithm and, more generally, learning with non-regression constraints, provide a theoretically-grounded framework to train consistent estimators.
arXiv Detail & Related papers (2024-02-27T10:37:13Z)
- Deep autoregressive density nets vs neural ensembles for model-based
offline reinforcement learning [2.9158689853305693]
We consider a model-based reinforcement learning algorithm that infers the system dynamics from the available data and performs policy optimization on imaginary model rollouts.
This approach is prone to exploiting model errors, which can lead to catastrophic failures on the real system.
We show that better performance can be obtained with a single well-calibrated autoregressive model on the D4RL benchmark.
arXiv Detail & Related papers (2024-02-05T10:18:15Z)
- Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness [52.9493817508055]
We propose Pre-trained Model Guided Adversarial Fine-Tuning (PMG-AFT) to enhance the model's zero-shot adversarial robustness.
Our approach consistently improves clean accuracy by an average of 8.72%.
arXiv Detail & Related papers (2024-01-09T04:33:03Z)
- Robust Spatiotemporal Traffic Forecasting with Reinforced Dynamic
Adversarial Training [13.998123723601651]
Machine learning-based forecasting models are commonly used in Intelligent Transportation Systems (ITS) to predict traffic patterns.
Most of the existing models are susceptible to adversarial attacks, which can lead to inaccurate predictions and negative consequences such as congestion and delays.
We propose a framework for incorporating adversarial training into traffic forecasting tasks.
arXiv Detail & Related papers (2023-06-25T04:53:29Z)
- TWINS: A Fine-Tuning Framework for Improved Transferability of
Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z)
- Alleviating Robust Overfitting of Adversarial Training With Consistency
Regularization [9.686724616328874]
Adversarial training (AT) has proven to be one of the most effective ways to defend Deep Neural Networks (DNNs) against adversarial attacks.
Robust overfitting, in which robustness drops sharply at a certain stage, always exists during AT.
Consistency regularization, a popular technique in semi-supervised learning, has a similar goal to AT and can be used to alleviate robust overfitting.
arXiv Detail & Related papers (2022-05-24T03:18:43Z)
- DST: Dynamic Substitute Training for Data-free Black-box Attack [79.61601742693713]
We propose a novel dynamic substitute training attack method to encourage the substitute model to learn better and faster from the target model.
We introduce a task-driven graph-based structural information learning constraint to improve the quality of the generated training data.
arXiv Detail & Related papers (2022-04-03T02:29:11Z)
- Self-Ensemble Adversarial Training for Improved Robustness [14.244311026737666]
Among all sorts of defense methods, adversarial training is the strongest strategy against various adversarial attacks.
Recent works mainly focus on developing new loss functions or regularizers, attempting to find the unique optimal point in the weight space.
We devise a simple but powerful Self-Ensemble Adversarial Training (SEAT) method for yielding a robust classifier by averaging the weights of history models.
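The history-averaging idea behind SEAT can be sketched in a few lines; the noisy update loop below is a hypothetical stand-in for adversarial training steps, and the checkpoint schedule is an illustrative assumption:

```python
import numpy as np

# Sketch: keep a running average of the weights visited along the
# training trajectory; the averaged weights form the final model.
rng = np.random.default_rng(1)
target = np.array([1.0, -1.0])   # toy "optimum" the updates hover around
w = np.zeros(2)
avg_w, n_ckpt = np.zeros(2), 0

for step in range(1000):
    # noisy SGD-like update toward the optimum (stand-in for AT steps)
    w += 0.05 * (target - w) + 0.05 * rng.normal(size=2)
    if step >= 500:                       # average late-stage history models
        n_ckpt += 1
        avg_w += (w - avg_w) / n_ckpt     # incremental running mean

print(np.linalg.norm(w - target), np.linalg.norm(avg_w - target))
```

Because the averaged weights smooth out the noise of individual iterates, they typically sit closer to the basin's center than the last checkpoint, which is the intuition behind self-ensembling.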
arXiv Detail & Related papers (2022-03-18T01:12:18Z)
- Improved Adversarial Training via Learned Optimizer [101.38877975769198]
We propose a framework to improve the robustness of adversarial training models.
By co-training the learned optimizer's parameters with the model's weights, the proposed framework consistently improves robustness and adaptively adjusts step sizes for update directions.
arXiv Detail & Related papers (2020-04-25T20:15:53Z)
- Boosting Adversarial Training with Hypersphere Embedding [53.75693100495097]
Adversarial training is one of the most effective defenses against adversarial attacks for deep learning models.
In this work, we advocate incorporating the hypersphere embedding mechanism into the AT procedure.
We validate our methods under a wide range of adversarial attacks on the CIFAR-10 and ImageNet datasets.
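A minimal sketch of what a hypersphere embedding in the classification head looks like, assuming the common formulation in which features and class weights are L2-normalized so that logits become scaled cosine similarities (the scale value and shapes are assumptions):

```python
import numpy as np

# Sketch: normalize both penultimate features and class-weight vectors
# onto the unit hypersphere; logits are then scaled cosine similarities.
rng = np.random.default_rng(2)
features = rng.normal(size=(4, 8))    # batch of penultimate features
weights = rng.normal(size=(10, 8))    # one weight vector per class
scale = 16.0                          # illustrative logit scale

f = features / np.linalg.norm(features, axis=1, keepdims=True)
w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
logits = scale * f @ w.T              # bounded in [-scale, scale]

print(logits.shape)
```

Bounding the logits this way keeps the loss geometry angular, which is the property such methods couple with the AT procedure.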
arXiv Detail & Related papers (2020-02-20T08:42:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all content) and is not responsible for any consequences of its use.