Learn from the Past: A Proxy Guided Adversarial Defense Framework with
Self Distillation Regularization
- URL: http://arxiv.org/abs/2310.12713v2
- Date: Sun, 10 Mar 2024 16:17:08 GMT
- Title: Learn from the Past: A Proxy Guided Adversarial Defense Framework with
Self Distillation Regularization
- Authors: Yaohua Liu, Jiaxin Gao, Xianghao Jiao, Zhu Liu, Xin Fan, Risheng Liu
- Abstract summary: Adversarial Training (AT) is pivotal in fortifying the robustness of deep learning models.
AT methods, relying on direct iterative updates for the target model's defense, frequently encounter obstacles such as unstable training and catastrophic overfitting.
We present a general proxy guided defense framework, `LAST' (Learn from the Past).
- Score: 53.04697800214848
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial Training (AT), pivotal in fortifying the robustness of deep
learning models, is extensively adopted in practical applications. However,
prevailing AT methods, relying on direct iterative updates for the target model's
defense, frequently encounter obstacles such as unstable training and
catastrophic overfitting. In this context, our work illuminates the potential
of leveraging the target model's historical states as a proxy to provide
effective initialization and defense prior, which results in a general proxy
guided defense framework, `LAST' ({\bf L}earn from the P{\bf ast}).
Specifically, LAST derives the response of the proxy model as dynamically learned
fast weights, which continuously corrects the update direction of the target
model. Besides, we introduce a self-distillation regularized defense objective,
ingeniously designed to steer the proxy model's update trajectory without
resorting to external teacher models, thereby ameliorating the impact of
catastrophic overfitting on performance. Extensive experiments and ablation
studies showcase the framework's efficacy in markedly improving model
robustness (e.g., up to 9.2\% and 20.3\% enhancement in robust accuracy on
CIFAR10 and CIFAR100 datasets, respectively) and training stability. These
improvements are consistently observed across various model architectures,
larger datasets, perturbation sizes, and attack modalities, affirming LAST's
ability to consistently refine both single-step and multi-step AT strategies.
The code will be available at~\url{https://github.com/callous-youth/LAST}.
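The abstract's core mechanics, a proxy model whose fast weights correct the target model's update direction, trained with a self-distillation term instead of an external teacher, can be pictured with a short PyTorch-style sketch. Everything below (the function name, the `alpha`/`beta`/`tau` knobs, and the interpolation rule for the fast weights) is an illustrative assumption rather than the authors' exact algorithm; the linked repository holds the real implementation.

```python
import torch
import torch.nn.functional as F

def proxy_guided_step(target, proxy, opt_proxy, attack, x, y,
                      alpha=0.5, beta=0.1, tau=2.0):
    """Hedged sketch of one LAST-style update (names and knobs are assumptions).

    `proxy` is a copy of the target initialized from its historical
    weights; `attack` crafts adversarial examples (e.g., PGD).
    """
    x_adv = attack(target, x, y)

    # Self-distillation regularized objective: the proxy fits the labels
    # while staying close to the target's historical response, so no
    # external teacher model is needed.
    with torch.no_grad():
        hist_logits = target(x_adv)
    logits = proxy(x_adv)
    kd = F.kl_div(F.log_softmax(logits / tau, dim=1),
                  F.softmax(hist_logits / tau, dim=1),
                  reduction="batchmean") * tau ** 2
    loss = F.cross_entropy(logits, y) + alpha * kd

    opt_proxy.zero_grad()
    loss.backward()
    opt_proxy.step()

    # The proxy's updated "fast weights" then correct the target's update
    # direction; plain interpolation is one plausible reading of the
    # abstract, not necessarily the paper's exact rule.
    with torch.no_grad():
        for p_t, p_p in zip(target.parameters(), proxy.parameters()):
            p_t.lerp_(p_p, beta)
```

In such a loop, the proxy could be refreshed from the target's historical weights periodically (e.g., `proxy.load_state_dict(target.state_dict())` every few epochs), which is how "learning from the past" would enter the training.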
Related papers
- A Hybrid Defense Strategy for Boosting Adversarial Robustness in Vision-Language Models [9.304845676825584]
We propose a novel adversarial training framework that integrates multiple attack strategies and advanced machine learning techniques.
Experiments conducted on real-world datasets, including CIFAR-10 and CIFAR-100, demonstrate that the proposed method significantly enhances model robustness.
arXiv Detail & Related papers (2024-10-18T23:47:46Z)
- Dynamic Label Adversarial Training for Deep Learning Robustness Against Adversarial Attacks [11.389689242531327]
Adversarial training is one of the most effective methods for enhancing model robustness.
Previous approaches primarily use static ground truth for adversarial training, but this often causes robust overfitting.
We propose a dynamic label adversarial training (DYNAT) algorithm that enables the target model to gain robustness from the guide model's decisions.
arXiv Detail & Related papers (2024-08-23T14:25:12Z)
- Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain the zero-shot generalization stability of VLMs; the overall method is dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the model in the few-shot image classification scenario.
arXiv Detail & Related papers (2024-07-11T10:35:53Z)
- Robustness-Congruent Adversarial Training for Secure Machine Learning Model Updates [13.911586916369108]
We show that misclassifications in machine-learning models can affect robustness to adversarial examples.
We propose a technique, named robustness-congruent adversarial training, to address this issue.
We show that our algorithm and, more generally, learning with non-regression constraints, provides a theoretically grounded framework to train consistent estimators.
arXiv Detail & Related papers (2024-02-27T10:37:13Z)
- Deep autoregressive density nets vs neural ensembles for model-based offline reinforcement learning [2.9158689853305693]
We consider a model-based reinforcement learning algorithm that infers the system dynamics from the available data and performs policy optimization on imaginary model rollouts.
This approach is vulnerable to exploiting model errors which can lead to catastrophic failures on the real system.
We show that better performance can be obtained with a single well-calibrated autoregressive model on the D4RL benchmark.
arXiv Detail & Related papers (2024-02-05T10:18:15Z)
- Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness [52.9493817508055]
We propose Pre-trained Model Guided Adversarial Fine-Tuning (PMG-AFT) to enhance the model's zero-shot adversarial robustness.
Our approach consistently improves clean accuracy by an average of 8.72%.
arXiv Detail & Related papers (2024-01-09T04:33:03Z)
- Robust Spatiotemporal Traffic Forecasting with Reinforced Dynamic Adversarial Training [13.998123723601651]
Machine learning-based forecasting models are commonly used in Intelligent Transportation Systems (ITS) to predict traffic patterns.
Most of the existing models are susceptible to adversarial attacks, which can lead to inaccurate predictions and negative consequences such as congestion and delays.
We propose a framework for incorporating adversarial training into traffic forecasting tasks.
arXiv Detail & Related papers (2023-06-25T04:53:29Z)
- TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z)
- Improved Adversarial Training via Learned Optimizer [101.38877975769198]
We propose a framework to improve the robustness of adversarial training models.
By co-training the optimizer's parameters with the model's weights, the proposed framework consistently improves robustness and adaptively learns update directions.
arXiv Detail & Related papers (2020-04-25T20:15:53Z)
- Boosting Adversarial Training with Hypersphere Embedding [53.75693100495097]
Adversarial training is one of the most effective defenses against adversarial attacks for deep learning models.
In this work, we advocate incorporating the hypersphere embedding mechanism into the AT procedure.
We validate our methods under a wide range of adversarial attacks on the CIFAR-10 and ImageNet datasets.
arXiv Detail & Related papers (2020-02-20T08:42:29Z)
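To make the hypersphere-embedding idea in the last entry concrete, here is a minimal PyTorch-style sketch of a normalized classifier head: features and class weights are projected onto the unit sphere so logits become scaled cosine similarities, with an optional margin on the true class. The class name and the `scale`/`margin` defaults are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HypersphereHead(nn.Module):
    """Sketch of a hypersphere-embedding classifier head (illustrative)."""

    def __init__(self, feat_dim, num_classes, scale=15.0, margin=0.2):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.scale, self.margin = scale, margin

    def forward(self, features, labels=None):
        # L2-normalize both features and weights: logits become cos(theta).
        cos = F.linear(F.normalize(features), F.normalize(self.weight))
        if labels is not None:
            # Subtract a margin on the true class (training only).
            cos = cos - self.margin * F.one_hot(labels, cos.size(1)).to(cos.dtype)
        return self.scale * cos
```

In adversarial training, such a head would simply replace the final linear layer, with cross-entropy applied to the scaled cosine logits for both clean and adversarial examples.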