Securely Fine-tuning Pre-trained Encoders Against Adversarial Examples
- URL: http://arxiv.org/abs/2403.10801v2
- Date: Tue, 19 Mar 2024 02:45:48 GMT
- Title: Securely Fine-tuning Pre-trained Encoders Against Adversarial Examples
- Authors: Ziqi Zhou, Minghui Li, Wei Liu, Shengshan Hu, Yechao Zhang, Wei Wan, Lulu Xue, Leo Yu Zhang, Dezhong Yao, Hai Jin
- Abstract summary: We propose Gen-AF, a two-stage adversarial fine-tuning approach aimed at enhancing the robustness of downstream models.
Our experiments, conducted across ten self-supervised training methods and six datasets, demonstrate that Gen-AF attains high testing accuracy and robust testing accuracy against state-of-the-art DAEs.
- Score: 28.947545367473086
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the evolution of self-supervised learning, the pre-training paradigm has emerged as a predominant solution within the deep learning landscape. Model providers furnish pre-trained encoders designed to function as versatile feature extractors, enabling downstream users to harness the benefits of expansive models with minimal effort through fine-tuning. Nevertheless, recent works have exposed a vulnerability in pre-trained encoders, highlighting their susceptibility to downstream-agnostic adversarial examples (DAEs) meticulously crafted by attackers. The lingering question pertains to the feasibility of fortifying the robustness of downstream models against DAEs, particularly in scenarios where the pre-trained encoders are publicly accessible to the attackers. In this paper, we initially delve into existing defensive mechanisms against adversarial examples within the pre-training paradigm. Our findings reveal that the failure of current defenses stems from the domain shift between pre-training data and downstream tasks, as well as the sensitivity of encoder parameters. In response to these challenges, we propose Genetic Evolution-Nurtured Adversarial Fine-tuning (Gen-AF), a two-stage adversarial fine-tuning approach aimed at enhancing the robustness of downstream models. Our extensive experiments, conducted across ten self-supervised training methods and six datasets, demonstrate that Gen-AF attains high testing accuracy and robust testing accuracy against state-of-the-art DAEs.
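The abstract does not spell out Gen-AF's genetic-evolution component, so the following is only a minimal sketch of generic two-stage adversarial fine-tuning under the same premise: harden the downstream model with adversarial examples while limiting how far the sensitive encoder parameters drift. Every name, the PGD budget, and the learning rates are illustrative assumptions, not the paper's algorithm.

```python
# Illustrative sketch only: generic two-stage adversarial fine-tuning.
# Gen-AF's genetic-evolution strategy is NOT reproduced here; names,
# architecture wiring, and hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Standard PGD: worst-case perturbation inside an L-inf ball."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return (x + delta).clamp(0, 1).detach()

def adversarial_finetune(encoder, head, loader, device="cpu"):
    """Stage 1: adapt only the head on adversarial examples, keeping the
    pre-trained encoder frozen. Stage 2: unfreeze the encoder with a much
    smaller learning rate, limiting drift of its sensitive parameters."""
    model = nn.Sequential(encoder, head).to(device)
    stages = [
        torch.optim.SGD(head.parameters(), lr=1e-2, momentum=0.9),
        torch.optim.SGD([{"params": encoder.parameters(), "lr": 1e-4},
                         {"params": head.parameters(), "lr": 1e-2}],
                        momentum=0.9),
    ]
    for opt in stages:
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            x_adv = pgd_attack(model, x, y)  # craft the adversarial batch
            model.zero_grad()
            F.cross_entropy(model(x_adv), y).backward()
            opt.step()
    return model
```

Freezing the encoder in the first stage speaks to the domain-shift failure mode the abstract identifies, and the much smaller encoder learning rate in the second stage reflects the reported sensitivity of encoder parameters.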
Related papers
- Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been regarded as a challenging property to encode into neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z)
- Robust VAEs via Generating Process of Noise Augmented Data [9.366139389037489]
This paper introduces a novel framework that enhances robustness by regularizing the latent space divergence between original and noise-augmented data.
Our empirical evaluations demonstrate that this approach, termed Robust Augmented Variational Auto-ENcoder (RAVEN), yields superior performance in resisting adversarial inputs; a hedged sketch of the idea appears after this entry.
arXiv Detail & Related papers (2024-07-26T09:55:34Z)
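Based only on the one-line summary above, here is a hedged sketch of the kind of latent-divergence regularizer RAVEN describes. The vae.encode/vae.decode interface, the diagonal Gaussian posteriors, and the weights beta and gamma are assumptions, not the paper's design.

```python
# Hedged sketch of a latent-divergence regularizer in the spirit of the
# RAVEN summary above. The vae.encode/vae.decode interface, diagonal
# Gaussian posteriors, and the weights beta/gamma are assumptions.
import torch
import torch.nn.functional as F

def gaussian_kl(mu1, lv1, mu2, lv2):
    """KL(N(mu1, exp(lv1)) || N(mu2, exp(lv2))) with diagonal covariance."""
    return 0.5 * (lv2 - lv1 + (lv1.exp() + (mu1 - mu2) ** 2) / lv2.exp() - 1).sum(-1)

def raven_style_loss(vae, x, noise_std=0.1, beta=1.0, gamma=1.0):
    mu, lv = vae.encode(x)  # posterior of the clean input
    mu_n, lv_n = vae.encode(x + noise_std * torch.randn_like(x))  # noisy input
    z = mu + torch.randn_like(mu) * (0.5 * lv).exp()  # reparameterization trick
    recon = F.mse_loss(vae.decode(z), x)  # reconstruction term
    prior_kl = gaussian_kl(mu, lv, torch.zeros_like(mu), torch.zeros_like(lv)).mean()
    # The regularizer from the summary: keep the noisy posterior close to
    # the clean one, so small input noise cannot move the latent code far.
    consistency = gaussian_kl(mu_n, lv_n, mu, lv).mean()
    return recon + beta * prior_kl + gamma * consistency
```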
- The Effectiveness of Random Forgetting for Robust Generalization [21.163070161951868]
We introduce a novel learning paradigm called "Forget to Mitigate Overfitting" (FOMO).
FOMO alternates between a forgetting phase, which randomly forgets a subset of weights, and a relearning phase, which emphasizes learning generalizable features; a minimal sketch appears after this entry.
Our experiments show that FOMO alleviates robust overfitting by significantly reducing the gap between the best and the last robust test accuracy.
arXiv Detail & Related papers (2024-02-18T23:14:40Z)
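A minimal sketch of FOMO's forget-then-relearn alternation as the summary describes it. The forgetting fraction, the re-initialization scale, and the schedule are illustrative assumptions.

```python
# Minimal sketch of FOMO's alternation as summarized above. The forgetting
# fraction, re-initialization scale, and schedule are assumptions.
import torch

@torch.no_grad()
def forget_weights(model, frac=0.1):
    """Forgetting phase: randomly re-initialize a subset of all weights."""
    for p in model.parameters():
        mask = torch.rand_like(p) < frac
        p[mask] = torch.randn_like(p)[mask] * 0.01

def fomo_schedule(model, train_one_epoch, epochs=100, forget_every=5):
    """Alternate forgetting with ordinary (relearning) epochs."""
    for epoch in range(epochs):
        if epoch > 0 and epoch % forget_every == 0:
            forget_weights(model)      # forget a random subset of weights
        train_one_epoch(model)         # relearn generalizable features
```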
- Focus on Hiders: Exploring Hidden Threats for Enhancing Adversarial Training [20.1991376813843]
We propose a generalized adversarial training algorithm called Hider-Focused Adversarial Training (HFAT).
HFAT combines the optimization direction of standard adversarial training with one focused on preventing hiders.
We demonstrate the effectiveness of our method through extensive experiments.
arXiv Detail & Related papers (2023-12-12T08:41:18Z)
- Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many of the predictive signals in the data can instead come from biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
- Consistent Valid Physically-Realizable Adversarial Attack against Crowd-flow Prediction Models [4.286570387250455]
Deep learning (DL) models can effectively learn city-wide crowd-flow patterns.
However, DL models are known to perform poorly under inconspicuous adversarial perturbations.
arXiv Detail & Related papers (2023-03-05T13:30:25Z) - Robust Transferable Feature Extractors: Learning to Defend Pre-Trained
Networks Against White Box Adversaries [69.53730499849023]
We show that adversarial examples can be successfully transferred to another independently trained model to induce prediction errors.
We propose a deep learning-based pre-processing mechanism, which we refer to as a robust transferable feature extractor (RTFE); a sketch of the idea appears after this entry.
arXiv Detail & Related papers (2022-09-14T21:09:34Z)
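A hedged sketch of a learned pre-processing defense in the spirit of the RTFE summary: a small residual denoiser sits in front of a frozen pre-trained network, so the protected network itself never changes. The architecture, and the assumption that the denoiser is trained on pairs of perturbed and clean inputs, are illustrative.

```python
# Hedged sketch of a learned pre-processing defense in the spirit of the
# RTFE summary above. The residual-denoiser architecture is an assumption;
# one would train it to map perturbed inputs back toward clean ones.
import torch.nn as nn

class Denoiser(nn.Module):
    """Small residual network that predicts a correction to its input."""
    def __init__(self, ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.net(x)  # residual correction of the input

def defend(denoiser, frozen_model, x):
    # The pre-trained network itself is never modified; only the learned
    # pre-processor stands between the attacker's input and the network.
    return frozen_model(denoiser(x))
```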
- Improved and Interpretable Defense to Transferred Adversarial Examples by Jacobian Norm with Selective Input Gradient Regularization [31.516568778193157]
Adversarial training (AT) is often adopted to improve the robustness of deep neural networks (DNNs).
In this work, we propose an approach based on the Jacobian norm and Selective Input Gradient Regularization (J-SIGR); a minimal sketch appears after this entry.
Experiments demonstrate that the proposed J-SIGR confers improved robustness against transferred adversarial attacks, and we also show that the predictions of the neural network are easy to interpret.
arXiv Detail & Related papers (2022-07-09T01:06:41Z)
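A minimal sketch of selective input gradient regularization as the J-SIGR summary describes it. The top-k selection rule and the penalty weight are assumptions, and the Jacobian-norm component of the paper is not reproduced here.

```python
# Minimal sketch of selective input-gradient regularization in the spirit
# of the J-SIGR summary above. The top-k selection rule and weight lam are
# assumptions; the paper's Jacobian-norm component is not reproduced here.
import torch
import torch.nn.functional as F

def sigr_loss(model, x, y, lam=0.01, top_frac=0.25):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    # create_graph=True keeps the gradient differentiable, so the penalty
    # itself can be trained through.
    grad, = torch.autograd.grad(loss, x, create_graph=True)
    g = grad.flatten(1)
    k = max(1, int(top_frac * g.shape[1]))
    # Penalize only the k largest-magnitude input gradients per example,
    # i.e., the pixels the prediction is most sensitive to (the selectivity).
    topk = g.abs().topk(k, dim=1).values
    return loss + lam * (topk ** 2).sum(dim=1).mean()
```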
- Latent Boundary-guided Adversarial Training [61.43040235982727]
Adversarial training, which injects adversarial examples into model training, has proven to be the most effective strategy.
We propose a novel adversarial training framework called LAtent bounDary-guided aDvErsarial tRaining (LADDER).
arXiv Detail & Related papers (2022-06-08T07:40:55Z)
- Robust Pre-Training by Adversarial Contrastive Learning [120.33706897927391]
Recent work has shown that, when integrated with adversarial training, self-supervised pre-training can lead to state-of-the-art robustness.
We improve robustness-aware self-supervised pre-training by learning representations that are consistent under both data augmentations and adversarial perturbations; a sketch appears after this entry.
arXiv Detail & Related papers (2020-10-26T04:44:43Z)
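A hedged sketch of adversarial contrastive pre-training as summarized above: the same agreement loss is applied to a benign augmented view and to an adversarially perturbed view, so the learned representations must be consistent under both. The simplified cross-view-only InfoNCE loss and the attack budget are assumptions.

```python
# Hedged sketch of adversarial contrastive pre-training as summarized above.
# The loss is a simplified, cross-view-only InfoNCE (full NT-Xent over all
# 2N views is omitted); attack budget and steps are assumptions.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.5):
    """Simplified contrastive loss: matching views are positives (diagonal)."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau
    labels = torch.arange(z1.shape[0], device=z1.device)
    return F.cross_entropy(logits, labels)

def adv_contrastive_loss(encoder, x1, x2, eps=8 / 255, alpha=2 / 255, steps=5):
    """x1, x2: two augmented views of the same batch of images."""
    # Inner maximization: perturb the second view to break cross-view agreement.
    delta = torch.zeros_like(x2).requires_grad_(True)
    for _ in range(steps):
        loss = info_nce(encoder(x1), encoder(x2 + delta))
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    # Outer objective: stay consistent under both the benign augmentation
    # and the adversarial perturbation, as the summary describes.
    return (info_nce(encoder(x1), encoder(x2))
            + info_nce(encoder(x1), encoder(x2 + delta.detach())))
```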
- Adversarial Distributional Training for Robust Deep Learning [53.300984501078126]
Adversarial training (AT) is among the most effective techniques to improve model robustness by augmenting training data with adversarial examples.
Most existing AT methods adopt a specific attack to craft adversarial examples, leading to unreliable robustness against other, unseen attacks.
In this paper, we introduce adversarial distributional training (ADT), a novel framework for learning robust models; a speculative sketch follows this entry.
arXiv Detail & Related papers (2020-02-14T12:36:59Z)
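Finally, a speculative sketch of the ADT idea as summarized above: instead of committing to one attack, a per-batch distribution over perturbations is fit by maximizing expected loss plus an entropy bonus, and the model is trained on samples from it. The Gaussian-through-tanh parameterization, the inner Adam optimizer, and all constants are assumptions, not the paper's exact formulation.

```python
# Speculative sketch of adversarial distributional training (ADT) as
# summarized above. The Gaussian-through-tanh parameterization, Adam inner
# optimizer, and entropy weight are assumptions, not the paper's exact form.
import torch
import torch.nn.functional as F

def adt_step(model, opt, x, y, eps=8 / 255, inner_steps=7, lam=0.01):
    # Per-batch perturbation distribution: delta = eps * tanh(mu + sigma * n).
    mu = torch.zeros_like(x, requires_grad=True)
    log_sigma = torch.full_like(x, -3.0, requires_grad=True)
    inner_opt = torch.optim.Adam([mu, log_sigma], lr=0.1)
    for _ in range(inner_steps):
        noise = torch.randn_like(x)
        delta = eps * torch.tanh(mu + log_sigma.exp() * noise)
        # Maximize expected loss plus an entropy bonus, so the distribution
        # stays broad and covers many attacks instead of a single one.
        obj = -F.cross_entropy(model(x + delta), y) - lam * log_sigma.mean()
        inner_opt.zero_grad()
        obj.backward()
        inner_opt.step()
    # Outer minimization: train the model on a sample from the distribution.
    delta = (eps * torch.tanh(mu + log_sigma.exp() * torch.randn_like(x))).detach()
    opt.zero_grad()  # clears gradients the inner loop pushed into the model
    F.cross_entropy(model(x + delta), y).backward()
    opt.step()
```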
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.