Securely Fine-tuning Pre-trained Encoders Against Adversarial Examples
- URL: http://arxiv.org/abs/2403.10801v2
- Date: Tue, 19 Mar 2024 02:45:48 GMT
- Title: Securely Fine-tuning Pre-trained Encoders Against Adversarial Examples
- Authors: Ziqi Zhou, Minghui Li, Wei Liu, Shengshan Hu, Yechao Zhang, Wei Wan, Lulu Xue, Leo Yu Zhang, Dezhong Yao, Hai Jin
- Abstract summary: We propose Gen-AF, a two-stage adversarial fine-tuning approach aimed at enhancing the robustness of downstream models.
Our experiments, conducted across ten self-supervised training methods and six datasets, demonstrate that Gen-AF attains high testing accuracy and robust testing accuracy against state-of-the-art DAEs.
- Score: 28.947545367473086
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the evolution of self-supervised learning, the pre-training paradigm has emerged as a predominant solution within the deep learning landscape. Model providers furnish pre-trained encoders designed to function as versatile feature extractors, enabling downstream users to harness the benefits of expansive models with minimal effort through fine-tuning. Nevertheless, recent works have exposed a vulnerability in pre-trained encoders, highlighting their susceptibility to downstream-agnostic adversarial examples (DAEs) meticulously crafted by attackers. The lingering question pertains to the feasibility of fortifying the robustness of downstream models against DAEs, particularly in scenarios where the pre-trained encoders are publicly accessible to the attackers. In this paper, we initially delve into existing defensive mechanisms against adversarial examples within the pre-training paradigm. Our findings reveal that the failure of current defenses stems from the domain shift between pre-training data and downstream tasks, as well as the sensitivity of encoder parameters. In response to these challenges, we propose Genetic Evolution-Nurtured Adversarial Fine-tuning (Gen-AF), a two-stage adversarial fine-tuning approach aimed at enhancing the robustness of downstream models. Our extensive experiments, conducted across ten self-supervised training methods and six datasets, demonstrate that Gen-AF attains high testing accuracy and robust testing accuracy against state-of-the-art DAEs.
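The abstract does not spell out Gen-AF's genetic-evolution component, so the following is only a minimal sketch of generic two-stage adversarial fine-tuning under the same premise: harden the downstream model with adversarial examples while limiting how far the sensitive encoder parameters drift. Every name, the PGD budget, and the learning rates are illustrative assumptions, not the paper's algorithm.

```python
# Illustrative sketch only: generic two-stage adversarial fine-tuning.
# Gen-AF's genetic-evolution strategy is NOT reproduced here; names,
# architecture wiring, and hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Standard PGD: worst-case perturbation inside an L-inf ball."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    return (x + delta).clamp(0, 1).detach()

def adversarial_finetune(encoder, head, loader, device="cpu"):
    """Stage 1: adapt only the head on adversarial examples, keeping the
    pre-trained encoder frozen. Stage 2: unfreeze the encoder with a much
    smaller learning rate, limiting drift of its sensitive parameters."""
    model = nn.Sequential(encoder, head).to(device)
    stages = [
        torch.optim.SGD(head.parameters(), lr=1e-2, momentum=0.9),
        torch.optim.SGD([{"params": encoder.parameters(), "lr": 1e-4},
                         {"params": head.parameters(), "lr": 1e-2}],
                        momentum=0.9),
    ]
    for opt in stages:
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            x_adv = pgd_attack(model, x, y)  # craft the adversarial batch
            model.zero_grad()
            F.cross_entropy(model(x_adv), y).backward()
            opt.step()
    return model
```

Freezing the encoder in the first stage speaks to the domain-shift failure mode the abstract identifies, and the much smaller encoder learning rate in the second stage reflects the reported sensitivity of encoder parameters.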
Related papers
- Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been regarded as a challenging property to encode into neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z)
- Robust VAEs via Generating Process of Noise Augmented Data [9.366139389037489]
This paper introduces a novel framework that enhances robustness by regularizing the latent space divergence between original and noise-augmented data.
Our empirical evaluations demonstrate that this approach, termed Robust Augmented Variational Auto-ENcoder (RAVEN), yields superior performance in resisting adversarial inputs; a hedged sketch of the idea appears after this entry.
arXiv Detail & Related papers (2024-07-26T09:55:34Z)
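Based only on the one-line summary above, here is a hedged sketch of the kind of latent-divergence regularizer RAVEN describes. The vae.encode/vae.decode interface, the diagonal Gaussian posteriors, and the weights beta and gamma are assumptions, not the paper's design.

```python
# Hedged sketch of a latent-divergence regularizer in the spirit of the
# RAVEN summary above. The vae.encode/vae.decode interface, diagonal
# Gaussian posteriors, and the weights beta/gamma are assumptions.
import torch
import torch.nn.functional as F

def gaussian_kl(mu1, lv1, mu2, lv2):
    """KL(N(mu1, exp(lv1)) || N(mu2, exp(lv2))) with diagonal covariance."""
    return 0.5 * (lv2 - lv1 + (lv1.exp() + (mu1 - mu2) ** 2) / lv2.exp() - 1).sum(-1)

def raven_style_loss(vae, x, noise_std=0.1, beta=1.0, gamma=1.0):
    mu, lv = vae.encode(x)  # posterior of the clean input
    mu_n, lv_n = vae.encode(x + noise_std * torch.randn_like(x))  # noisy input
    z = mu + torch.randn_like(mu) * (0.5 * lv).exp()  # reparameterization trick
    recon = F.mse_loss(vae.decode(z), x)  # reconstruction term
    prior_kl = gaussian_kl(mu, lv, torch.zeros_like(mu), torch.zeros_like(lv)).mean()
    # The regularizer from the summary: keep the noisy posterior close to
    # the clean one, so small input noise cannot move the latent code far.
    consistency = gaussian_kl(mu_n, lv_n, mu, lv).mean()
    return recon + beta * prior_kl + gamma * consistency
```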
- The Effectiveness of Random Forgetting for Robust Generalization [21.163070161951868]
We introduce a novel learning paradigm called "Forget to Mitigate Overfitting" (FOMO).
FOMO alternates between a forgetting phase, which randomly forgets a subset of weights, and a relearning phase, which emphasizes learning generalizable features; a minimal sketch appears after this entry.
Our experiments show that FOMO alleviates robust overfitting by significantly reducing the gap between the best and the last robust test accuracy.
arXiv Detail & Related papers (2024-02-18T23:14:40Z)
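A minimal sketch of FOMO's forget-then-relearn alternation as the summary describes it. The forgetting fraction, the re-initialization scale, and the schedule are illustrative assumptions.

```python
# Minimal sketch of FOMO's alternation as summarized above. The forgetting
# fraction, re-initialization scale, and schedule are assumptions.
import torch

@torch.no_grad()
def forget_weights(model, frac=0.1):
    """Forgetting phase: randomly re-initialize a subset of all weights."""
    for p in model.parameters():
        mask = torch.rand_like(p) < frac
        p[mask] = torch.randn_like(p)[mask] * 0.01

def fomo_schedule(model, train_one_epoch, epochs=100, forget_every=5):
    """Alternate forgetting with ordinary (relearning) epochs."""
    for epoch in range(epochs):
        if epoch > 0 and epoch % forget_every == 0:
            forget_weights(model)      # forget a random subset of weights
        train_one_epoch(model)         # relearn generalizable features
```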
- Focus on Hiders: Exploring Hidden Threats for Enhancing Adversarial Training [20.1991376813843]
We propose a generalized adversarial training algorithm called Hider-Focused Adversarial Training (HFAT).
HFAT combines the optimization direction of standard adversarial training with one focused on preventing hiders.
We demonstrate the effectiveness of our method through extensive experiments.
arXiv Detail & Related papers (2023-12-12T08:41:18Z)
- Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many of the predictive signals in the data can instead come from biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
- Consistent Valid Physically-Realizable Adversarial Attack against Crowd-flow Prediction Models [4.286570387250455]
Deep learning (DL) models can effectively learn city-wide crowd-flow patterns.
However, DL models are known to perform poorly under inconspicuous adversarial perturbations.
arXiv Detail & Related papers (2023-03-05T13:30:25Z) - Robust Transferable Feature Extractors: Learning to Defend Pre-Trained
Networks Against White Box Adversaries [69.53730499849023]
We show that adversarial examples can be successfully transferred to another independently trained model to induce prediction errors.
We propose a deep learning-based pre-processing mechanism, which we refer to as a robust transferable feature extractor (RTFE); a sketch of the idea appears after this entry.
arXiv Detail & Related papers (2022-09-14T21:09:34Z)
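A hedged sketch of a learned pre-processing defense in the spirit of the RTFE summary: a small residual denoiser sits in front of a frozen pre-trained network, so the protected network itself never changes. The architecture, and the assumption that the denoiser is trained on pairs of perturbed and clean inputs, are illustrative.

```python
# Hedged sketch of a learned pre-processing defense in the spirit of the
# RTFE summary above. The residual-denoiser architecture is an assumption;
# one would train it to map perturbed inputs back toward clean ones.
import torch.nn as nn

class Denoiser(nn.Module):
    """Small residual network that predicts a correction to its input."""
    def __init__(self, ch=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.net(x)  # residual correction of the input

def defend(denoiser, frozen_model, x):
    # The pre-trained network itself is never modified; only the learned
    # pre-processor stands between the attacker's input and the network.
    return frozen_model(denoiser(x))
```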
- Improved and Interpretable Defense to Transferred Adversarial Examples by Jacobian Norm with Selective Input Gradient Regularization [31.516568778193157]
Adversarial training (AT) is often adopted to improve the robustness of deep neural networks (DNNs).
In this work, we propose an approach based on the Jacobian norm and Selective Input Gradient Regularization (J-SIGR); a minimal sketch appears after this entry.
Experiments demonstrate that the proposed J-SIGR confers improved robustness against transferred adversarial attacks, and we also show that the predictions of the neural network are easy to interpret.
arXiv Detail & Related papers (2022-07-09T01:06:41Z)
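A minimal sketch of selective input gradient regularization as the J-SIGR summary describes it. The top-k selection rule and the penalty weight are assumptions, and the Jacobian-norm component of the paper is not reproduced here.

```python
# Minimal sketch of selective input-gradient regularization in the spirit
# of the J-SIGR summary above. The top-k selection rule and weight lam are
# assumptions; the paper's Jacobian-norm component is not reproduced here.
import torch
import torch.nn.functional as F

def sigr_loss(model, x, y, lam=0.01, top_frac=0.25):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    # create_graph=True keeps the gradient differentiable, so the penalty
    # itself can be trained through.
    grad, = torch.autograd.grad(loss, x, create_graph=True)
    g = grad.flatten(1)
    k = max(1, int(top_frac * g.shape[1]))
    # Penalize only the k largest-magnitude input gradients per example,
    # i.e., the pixels the prediction is most sensitive to (the selectivity).
    topk = g.abs().topk(k, dim=1).values
    return loss + lam * (topk ** 2).sum(dim=1).mean()
```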
- Latent Boundary-guided Adversarial Training [61.43040235982727]
Adversarial training, which injects adversarial examples into model training, has proven to be the most effective strategy.
We propose a novel adversarial training framework called LAtent bounDary-guided aDvErsarial tRaining (LADDER).
arXiv Detail & Related papers (2022-06-08T07:40:55Z)
- Robust Pre-Training by Adversarial Contrastive Learning [120.33706897927391]
Recent work has shown that, when integrated with adversarial training, self-supervised pre-training can lead to state-of-the-art robustness.
We improve robustness-aware self-supervised pre-training by learning representations that are consistent under both data augmentations and adversarial perturbations; a sketch appears after this entry.
arXiv Detail & Related papers (2020-10-26T04:44:43Z)
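A hedged sketch of adversarial contrastive pre-training as summarized above: the same agreement loss is applied to a benign augmented view and to an adversarially perturbed view, so the learned representations must be consistent under both. The simplified cross-view-only InfoNCE loss and the attack budget are assumptions.

```python
# Hedged sketch of adversarial contrastive pre-training as summarized above.
# The loss is a simplified, cross-view-only InfoNCE (full NT-Xent over all
# 2N views is omitted); attack budget and steps are assumptions.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.5):
    """Simplified contrastive loss: matching views are positives (diagonal)."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau
    labels = torch.arange(z1.shape[0], device=z1.device)
    return F.cross_entropy(logits, labels)

def adv_contrastive_loss(encoder, x1, x2, eps=8 / 255, alpha=2 / 255, steps=5):
    """x1, x2: two augmented views of the same batch of images."""
    # Inner maximization: perturb the second view to break cross-view agreement.
    delta = torch.zeros_like(x2).requires_grad_(True)
    for _ in range(steps):
        loss = info_nce(encoder(x1), encoder(x2 + delta))
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    # Outer objective: stay consistent under both the benign augmentation
    # and the adversarial perturbation, as the summary describes.
    return (info_nce(encoder(x1), encoder(x2))
            + info_nce(encoder(x1), encoder(x2 + delta.detach())))
```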
- Adversarial Distributional Training for Robust Deep Learning [53.300984501078126]
Adversarial training (AT) is among the most effective techniques to improve model robustness by augmenting training data with adversarial examples.
Most existing AT methods adopt a specific attack to craft adversarial examples, leading to unreliable robustness against other, unseen attacks.
In this paper, we introduce adversarial distributional training (ADT), a novel framework for learning robust models; a speculative sketch follows this entry.
arXiv Detail & Related papers (2020-02-14T12:36:59Z)
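Finally, a speculative sketch of the ADT idea as summarized above: instead of committing to one attack, a per-batch distribution over perturbations is fit by maximizing expected loss plus an entropy bonus, and the model is trained on samples from it. The Gaussian-through-tanh parameterization, the inner Adam optimizer, and all constants are assumptions, not the paper's exact formulation.

```python
# Speculative sketch of adversarial distributional training (ADT) as
# summarized above. The Gaussian-through-tanh parameterization, Adam inner
# optimizer, and entropy weight are assumptions, not the paper's exact form.
import torch
import torch.nn.functional as F

def adt_step(model, opt, x, y, eps=8 / 255, inner_steps=7, lam=0.01):
    # Per-batch perturbation distribution: delta = eps * tanh(mu + sigma * n).
    mu = torch.zeros_like(x, requires_grad=True)
    log_sigma = torch.full_like(x, -3.0, requires_grad=True)
    inner_opt = torch.optim.Adam([mu, log_sigma], lr=0.1)
    for _ in range(inner_steps):
        noise = torch.randn_like(x)
        delta = eps * torch.tanh(mu + log_sigma.exp() * noise)
        # Maximize expected loss plus an entropy bonus, so the distribution
        # stays broad and covers many attacks instead of a single one.
        obj = -F.cross_entropy(model(x + delta), y) - lam * log_sigma.mean()
        inner_opt.zero_grad()
        obj.backward()
        inner_opt.step()
    # Outer minimization: train the model on a sample from the distribution.
    delta = (eps * torch.tanh(mu + log_sigma.exp() * torch.randn_like(x))).detach()
    opt.zero_grad()  # clears gradients the inner loop pushed into the model
    F.cross_entropy(model(x + delta), y).backward()
    opt.step()
```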
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.