Two Sides of the Same Coin: White-box and Black-box Attacks for Transfer
  Learning
- URL: http://arxiv.org/abs/2008.11089v1
- Date: Tue, 25 Aug 2020 15:04:32 GMT
- Authors: Yinghua Zhang, Yangqiu Song, Jian Liang, Kun Bai, Qiang Yang
- Abstract summary: We show that fine-tuning effectively enhances model robustness under white-box FGSM attacks.
We also propose a black-box attack method for transfer learning models which attacks the target model with the adversarial examples produced by its source model.
To systematically measure the effect of both white-box and black-box attacks, we propose a new metric to evaluate how transferable the adversarial examples produced by a source model are to a target model.
- Score: 60.784641458579124
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   Transfer learning has become a common practice for training deep learning
models with limited labeled data in a target domain. On the other hand, deep
models are vulnerable to adversarial attacks. Though transfer learning has been
widely applied, its effect on model robustness is unclear. To investigate this
question, we conduct extensive empirical evaluations to show that fine-tuning
effectively enhances model robustness under white-box FGSM attacks. We also
propose a black-box attack method for transfer learning models which attacks
the target model with the adversarial examples produced by its source model. To
systematically measure the effect of both white-box and black-box attacks, we
propose a new metric to evaluate how transferable the adversarial examples
produced by a source model are to a target model. Empirical results show that the
adversarial examples are more transferable when fine-tuning is used than they
are when the two networks are trained independently.
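
The two attack settings in the abstract can be sketched on a toy model. This is a minimal illustration, not the authors' code: it assumes logistic-regression "networks" so the FGSM input gradient has a closed form, the weights and data are hand-picked, and `transfer_rate` is a hypothetical plain success rate rather than the metric proposed in the paper (the abstract does not define that metric).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, x):
    """Binary predictions (0 or 1) of a logistic-regression model."""
    return (x @ w + b > 0).astype(int)

def fgsm(w, b, x, y, eps):
    """One-step FGSM: x_adv = x + eps * sign(d loss / d x).

    For logistic regression with cross-entropy loss, the gradient with
    respect to the input is (sigmoid(w.x + b) - y) * w.
    """
    p = sigmoid(x @ w + b)
    grad_x = (p - y)[:, None] * w[None, :]
    return x + eps * np.sign(grad_x)

def transfer_rate(w_src, b_src, w_tgt, b_tgt, x, y, eps):
    """Hypothetical helper: among points the target classifies correctly,
    the fraction it misclassifies after FGSM crafted on the source."""
    x_adv = fgsm(w_src, b_src, x, y, eps)
    clean_ok = predict(w_tgt, b_tgt, x) == y
    fooled = predict(w_tgt, b_tgt, x_adv) != y
    if clean_ok.sum() == 0:
        return 0.0
    return float((clean_ok & fooled).sum() / clean_ok.sum())

# Hand-picked toy data and weights (assumptions for illustration only).
x = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -0.5], [-1.0, 0.5]])
y = np.array([1, 0, 1, 0])
w_src = np.array([1.0, 0.1])   # source model
w_ft = np.array([1.0, 0.2])    # stand-in for a fine-tuned target (close to source)
w_ind = np.array([1.0, -0.5])  # stand-in for an independently trained target
b = 0.0
eps = 1.2

# White-box: the target is attacked with its own gradients.
white_box = transfer_rate(w_ft, b, w_ft, b, x, y, eps)
# Black-box: adversarial examples come from the source model only.
rate_ft = transfer_rate(w_src, b, w_ft, b, x, y, eps)
rate_ind = transfer_rate(w_src, b, w_ind, b, x, y, eps)
print(white_box, rate_ft, rate_ind)
```

Because the weights are chosen by hand, the printed rates only show how the quantities are computed; they are not evidence for or against the paper's empirical claims.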
 
      
Related papers
        - Scaling Laws for Black box Adversarial Attacks [37.744814957775965]
 Adversarial examples exhibit cross-model transferability, enabling attacks on black-box models.
Model ensembling is an effective strategy to improve the transferability by attacking multiple surrogate models simultaneously.
We show that scaled attacks bring better interpretability in semantics, indicating that the common features of models are captured.
 arXiv  Detail & Related papers  (2024-11-25T08:14:37Z)
- OMG-ATTACK: Self-Supervised On-Manifold Generation of Transferable
  Evasion Attacks [17.584752814352502]
 Evasion Attacks (EA) are used to test the robustness of trained neural networks by distorting input data.
We introduce a self-supervised, computationally economical method for generating adversarial examples.
Our experiments consistently demonstrate the method is effective across various models, unseen data categories, and even defended models.
 arXiv  Detail & Related papers  (2023-10-05T17:34:47Z)
- Scalable Membership Inference Attacks via Quantile Regression [35.33158339354343]
 Membership inference attacks are designed to determine, using black box access to trained models, whether a particular example was used in training or not.
We introduce a new class of attacks based on performing quantile regression on the distribution of confidence scores induced by the model under attack on points that are not used in training.
 arXiv  Detail & Related papers  (2023-07-07T16:07:00Z)
- Rethinking Model Ensemble in Transfer-based Adversarial Attacks [46.82830479910875]
 An effective strategy to improve the transferability is attacking an ensemble of models.
Previous works simply average the outputs of different models.
We propose a Common Weakness Attack (CWA) to generate more transferable adversarial examples.
 arXiv  Detail & Related papers  (2023-03-16T06:37:16Z)
- Model Inversion Attack against Transfer Learning: Inverting a Model
  without Accessing It [41.39995986856193]
 Transfer learning is an important approach that produces pre-trained teacher models.
Recent research on transfer learning has found that it is vulnerable to various attacks.
It is still not clear whether transfer learning is vulnerable to model inversion attacks.
 arXiv  Detail & Related papers  (2022-03-13T05:07:02Z)
- Learning to Learn Transferable Attack [77.67399621530052]
 Transfer adversarial attack is a non-trivial black-box adversarial attack that aims to craft adversarial perturbations on the surrogate model and then apply such perturbations to the victim model.
We propose a Learning to Learn Transferable Attack (LLTA) method, which makes the adversarial perturbations more generalized via learning from both data and model augmentation.
 Empirical results on the widely-used dataset demonstrate the effectiveness of our attack method with a 12.85% higher success rate of transfer attack compared with the state-of-the-art methods.
 arXiv  Detail & Related papers  (2021-12-10T07:24:21Z)
- Delving into Data: Effectively Substitute Training for Black-box Attack [84.85798059317963]
 We propose a novel perspective on substitute training that focuses on designing the distribution of data used in the knowledge stealing process.
The combination of these two modules can further boost the consistency of the substitute model and target model, which greatly improves the effectiveness of adversarial attack.
 arXiv  Detail & Related papers  (2021-04-26T07:26:29Z)
- Learning to Attack: Towards Textual Adversarial Attacking in Real-world
  Situations [81.82518920087175]
 Adversarial attacking aims to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
 arXiv  Detail & Related papers  (2020-09-19T09:12:24Z)
- Adversarial Imitation Attack [63.76805962712481]
 A practical adversarial attack should require as little knowledge of the attacked models as possible.
Current substitute attacks need pre-trained models to generate adversarial examples.
In this study, we propose a novel adversarial imitation attack.
 arXiv  Detail & Related papers  (2020-03-28T10:02:49Z)
- Data-Free Adversarial Perturbations for Practical Black-Box Attack [25.44755251319056]
 We present a data-free method for crafting adversarial perturbations that can fool a target model without any knowledge about the training data distribution.
Our method empirically shows that current deep learning models are still at risk even when the attackers do not have access to training data.
 arXiv  Detail & Related papers  (2020-03-03T02:22:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.