To Transfer or Not to Transfer: Misclassification Attacks Against
Transfer Learned Text Classifiers
- URL: http://arxiv.org/abs/2001.02438v1
- Date: Wed, 8 Jan 2020 10:26:55 GMT
- Title: To Transfer or Not to Transfer: Misclassification Attacks Against
Transfer Learned Text Classifiers
- Authors: Bijeeta Pal and Shruti Tople
- Abstract summary: We present novel attack techniques that utilize unintended features learnt in the teacher (public) model to generate adversarial examples for student (downstream) models.
First, we propose a novel word-score based attack algorithm for generating adversarial examples against student models trained using a context-free word-level embedding model.
Next, we present length-based and sentence-based misclassification attacks for the Fake News Detection task trained using a context-aware BERT model.
- Score: 10.762008415887195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transfer learning, the reuse of knowledge learned on one task for
another, has brought a paradigm shift in the way models are trained. Its
benefits of improved accuracy and reduced training time have shown promise for
training models with constrained computational resources and fewer training
samples.
Specifically, publicly available text-based models such as GloVe and BERT,
trained on large corpora, have seen ubiquitous adoption in practice. In this
paper, we ask, "can transfer learning in text prediction
models be exploited to perform misclassification attacks?" As our main
contribution, we present novel attack techniques that utilize unintended
features learnt in the teacher (public) model to generate adversarial examples
for student (downstream) models. To the best of our knowledge, ours is the
first work to show that transfer learning from state-of-the-art word-based and
sentence-based teacher models increases the susceptibility of student models to
misclassification attacks. First, we propose a novel word-score based attack
algorithm for generating adversarial examples against student models trained
using a context-free word-level embedding model. On binary classification tasks
trained using the GloVe teacher model, we achieve an average attack accuracy of
97% for the IMDB Movie Reviews task and 80% for the Fake News Detection task.
For multi-class tasks, we divide the Newsgroups dataset into 6 and 20 classes and
achieve an average attack accuracy of 75% and 41% respectively. Next, we
present length-based and sentence-based misclassification attacks for the Fake
News Detection task trained using a context-aware BERT model and achieve 78%
and 39% attack accuracy respectively. Thus, our results motivate the need for
designing training techniques that are robust to unintended feature learning,
specifically for transfer learned models.
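The word-score attack is only named in the abstract; its core loop, as described, is to score each word in the input by how much it contributes to the student model's prediction and then perturb the highest-scoring words using the public teacher's embedding space. Below is a minimal illustrative sketch of that idea. Everything in it (the toy bag-of-words student, the random stand-in embeddings, the leave-one-out scoring, the nearest-neighbour substitution) is an assumption for illustration, not the paper's implementation.

```python
# Minimal sketch of a word-score based misclassification attack.
# Assumptions (this is NOT the paper's code): word importance is measured by
# the leave-one-out drop in the student's class probability, and the most
# important words are swapped for neighbours in the teacher embedding space.
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary with random "GloVe-like" vectors; a real attack would load
# the public teacher embeddings instead.
vocab = ["great", "terrible", "movie", "plot", "acting", "boring", "fine"]
emb = {w: rng.normal(size=8) for w in vocab}
weights = {w: rng.normal() for w in vocab}  # stand-in student parameters

def predict_proba(tokens):
    """Toy student model: sigmoid over summed per-word weights."""
    s = sum(weights.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + np.exp(-s))

def nearest_neighbor(word):
    """Closest other vocabulary word by cosine similarity in embedding space."""
    v = emb[word]
    sims = [(v @ emb[w] / (np.linalg.norm(v) * np.linalg.norm(emb[w])), w)
            for w in vocab if w != word]
    return max(sims)[1]

def word_score_attack(tokens, budget=2):
    """Replace the `budget` words whose removal most lowers P(class = 1)."""
    base = predict_proba(tokens)
    scores = [base - predict_proba(tokens[:i] + tokens[i + 1:])
              for i in range(len(tokens))]          # leave-one-out scores
    adv = list(tokens)
    for i in np.argsort(scores)[::-1][:budget]:     # most influential first
        if adv[i] in emb:                           # skip out-of-vocab words
            adv[i] = nearest_neighbor(adv[i])
    return adv

x = ["great", "movie", "fine", "acting"]
print(predict_proba(x), predict_proba(word_score_attack(x)))
```

A real attack along these lines would query the actual student model, load the published GloVe vectors, and choose the substitute that maximally shifts the score rather than the single nearest neighbour; the paper additionally reports length-based and sentence-based variants against BERT-based students, which this sketch does not cover.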
Related papers
- Boosting Model Inversion Attacks with Adversarial Examples [26.904051413441316]
We propose a new training paradigm for a learning-based model inversion attack that can achieve higher attack accuracy in a black-box setting.
First, we regularize the training process of the attack model with an added semantic loss function.
Second, we inject adversarial examples into the training data to increase the diversity of the class-related parts.
arXiv Detail & Related papers (2023-06-24T13:40:58Z)
- Revisiting Classifier: Transferring Vision-Language Models for Video Recognition [102.93524173258487]
Transferring knowledge from task-agnostic pre-trained deep models to downstream tasks is an important topic in computer vision research.
In this study, we focus on transferring knowledge for video classification tasks.
We utilize a well-pretrained language model to generate good semantic targets for efficient transfer learning.
arXiv Detail & Related papers (2022-07-04T10:00:47Z)
- Revisiting the Updates of a Pre-trained Model for Few-shot Learning [11.871523410051527]
We compare the two popular updating methods, fine-tuning and linear probing.
We find that fine-tuning is better than linear probing as the number of samples increases; a sketch contrasting the two update regimes appears after this list.
arXiv Detail & Related papers (2022-05-13T08:47:06Z)
- Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Unlabeled data carry rich task-relevant information and have proven useful for few-shot learning of language models.
In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM; the generic self-training loop is sketched after this list.
arXiv Detail & Related papers (2021-10-04T08:51:36Z)
- Adversarial Vulnerability of Active Transfer Learning [0.0]
Two widely used techniques for training supervised machine learning models on small datasets are Active Learning and Transfer Learning.
We show that the combination of these techniques is particularly susceptible to a new kind of data poisoning attack.
We show that a model trained on such a poisoned dataset suffers significantly degraded performance, with test accuracy dropping from 86% to 34%.
arXiv Detail & Related papers (2021-01-26T14:07:09Z)
- Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations [81.82518920087175]
Adversarial attacks aim to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z)
- Two Sides of the Same Coin: White-box and Black-box Attacks for Transfer Learning [60.784641458579124]
We show that fine-tuning effectively enhances model robustness under white-box FGSM attacks.
We also propose a black-box attack method for transfer learning models which attacks the target model with the adversarial examples produced by its source model.
To systematically measure the effect of both white-box and black-box attacks, we propose a new metric that evaluates how well the adversarial examples produced by a source model transfer to a target model; one such transfer-rate measurement is sketched after this list.
arXiv Detail & Related papers (2020-08-25T15:04:32Z)
- Do Adversarially Robust ImageNet Models Transfer Better? [102.09335596483695]
Adversarially robust models often perform better than their standard-trained counterparts when used for transfer learning.
Our results are consistent with (and in fact, add to) recent hypotheses stating that robustness leads to improved feature representations.
arXiv Detail & Related papers (2020-07-16T17:42:40Z)
- Learning to Reweight with Deep Interactions [104.68509759134878]
We propose an improved data reweighting algorithm, in which the student model provides its internal states to the teacher model.
Experiments on image classification with clean/noisy labels and neural machine translation empirically demonstrate that our algorithm makes significant improvement over previous methods.
arXiv Detail & Related papers (2020-07-09T09:06:31Z)
- Leveraging Siamese Networks for One-Shot Intrusion Detection Model [0.0]
The use of supervised Machine Learning (ML) to enhance Intrusion Detection Systems has been the subject of significant research.
However, retraining the models in-situ leaves the network susceptible to attacks during the time window required to acquire a sufficient volume of data.
Here, a complementary approach referred to as 'One-Shot Learning' is employed, whereby a limited number of examples of a new attack class are used to identify it.
A Siamese Network is trained to differentiate between classes based on pair similarity rather than on individual features, allowing it to identify new and previously unseen attacks; see the pairwise-similarity sketch after this list.
arXiv Detail & Related papers (2020-06-27T11:40:01Z)
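As promised above, a minimal PyTorch sketch contrasting the two update regimes compared in "Revisiting the Updates of a Pre-trained Model for Few-shot Learning": linear probing freezes the pre-trained backbone and trains only the new head, while fine-tuning updates everything. The modules below are illustrative stand-ins, not the paper's models.

```python
# Linear probing vs. fine-tuning: the only difference is which parameters
# receive gradients. Backbone and head here are illustrative stand-ins.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU())  # "pre-trained" part
head = nn.Linear(64, 2)                                 # new task-specific head

def make_optimizer(mode):
    if mode == "linear_probe":
        for p in backbone.parameters():   # freeze the backbone
            p.requires_grad = False
        params = list(head.parameters())
    else:                                 # "fine_tune": update everything
        for p in backbone.parameters():
            p.requires_grad = True
        params = list(backbone.parameters()) + list(head.parameters())
    return torch.optim.SGD(params, lr=1e-2)

opt = make_optimizer("linear_probe")
x, y = torch.randn(8, 32), torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(head(backbone(x)), y)
loss.backward()
opt.step()  # with "linear_probe", only the head moves
```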
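"Revisiting Self-Training for Few-Shot Learning of Language Model" builds on the generic self-training loop: pseudo-label the unlabeled data with the current model and keep only confident predictions for further training. The toy sketch below shows that loop with a plain logistic model standing in for the language model; SFLM's prompt-based objective is not reproduced here.

```python
# Generic self-training (pseudo-labeling) loop. The logistic "model" is a
# stand-in; SFLM's actual prompt-based fine-tuning is not reproduced here.
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=4)  # toy model parameters

def proba(x):
    p = 1.0 / (1.0 + np.exp(-(x @ w)))
    return np.array([1.0 - p, p])

def fit(data):
    global w
    for x, y in data:          # one SGD pass on logistic loss
        w -= 0.1 * (proba(x)[1] - y) * x

labeled = [(rng.normal(size=4), i % 2) for i in range(8)]
unlabeled = [rng.normal(size=4) for _ in range(20)]

for _ in range(3):             # self-training rounds
    fit(labeled)
    pseudo = [(x, int(proba(x).argmax())) for x in unlabeled
              if proba(x).max() >= 0.9]   # keep only confident pseudo-labels
    labeled = labeled[:8] + pseudo        # originals plus fresh pseudo-labels
```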
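The transferability metric in "Two Sides of the Same Coin" is only named in the summary above; a natural measurement in the same spirit is the fraction of adversarial examples crafted on the source model that also fool the target. A sketch with FGSM and illustrative stand-in models (the paper's exact metric may differ):

```python
# Measuring adversarial transferability: craft examples on a source model
# with FGSM, then count how many also fool a target. Models are stand-ins.
import torch
import torch.nn as nn

torch.manual_seed(0)
source = nn.Linear(16, 2)   # stand-in source model
target = nn.Linear(16, 2)   # stand-in transfer-learned target model

def fgsm(model, x, y, eps=0.5):
    """One-step FGSM perturbation of x against `model`."""
    x = x.clone().requires_grad_(True)
    nn.functional.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).detach()

x, y = torch.randn(64, 16), torch.randint(0, 2, (64,))
x_adv = fgsm(source, x, y)

fools_source = source(x_adv).argmax(dim=1) != y
fools_target = target(x_adv).argmax(dim=1) != y
# Transfer rate: among examples that fool the source model, the share that
# also fools the target (one reasonable definition, not necessarily the
# paper's exact metric).
rate = (fools_source & fools_target).sum() / fools_source.sum().clamp(min=1)
print(f"transfer rate: {rate.item():.2f}")
```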
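Finally, the Siamese idea from "Leveraging Siamese Networks for One-Shot Intrusion Detection Model": embed both inputs with a shared network and classify by distance to one stored example per class, so a new attack class can be recognised without retraining. The network below is an untrained illustrative stand-in, not the paper's architecture.

```python
# One-shot classification with a shared-weight Siamese embedding: compare a
# query against a single stored example per class and pick the nearest.
import torch
import torch.nn as nn

torch.manual_seed(0)
embed = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 16))

def pair_distance(a, b):
    """Smaller distance => more likely the same traffic class."""
    return torch.norm(embed(a) - embed(b), dim=-1)

# One stored "support" example per known class; a single example of a brand
# new attack class is enough to add it to the comparison set.
supports = {"normal": torch.randn(10), "dos": torch.randn(10),
            "new_attack": torch.randn(10)}
query = torch.randn(10)
pred = min(supports, key=lambda c: pair_distance(query, supports[c]).item())
print("predicted class:", pred)
```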