Teacher Model Fingerprinting Attacks Against Transfer Learning
- URL: http://arxiv.org/abs/2106.12478v1
- Date: Wed, 23 Jun 2021 15:52:35 GMT
- Title: Teacher Model Fingerprinting Attacks Against Transfer Learning
- Authors: Yufei Chen, Chao Shen, Cong Wang, Yang Zhang
- Abstract summary: We present the first comprehensive investigation of the teacher model exposure threat in the transfer learning context.
We propose a teacher model fingerprinting attack to infer the origin of a student model, i.e., the teacher model it was transferred from.
We show that our attack can accurately identify the model origin with few probing queries.
- Score: 23.224444604615123
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transfer learning has become a common solution to address training data
scarcity in practice. It trains a specified student model by reusing or
fine-tuning early layers of a well-trained teacher model that is usually
publicly available. However, besides utility improvement, the transferred
public knowledge also brings potential threats to model confidentiality, and
even further raises other security and privacy issues.
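As a point of reference for this setup, here is a minimal PyTorch sketch of the transfer pattern described above, i.e., reusing and freezing the early layers of a public teacher and training only a new task head. The ResNet-18 teacher, the 10-class head, and the toy batch are illustrative assumptions, not details from the paper.

```python
# Minimal transfer-learning sketch (assumed setup, not code from the paper):
# a student reuses the frozen early layers of a public teacher and trains a new head.
import torch
import torch.nn as nn
from torchvision import models

# Publicly available teacher: a pretrained ResNet-18 (illustrative choice).
teacher = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Reuse everything except the classifier head and freeze the transferred layers.
feature_extractor = nn.Sequential(*list(teacher.children())[:-1])
for p in feature_extractor.parameters():
    p.requires_grad = False

# Student = frozen teacher features + a small task-specific head (10 classes here).
student = nn.Sequential(feature_extractor, nn.Flatten(), nn.Linear(512, 10))

# Only the new head is updated on the (scarce) student training data.
optimizer = torch.optim.Adam(student[-1].parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

x, y = torch.rand(8, 3, 224, 224), torch.randint(0, 10, (8,))  # toy batch
loss = criterion(student(x), y)
loss.backward()
optimizer.step()
```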
In this paper, we present the first comprehensive investigation of the
teacher model exposure threat in the transfer learning context, aiming to gain
a deeper insight into the tension between public knowledge and model
confidentiality. To this end, we propose a teacher model fingerprinting attack
to infer the origin of a student model, i.e., the teacher model it transfers
from. Specifically, to realize our attack, we propose a novel optimization-based
method that carefully generates queries to probe the student model. Unlike
existing model reverse engineering approaches, our proposed fingerprinting
method neither relies on fine-grained model outputs, e.g., posteriors, nor
auxiliary information of the model architecture or training dataset. We
systematically evaluate the effectiveness of our proposed attack. The empirical
results demonstrate that our attack can accurately identify the model origin
with few probing queries. Moreover, we show that the proposed attack can serve
as a stepping stone that facilitates other attacks against machine learning
models, such as model stealing.
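To make the attack idea concrete, the following PyTorch sketch (an assumed illustration, not the authors' implementation) probes a black-box student with pairs of inputs that share a feature representation under a candidate teacher: a natural anchor and a synthetic probe optimized to match the anchor's teacher-space features. The helper names (`craft_probe`, `fingerprint_score`), the ResNet-18 candidate teacher, and the local stand-in `student_query` are all hypothetical.

```python
# Illustrative sketch of the fingerprinting idea (an assumption, not the authors'
# implementation): for a candidate teacher, optimize a synthetic probe whose
# teacher-space features match a natural anchor's, then check whether the
# black-box student assigns both inputs the same hard label.
import torch
import torch.nn.functional as F
from torchvision import models

def craft_probe(teacher_features, anchor, steps=300, lr=0.05):
    """Optimize a random image so its teacher-layer features match the anchor's."""
    with torch.no_grad():
        target = teacher_features(anchor)
    probe = torch.rand_like(anchor, requires_grad=True)
    opt = torch.optim.Adam([probe], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        F.mse_loss(teacher_features(probe.clamp(0, 1)), target).backward()
        opt.step()
    return probe.detach().clamp(0, 1)

def fingerprint_score(student_query, teacher_features, anchors):
    """Fraction of (anchor, probe) pairs that the student labels identically."""
    matches = sum(
        student_query(a) == student_query(craft_probe(teacher_features, a))
        for a in anchors
    )
    return matches / len(anchors)

if __name__ == "__main__":
    # Candidate teacher: a public pretrained backbone with its classifier head removed.
    backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
    for p in backbone.parameters():
        p.requires_grad = False
    teacher_features = torch.nn.Sequential(*list(backbone.children())[:-1])

    # Hypothetical black-box student API: a real attack queries a remote model that
    # returns only a hard label; here the teacher itself stands in for a quick demo.
    def student_query(x):
        with torch.no_grad():
            return backbone(x).argmax(dim=1).item()

    anchors = [torch.rand(1, 3, 224, 224) for _ in range(3)]
    print("fingerprint match rate:", fingerprint_score(student_query, teacher_features, anchors))
```

A consistently high match rate across anchor/probe pairs suggests that the candidate backbone is the student's teacher, since the matched features tend to survive the transferred layers; note that the sketch uses only hard-label outputs, in line with the threat model stated in the abstract.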
Related papers
- Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage [12.892449128678516]
Fine-tuning language models on private data for downstream applications poses significant privacy risks.
Several popular community platforms now offer convenient distribution of a large variety of pre-trained models.
We introduce a novel poisoning technique that uses model-unlearning as an attack tool.
arXiv Detail & Related papers (2024-08-30T15:35:09Z)
- Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z)
- CANIFE: Crafting Canaries for Empirical Privacy Measurement in Federated Learning [77.27443885999404]
Federated Learning (FL) is a setting for training machine learning models in distributed environments.
We propose a novel method, CANIFE, that uses samples carefully crafted by a strong adversary to evaluate the empirical privacy of a training round.
arXiv Detail & Related papers (2022-10-06T13:30:16Z)
- MOVE: Effective and Harmless Ownership Verification via Embedded External Features [109.19238806106426]
We propose an effective and harmless model ownership verification (MOVE) method to defend against different types of model stealing simultaneously.
We conduct the ownership verification by verifying whether a suspicious model contains the knowledge of defender-specified external features.
In particular, we develop our MOVE method under both white-box and black-box settings to provide comprehensive model protection.
arXiv Detail & Related papers (2022-08-04T02:22:29Z)
- Learning to Learn Transferable Attack [77.67399621530052]
A transfer adversarial attack is a non-trivial black-box attack that crafts adversarial perturbations on a surrogate model and then applies them to the victim model.
We propose a Learning to Learn Transferable Attack (LLTA) method, which makes the adversarial perturbations more generalized via learning from both data and model augmentation.
Empirical results on widely used datasets demonstrate the effectiveness of our attack method, with a 12.85% higher transfer attack success rate than state-of-the-art methods.
arXiv Detail & Related papers (2021-12-10T07:24:21Z)
- Delving into Data: Effectively Substitute Training for Black-box Attack [84.85798059317963]
We propose a novel perspective on substitute training that focuses on designing the distribution of the data used in the knowledge-stealing process.
The combination of these two modules can further boost the consistency between the substitute model and the target model, which greatly improves the effectiveness of adversarial attacks.
arXiv Detail & Related papers (2021-04-26T07:26:29Z)
- Understanding Robustness in Teacher-Student Setting: A New Perspective [42.746182547068265]
Adversarial examples are inputs on which a bounded perturbation can mislead a machine learning model into making arbitrarily incorrect predictions.
Extensive studies have tried to explain the existence of adversarial examples and to provide ways to improve model robustness.
Our study could shed light on future exploration of adversarial examples and on enhancing model robustness via principled data augmentation.
arXiv Detail & Related papers (2021-02-25T20:54:24Z)
- FaceLeaks: Inference Attacks against Transfer Learning Models via Black-box Queries [2.7564955518050693]
We investigate whether one can leak or infer private information without interacting with the teacher model directly.
We propose novel strategies to infer from aggregate-level information.
Our study indicates that information leakage is a real privacy threat to the transfer learning framework widely used in real-life situations.
arXiv Detail & Related papers (2020-10-27T03:02:40Z)
- Learning to Attack: Towards Textual Adversarial Attacking in Real-world Situations [81.82518920087175]
Adversarial attacks aim to fool deep neural networks with adversarial examples.
We propose a reinforcement learning based attack model, which can learn from attack history and launch attacks more efficiently.
arXiv Detail & Related papers (2020-09-19T09:12:24Z)