Label-Only Model Inversion Attacks via Knowledge Transfer
- URL: http://arxiv.org/abs/2310.19342v1
- Date: Mon, 30 Oct 2023 08:32:12 GMT
- Title: Label-Only Model Inversion Attacks via Knowledge Transfer
- Authors: Ngoc-Bao Nguyen, Keshigeyan Chandrasegaran, Milad Abdollahzadeh,
Ngai-Man Cheung
- Abstract summary: In a model inversion (MI) attack, an adversary abuses access to a machine learning (ML) model to infer and reconstruct private data.
We propose LOKT, a novel approach for label-only MI attacks.
Our method significantly outperforms the existing SOTA label-only MI attack by more than 15% across all MI benchmarks.
- Score: 35.42380723970432
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In a model inversion (MI) attack, an adversary abuses access to a machine
learning (ML) model to infer and reconstruct private training data. Remarkable
progress has been made in the white-box and black-box setups, where the
adversary has access to the complete model or to the model's soft output,
respectively. However, there has been very limited study of the most challenging
but practically important setup: label-only MI attacks, where the adversary has
access only to the model's predicted label (hard label), without confidence
scores or any other model information.
In this work, we propose LOKT, a novel approach for label-only MI attacks.
Our idea is based on transfer of knowledge from the opaque target model to
surrogate models. Subsequently, using these surrogate models, our approach can
harness advanced white-box attacks. We propose knowledge transfer based on
generative modelling, and introduce a new model, Target model-assisted ACGAN
(T-ACGAN), for effective knowledge transfer. Our method casts the challenging
label-only MI into the more tractable white-box setup. We provide analysis to
support that surrogate models based on our approach serve as effective proxies
for the target model for MI. Our experiments show that our method significantly
outperforms the existing SOTA label-only MI attack by more than 15% across all MI
benchmarks. Furthermore, our method compares favorably in terms of query
budget. Our study highlights rising privacy threats for ML models even when
minimal information (i.e., hard labels) is exposed. Our code, demo, models and
reconstructed data are available at our
project page: https://ngoc-nguyen-0.github.io/lokt/
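As a rough illustration of the knowledge-transfer idea above (not the paper's T-ACGAN implementation), the following minimal PyTorch sketch queries an opaque target model for hard labels on generated images and uses those labels to supervise a surrogate classifier; a white-box MI attack could then be mounted on the surrogate. The class and function names (ToyGenerator, Surrogate, query_hard_labels, train_surrogate), the architectures, and all hyperparameters are illustrative assumptions, not details from the paper.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyGenerator(nn.Module):
    """Toy stand-in for the generator that synthesizes query images
    (the paper's T-ACGAN generator is far more elaborate)."""
    def __init__(self, z_dim: int = 128, img_size: int = 32):
        super().__init__()
        self.img_size = img_size
        self.fc = nn.Linear(z_dim, 3 * img_size * img_size)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        img = torch.tanh(self.fc(z))
        return img.view(-1, 3, self.img_size, self.img_size)


class Surrogate(nn.Module):
    """Small surrogate classifier; the architecture is an illustrative choice."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


@torch.no_grad()
def query_hard_labels(target_model: nn.Module, images: torch.Tensor) -> torch.Tensor:
    """Label-only access: only the argmax prediction leaves the target model."""
    return target_model(images).argmax(dim=1)


def train_surrogate(target_model: nn.Module,
                    generator: nn.Module,
                    surrogate: nn.Module,
                    steps: int = 1000,
                    batch_size: int = 64,
                    z_dim: int = 128) -> nn.Module:
    """Knowledge-transfer loop: sample synthetic images, query hard labels,
    and fit the surrogate on the resulting (image, hard label) pairs."""
    opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
    for _ in range(steps):
        z = torch.randn(batch_size, z_dim)
        with torch.no_grad():
            fake = generator(z)                         # synthetic query images
        labels = query_hard_labels(target_model, fake)  # hard labels only
        loss = F.cross_entropy(surrogate(fake), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return surrogate
```
Once trained, the surrogate exposes gradients and soft outputs, so an existing white-box MI attack can be run against it as a proxy for the label-only target.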
Related papers
- Model Inversion Robustness: Can Transfer Learning Help? [27.883074562565877]
Model Inversion (MI) attacks aim to reconstruct private training data by abusing access to machine learning models.
We propose Transfer Learning-based Defense against Model Inversion (TL-DMI) to render MI-robust models.
Our method achieves state-of-the-art (SOTA) MI robustness without bells and whistles.
arXiv Detail & Related papers (2024-05-09T07:24:28Z) - Data-Free Hard-Label Robustness Stealing Attack [67.41281050467889]
We introduce a novel Data-Free Hard-Label Robustness Stealing (DFHL-RS) attack in this paper.
It enables the stealing of both model accuracy and robustness by simply querying hard labels of the target model.
Our method achieves a clean accuracy of 77.86% and a robust accuracy of 39.51% against AutoAttack.
arXiv Detail & Related papers (2023-12-10T16:14:02Z) - Army of Thieves: Enhancing Black-Box Model Extraction via Ensemble based
sample selection [10.513955887214497]
In Model Stealing Attacks (MSA), a machine learning model is queried repeatedly to build a labelled dataset.
In this work, we explore the usage of an ensemble of deep learning models as our thief model.
We achieve a 21% higher adversarial sample transferability than previous work for models trained on the CIFAR-10 dataset.
arXiv Detail & Related papers (2023-11-08T10:31:29Z) - Beyond Labeling Oracles: What does it mean to steal ML models? [52.63413852460003]
Model extraction attacks are designed to steal trained models with only query access.
We investigate factors influencing the success of model extraction attacks.
Our findings urge the community to redefine the adversarial goals of ME attacks.
arXiv Detail & Related papers (2023-10-03T11:10:21Z) - Unstoppable Attack: Label-Only Model Inversion via Conditional Diffusion
Model [14.834360664780709]
Model inversion attacks (MIAs) aim to recover private data from the inaccessible training sets of deep learning models.
This paper develops a novel MIA method, leveraging a conditional diffusion model (CDM) to recover representative samples under the target label.
Experimental results show that this method can generate accurate samples resembling the private data of the target label, outperforming the generators of previous approaches.
arXiv Detail & Related papers (2023-07-17T12:14:24Z) - MOVE: Effective and Harmless Ownership Verification via Embedded
External Features [109.19238806106426]
We propose an effective and harmless model ownership verification (MOVE) to defend against different types of model stealing simultaneously.
We conduct the ownership verification by verifying whether a suspicious model contains the knowledge of defender-specified external features.
In particular, we develop our MOVE method under both white-box and black-box settings to provide comprehensive model protection.
arXiv Detail & Related papers (2022-08-04T02:22:29Z) - Label-Only Model Inversion Attacks via Boundary Repulsion [12.374249336222906]
We introduce an algorithm, BREP-MI, to invert private training data using only the target model's predicted labels.
Using the example of face recognition, we show that the images reconstructed by BREP-MI successfully reproduce the semantics of the private training data.
arXiv Detail & Related papers (2022-03-03T18:57:57Z) - Knowledge-Enriched Distributional Model Inversion Attacks [49.43828150561947]
Model inversion (MI) attacks are aimed at reconstructing training data from model parameters.
We present a novel inversion-specific GAN that can better distill knowledge useful for performing attacks on private models from public data.
Our experiments show that the combination of these techniques can significantly boost the success rate of the state-of-the-art MI attacks by 150%.
arXiv Detail & Related papers (2020-10-08T16:20:48Z) - How Does Data Augmentation Affect Privacy in Machine Learning? [94.52721115660626]
We propose new membership inference (MI) attacks that utilize the information of augmented data.
We establish the optimal membership inference when the model is trained with augmented data.
arXiv Detail & Related papers (2020-07-21T02:21:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.