Adaptive Domain Inference Attack with Concept Hierarchy
- URL: http://arxiv.org/abs/2312.15088v3
- Date: Mon, 27 Jan 2025 03:35:40 GMT
- Title: Adaptive Domain Inference Attack with Concept Hierarchy
- Authors: Yuechun Gu, Jiajie He, Keke Chen
- Abstract summary: Most known model-targeted attacks assume attackers have learned the application domain or training data distribution.
Can removing the domain information from model APIs protect models from these attacks?
We show that the proposed adaptive domain inference attack (ADI) can still successfully estimate relevant subsets of training data.
- Abstract: With deep neural networks increasingly deployed in sensitive application domains such as healthcare and security, it is essential to understand what kind of sensitive information can be inferred from these models. Most known model-targeted attacks assume the attacker has learned the application domain or training-data distribution to ensure a successful attack. Can removing the domain information from model APIs protect models from these attacks? This paper studies this critical problem. Unfortunately, even with minimal knowledge, i.e., access to the model as an unnamed function without leaking the meaning of its inputs and outputs, the proposed adaptive domain inference attack (ADI) can still successfully estimate relevant subsets of the training data. We show that the extracted relevant data can significantly improve the performance of, for instance, model-inversion attacks. Specifically, the ADI method utilizes a concept hierarchy extracted from a collection of available public and private datasets and a novel algorithm to adaptively tune the likelihood of leaf concepts appearing in the unseen training data. We also design a straightforward hypothesis-testing-based attack, LDI. The ADI attack not only extracts partial training data at the concept level but also converges fastest and requires the fewest target-model accesses among all candidate methods. Our code is available at https://anonymous.4open.science/r/KDD-362D.
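As a hedged illustration of the adaptive tuning step (a minimal sketch under assumed details; the concept-hierarchy source, the sampling routine `sample_fn`, the `query_target` callable, and the confidence-based scoring are placeholders, not the authors' implementation), one can maintain a probability over the hierarchy's leaf concepts and re-weight it by how confidently the unnamed target model responds to public samples drawn from each concept:

```python
# Illustrative sketch only: adaptively re-weighting leaf concepts of a concept
# hierarchy by the target model's response confidence.
# `sample_fn` and `query_target` are hypothetical callables, not the paper's API.
import numpy as np

def softmax_confidence(logits):
    """Mean top-class softmax probability over a batch of logits."""
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return p.max(axis=1).mean()

def adaptive_domain_inference(leaf_concepts, sample_fn, query_target,
                              rounds=50, samples_per_leaf=8, lr=0.5):
    """Return an estimated likelihood that each leaf concept appears in the
    unseen training data, starting from a uniform prior."""
    probs = np.full(len(leaf_concepts), 1.0 / len(leaf_concepts))
    for _ in range(rounds):
        scores = np.zeros(len(leaf_concepts))
        for i, concept in enumerate(leaf_concepts):
            batch = sample_fn(concept, samples_per_leaf)  # public images tagged with this leaf concept
            logits = query_target(batch)                  # unnamed target model; output semantics unknown
            scores[i] = softmax_confidence(logits)        # confidence as a domain-similarity proxy
        # Multiplicative-weights-style update toward high-confidence concepts
        probs *= np.exp(lr * (scores - scores.mean()))
        probs /= probs.sum()
    return probs
```

Leaf concepts that retain high likelihood after convergence would form the estimated relevant subset, which, as the abstract notes, can then serve as auxiliary data for downstream attacks such as model inversion.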
Related papers
- FreqFed: A Frequency Analysis-Based Approach for Mitigating Poisoning Attacks in Federated Learning [98.43475653490219]
Federated learning (FL) is susceptible to poisoning attacks.
FreqFed is a novel aggregation mechanism that transforms the model updates into the frequency domain.
We demonstrate that FreqFed can mitigate poisoning attacks effectively with a negligible impact on the utility of the aggregated model.
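A minimal sketch of the frequency-domain idea (illustrative only, not FreqFed's actual filtering or aggregation mechanism): flattened client updates are moved into the frequency domain with a DCT, only low-frequency components are kept, and aggregation happens there before transforming back.

```python
# Minimal sketch, not FreqFed's actual mechanism: transform flattened client
# updates with a DCT, keep low-frequency components, and average in the
# frequency domain before inverting the transform.
import numpy as np
from scipy.fft import dct, idct

def frequency_domain_aggregate(client_updates, keep_ratio=0.25):
    """client_updates: list of equal-length 1-D numpy arrays (flattened model deltas)."""
    spectra = np.stack([dct(u, norm="ortho") for u in client_updates])
    cutoff = max(1, int(spectra.shape[1] * keep_ratio))
    spectra[:, cutoff:] = 0.0              # discard high-frequency components
    mean_spectrum = spectra.mean(axis=0)   # aggregate in the frequency domain
    return idct(mean_spectrum, norm="ortho")
```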
arXiv Detail & Related papers (2023-12-07T16:56:24Z) - Boosting Model Inversion Attacks with Adversarial Examples [26.904051413441316]
We propose a new training paradigm for a learning-based model inversion attack that can achieve higher attack accuracy in a black-box setting.
First, we regularize the training process of the attack model with an added semantic loss function.
Second, we inject adversarial examples into the training data to increase the diversity of the class-related parts.
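As a hedged illustration of the second step (a standard FGSM-style augmentation; the paper's exact losses, attack-model training, and black-box crafting procedure are not reproduced here), adversarial copies of public images could be generated against a surrogate classifier and added to the attack model's training data:

```python
# Illustrative FGSM-style augmentation; `surrogate_model` is an assumed
# white-box stand-in, since the attacked model itself is black-box.
import torch
import torch.nn.functional as F

def fgsm_augment(surrogate_model, images, labels, eps=0.03):
    """Return adversarially perturbed copies of `images` for data augmentation."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(surrogate_model(images), labels)
    loss.backward()
    adv = images + eps * images.grad.sign()   # one-step sign-gradient perturbation
    return adv.clamp(0.0, 1.0).detach()
```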
arXiv Detail & Related papers (2023-06-24T13:40:58Z) - A Plot is Worth a Thousand Words: Model Information Stealing Attacks via Scientific Plots [14.998272283348152]
It is well known that an adversary can leverage a target ML model's output to steal the model's information.
We propose a new side channel for model information stealing attacks, i.e., models' scientific plots.
arXiv Detail & Related papers (2023-02-23T12:57:34Z) - Pseudo Label-Guided Model Inversion Attack via Conditional Generative Adversarial Network [102.21368201494909]
Model inversion (MI) attacks have raised increasing concerns about privacy.
Recent MI attacks leverage a generative adversarial network (GAN) as an image prior to narrow the search space.
We propose a Pseudo Label-Guided MI (PLG-MI) attack via a conditional GAN (cGAN).
arXiv Detail & Related papers (2023-02-20T07:29:34Z) - GAN-based Domain Inference Attack [3.731168012111833]
We propose a generative adversarial network (GAN) based method to explore likely or similar domains of a target model.
We find that the target model may distract the training procedure less when a candidate domain is more similar to the target domain.
Our experiments show that the auxiliary dataset from an MDI top-ranked domain can effectively boost the result of model-inversion attacks.
arXiv Detail & Related papers (2022-12-22T15:40:53Z) - Manipulating SGD with Data Ordering Attacks [23.639512087220137]
We present a class of training-time attacks that require no changes to the underlying dataset or model architecture.
In particular, an attacker can disrupt the integrity and availability of a model by simply reordering training batches.
These attacks have a long-term impact, decreasing model performance hundreds of epochs after they take place.
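A minimal sketch of the reordering idea (one illustrative policy, not the paper's exact attack): present training batches in an order chosen by the current loss, so the gradient sequence seen by SGD is systematically biased.

```python
# Illustrative batch-reordering sketch: rank batches by current loss and
# present them in that order. The paper studies several ordering policies;
# this shows only one possible choice.
import torch

def reorder_batches_by_loss(model, batches, loss_fn, ascending=True):
    """batches: list of (inputs, targets) pairs; returns a reordered list."""
    model.eval()
    with torch.no_grad():
        losses = [loss_fn(model(x), y).item() for x, y in batches]
    order = sorted(range(len(batches)), key=lambda i: losses[i],
                   reverse=not ascending)
    return [batches[i] for i in order]
```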
arXiv Detail & Related papers (2021-04-19T22:17:27Z) - Distill and Fine-tune: Effective Adaptation from a Black-box Source Model [138.12678159620248]
Unsupervised domain adaptation (UDA) aims to transfer knowledge from related labeled source datasets to a new unlabeled target dataset.
We propose a novel two-step adaptation framework called Distill and Fine-tune (Dis-tune).
arXiv Detail & Related papers (2021-04-04T05:29:05Z) - Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z) - Source Data-absent Unsupervised Domain Adaptation through Hypothesis Transfer and Labeling Transfer [137.36099660616975]
Unsupervised domain adaptation (UDA) aims to transfer knowledge from a related but different well-labeled source domain to a new unlabeled target domain.
Most existing UDA methods require access to the source data, and thus are not applicable when the data are confidential and not shareable due to privacy concerns.
This paper aims to tackle a realistic setting in which only a trained classification model is available, instead of access to the source data.
arXiv Detail & Related papers (2020-12-14T07:28:50Z) - Practical No-box Adversarial Attacks against DNNs [31.808770437120536]
We investigate no-box adversarial examples, where the attacker can access neither the model's information nor the training set, and cannot query the model.
We propose three mechanisms for training with a very small dataset and find that prototypical reconstruction is the most effective.
Our approach significantly diminishes the average prediction accuracy of the system to only 15.40%, which is on par with the attack that transfers adversarial examples from a pre-trained Arcface model.
arXiv Detail & Related papers (2020-12-04T11:10:03Z) - Knowledge-Enriched Distributional Model Inversion Attacks [49.43828150561947]
Model inversion (MI) attacks are aimed at reconstructing training data from model parameters.
We present a novel inversion-specific GAN that can better distill knowledge useful for performing attacks on private models from public data.
Our experiments show that the combination of these techniques can significantly boost the success rate of the state-of-the-art MI attacks by 150%.
arXiv Detail & Related papers (2020-10-08T16:20:48Z)