Beyond Model Extraction: Imitation Attack for Black-Box NLP APIs
- URL: http://arxiv.org/abs/2108.13873v1
- Date: Sun, 29 Aug 2021 10:52:04 GMT
- Title: Beyond Model Extraction: Imitation Attack for Black-Box NLP APIs
- Authors: Qiongkai Xu, Xuanli He, Lingjuan Lyu, Lizhen Qu, Gholamreza Haffari
- Abstract summary: We show that attackers could potentially surpass victims via unsupervised domain adaptation and multi-victim ensemble.
- Score: 36.258615610948524
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Machine-learning-as-a-service (MLaaS) has attracted millions of users with its sophisticated, high-performing models. Although these services are published as black-box APIs, the valuable models behind them are still vulnerable to imitation attacks. Recently, a series of works has demonstrated that attackers can steal or extract victim models. Nonetheless, none of the previously stolen models could outperform the original black-box APIs. In this work, we take the first step toward showing that attackers can potentially surpass victims via unsupervised domain adaptation and multi-victim ensemble. Extensive experiments on benchmark datasets and real-world APIs validate that imitators can succeed in outperforming the original black-box models. We consider this a milestone in the research on imitation attacks, especially against NLP APIs, as the superior performance could influence the defense or even the publishing strategy of API providers.
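To make the pipeline concrete, here is a minimal NumPy sketch of the multi-victim ensemble idea: unlabeled texts are sent to several victim APIs, their output distributions are averaged into soft pseudo-labels, and a local imitator is distilled on those labels. The `victim_apis` callables and the pre-featurized inputs are hypothetical stand-ins, and the paper's unsupervised domain adaptation step is omitted; this is a sketch of the general recipe, not the authors' exact implementation.

```python
# Minimal sketch of multi-victim ensemble imitation (not the authors' exact
# recipe). Assumption: each hypothetical victim API maps a text to a
# probability vector over a shared label set.
import numpy as np

def ensemble_pseudo_labels(texts, victim_apis):
    """Average the victims' output distributions into soft pseudo-labels."""
    return np.stack([
        np.mean([api(t) for api in victim_apis], axis=0) for t in texts
    ])  # shape: (num_texts, num_labels)

def train_imitator(features, soft_labels, lr=0.1, epochs=200):
    """Distill a linear softmax classifier on the ensemble's soft labels."""
    n, d = features.shape
    k = soft_labels.shape[1]
    W = np.zeros((d, k))
    for _ in range(epochs):
        logits = features @ W
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        W -= lr * features.T @ (probs - soft_labels) / n  # cross-entropy grad
    return W
```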
Related papers
- DREAM: Domain-agnostic Reverse Engineering Attributes of Black-box Model [50.94236887900527]
We present a new problem of black-box reverse engineering that does not require access to the target model's training dataset.
We learn a domain-agnostic meta-model to infer the attributes of the target black-box model with unknown training data.
arXiv Detail & Related papers (2024-12-08T07:37:05Z)
- Can't Hide Behind the API: Stealing Black-Box Commercial Embedding Models [47.13042922690422]
Companies such as OpenAI and Cohere have developed competing embedding models accessed through APIs that require users to pay for usage.
We present, to our knowledge, the first effort to "steal" these models for retrieval by training local models on text-embedding pairs obtained from the commercial APIs.
arXiv Detail & Related papers (2024-06-13T17:40:56Z)
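The entry above describes regression on harvested text-embedding pairs. A hedged PyTorch sketch of that training loop follows; `get_api_embedding` and `local_encoder` are hypothetical placeholders for the paid API and any trainable encoder with a matching output dimension.

```python
# Sketch of embedding-model stealing by imitating API outputs; the API call
# and the local encoder are hypothetical placeholders.
import torch
import torch.nn.functional as F

def distill_embeddings(local_encoder, texts, get_api_embedding,
                       epochs=3, lr=1e-4):
    opt = torch.optim.AdamW(local_encoder.parameters(), lr=lr)
    # Harvest (text, embedding) pairs once; API queries are the main cost.
    targets = {t: torch.tensor(get_api_embedding(t), dtype=torch.float32)
               for t in texts}
    for _ in range(epochs):
        for t in texts:
            pred = local_encoder(t)
            # Cosine loss suits retrieval: direction matters more than norm.
            loss = 1.0 - F.cosine_similarity(pred, targets[t], dim=-1).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return local_encoder
```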
- Stealing Part of a Production Language Model [99.33245067682984]
We introduce the first model-stealing attack that extracts precise, nontrivial information from production language models.
For under $20 USD, our attack extracts the entire projection matrix of OpenAI's Ada and Babbage language models.
arXiv Detail & Related papers (2024-03-11T11:46:12Z)
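The projection-matrix extraction above rests on a linear-algebra observation: final logits are W h with hidden width d far smaller than the vocabulary size, so stacked logit vectors form a rank-d matrix. A simplified NumPy illustration of that observation follows; the real attack must additionally reconstruct full logit vectors from the restricted outputs the APIs expose.

```python
# Simplified illustration of low-rank logit-layer extraction: if logits = W @ h
# with hidden size d << vocab size V, the stacked logit matrix has rank d.
import numpy as np

def estimate_hidden_dim(logit_matrix, tol=1e-6):
    """logit_matrix: (num_queries, vocab_size) array of full logit vectors."""
    s = np.linalg.svd(logit_matrix, compute_uv=False)
    return int((s > tol * s[0]).sum())  # count significant singular values

def recover_projection_span(logit_matrix, d):
    """Orthonormal basis for W's column space, i.e. W up to an invertible map."""
    _, _, vt = np.linalg.svd(logit_matrix, full_matrices=False)
    return vt[:d].T  # (vocab_size, d)
```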
- Beyond Labeling Oracles: What does it mean to steal ML models? [52.63413852460003]
Model extraction attacks are designed to steal trained models with only query access.
We investigate factors influencing the success of model extraction attacks.
Our findings urge the community to redefine the adversarial goals of ME attacks.
arXiv Detail & Related papers (2023-10-03T11:10:21Z)
- DREAM: Domain-free Reverse Engineering Attributes of Black-box Model [51.37041886352823]
We propose a new problem of Domain-agnostic Reverse Engineering the Attributes of a black-box target model.
We learn a domain-agnostic model to infer the attributes of a target black-box model with unknown training data.
arXiv Detail & Related papers (2023-07-20T16:25:58Z)
- Reinforcement Learning-Based Black-Box Model Inversion Attacks [23.30144908939099]
Model inversion attacks reconstruct private data used to train a machine learning model.
White-box model inversion attacks that leverage Generative Adversarial Networks (GANs) to distill knowledge from public datasets have received great attention.
We propose a reinforcement learning-based black-box model inversion attack.
arXiv Detail & Related papers (2023-04-10T14:41:16Z)
- Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable Confidence [34.35162562625252]
Black-box adversarial attacks have demonstrated strong potential to compromise machine learning models.
We study a new paradigm of black-box attacks with provable guarantees.
This new black-box attack unveils significant vulnerabilities of machine learning models.
arXiv Detail & Related papers (2023-04-10T01:12:09Z)
- Ensemble-based Blackbox Attacks on Dense Prediction [16.267479602370543]
We show that a carefully designed ensemble can create effective attacks for a number of victim models.
In particular, we show that normalization of the weights for individual models plays a critical role in the success of the attacks.
Our proposed method can also generate a single perturbation that can fool multiple blackbox detection and segmentation models simultaneously.
arXiv Detail & Related papers (2023-03-25T00:08:03Z)
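As a rough illustration of the normalized-ensemble idea above, here is a PGD-style loop over several surrogate models in PyTorch. It normalizes both the ensemble weights and each model's loss scale; the paper targets dense prediction (detection and segmentation), so the classification loss below is a simplification, not the authors' objective.

```python
# Sketch of a weighted-ensemble PGD attack with normalized per-model losses.
# The surrogate `models` and the classification loss are simplifying stand-ins.
import torch
import torch.nn.functional as F

def ensemble_pgd(x, y, models, weights, steps=10, eps=8/255, alpha=2/255):
    w = torch.tensor(weights, dtype=torch.float32)
    w = w / w.sum()                       # normalize the ensemble weights
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        losses = torch.stack([F.cross_entropy(m(x_adv), y) for m in models])
        # Divide each loss by its own magnitude so no single surrogate
        # dominates the shared gradient direction.
        loss = (w * losses / losses.detach().clamp(min=1e-12)).sum()
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv
```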
- Distributed Black-box Attack: Do Not Overestimate Black-box Attacks [4.764637544913963]
Black-box adversarial attacks can fool image classification models without access to model structure and weights.
Recent studies have reported attack success rates of over 95% with fewer than 1,000 queries.
Our research indicates that black-box attacks are not as effective against cloud APIs as research papers report.
arXiv Detail & Related papers (2022-10-28T19:14:03Z)
- Model Extraction and Adversarial Transferability, Your BERT is Vulnerable! [11.425692676973332]
We show how an adversary can steal a BERT-based API service on multiple benchmark datasets with limited prior knowledge and queries.
We also show that the extracted model can lead to highly transferable adversarial attacks against the victim model.
Our studies indicate that the potential vulnerabilities of BERT-based API services still hold, even when there is an architectural mismatch between the victim model and the attack model.
arXiv Detail & Related papers (2021-03-18T04:23:21Z)
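A hedged sketch of the extraction-by-distillation setup in the entry above, using the HuggingFace transformers stack: query the victim for labels, then fine-tune a local BERT on them. `victim_api` is a hypothetical stand-in returning one label id per text; real attacks batch queries and choose query text carefully.

```python
# Sketch of stealing a classification API by fine-tuning a local BERT on the
# victim's hard labels; `victim_api` is a hypothetical placeholder.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

def extract(victim_api, query_texts, num_labels,
            model_name="bert-base-uncased"):
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=num_labels)
    opt = torch.optim.AdamW(model.parameters(), lr=2e-5)
    labels = [victim_api(t) for t in query_texts]   # the only victim access
    model.train()
    for text, label in zip(query_texts, labels):
        batch = tok(text, return_tensors="pt", truncation=True)
        out = model(**batch, labels=torch.tensor([label]))
        opt.zero_grad()
        out.loss.backward()
        opt.step()
    return model
```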
- Improving Query Efficiency of Black-box Adversarial Attack [75.71530208862319]
We propose a Neural Process based black-box adversarial attack (NP-Attack).
NP-Attack could greatly decrease the query counts under the black-box setting.
arXiv Detail & Related papers (2020-09-24T06:22:56Z)
- Imitation Attacks and Defenses for Black-box Machine Translation Systems [86.92681013449682]
Black-box machine translation (MT) systems have high commercial value and errors can be costly.
We show that MT systems can be stolen by querying them with monolingual sentences and training models to imitate their outputs.
We propose a defense that modifies translation outputs in order to misdirect the optimization of imitation models.
arXiv Detail & Related papers (2020-04-30T17:56:49Z)
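To ground the MT entry above, a minimal sketch of the imitation data harvest and the shape of the proposed defense. `victim_translate` and `perturb` are hypothetical placeholders; the paper's actual defense is a targeted modification of translation outputs, not specified here.

```python
# Sketch of MT imitation: pair monolingual queries with the victim's outputs
# to build training data, then fine-tune any seq2seq model on the pairs.
def build_imitation_corpus(monolingual_sentences, victim_translate):
    """Each (source, victim translation) pair becomes a training example."""
    return [(src, victim_translate(src)) for src in monolingual_sentences]

def defended_output(translation, perturb):
    """Defense idea: return a slightly modified translation that stays useful
    for honest users but misdirects an imitator's training signal.
    `perturb` stands in for the paper's targeted output modification."""
    return perturb(translation)
```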
This list is automatically generated from the titles and abstracts of the papers on this site.