Robust and Verifiable Information Embedding Attacks to Deep Neural
Networks via Error-Correcting Codes
- URL: http://arxiv.org/abs/2010.13751v1
- Date: Mon, 26 Oct 2020 17:42:42 GMT
- Title: Robust and Verifiable Information Embedding Attacks to Deep Neural
Networks via Error-Correcting Codes
- Authors: Jinyuan Jia, Binghui Wang, Neil Zhenqiang Gong
- Abstract summary: In the era of deep learning, a user often leverages a third-party machine learning tool to train a deep neural network (DNN) classifier.
In an information embedding attack, an attacker is the provider of a malicious third-party machine learning tool.
In this work, we aim to design information embedding attacks that are verifiable and robust against popular post-processing methods.
- Score: 81.85509264573948
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the era of deep learning, a user often leverages a third-party machine
learning tool to train a deep neural network (DNN) classifier and then deploys
the classifier as an end-user software product or a cloud service. In an
information embedding attack, an attacker is the provider of a malicious
third-party machine learning tool. The attacker embeds a message into the DNN
classifier during training and recovers the message via querying the API of the
black-box classifier after the user deploys it. Information embedding attacks
have attracted growing attention because of various applications such as
watermarking DNN classifiers and compromising user privacy. State-of-the-art
information embedding attacks have two key limitations: 1) they cannot verify
the correctness of the recovered message, and 2) they are not robust against
post-processing of the classifier.
In this work, we aim to design information embedding attacks that are
verifiable and robust against popular post-processing methods. Specifically, we
leverage Cyclic Redundancy Check to verify the correctness of the recovered
message. Moreover, to be robust against post-processing, we leverage Turbo
codes, a type of error-correcting code, to encode the message before embedding
it into the DNN classifier. We propose to recover the message by adaptively
querying the classifier to save queries. Our adaptive recovery strategy
leverages the property of Turbo codes that they support error correction with a
partial code. We evaluate our information embedding attacks using simulated
messages and apply them to three applications, where messages have semantic
interpretations. We consider 8 popular methods to post-process the classifier.
Our results show that our attacks can accurately and verifiably recover the
messages in all considered scenarios, while state-of-the-art attacks cannot
accurately recover the messages in many scenarios.
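The verification-plus-error-correction pipeline described in the abstract can be illustrated with a minimal sketch. The snippet below is not the paper's implementation: it uses Python's built-in CRC-32 in place of the paper's CRC variant, a simple repetition code as a stand-in for Turbo codes, and random bit flips as a stand-in for corruption caused by post-processing; all function and variable names are illustrative.

```python
# Minimal sketch of verifiable, error-corrected message recovery.
# Assumptions: repetition code instead of Turbo codes, CRC-32 instead of
# the paper's CRC variant, random bit flips instead of real post-processing.
import random
import zlib


def encode(message: bytes, repeat: int = 3) -> list[int]:
    """Append a CRC-32 checksum, then repetition-encode every bit."""
    payload = message + zlib.crc32(message).to_bytes(4, "big")
    bits = [(byte >> i) & 1 for byte in payload for i in range(7, -1, -1)]
    return [b for b in bits for _ in range(repeat)]


def decode(codeword: list[int], repeat: int = 3) -> tuple[bytes, bool]:
    """Majority-vote each bit, rebuild the bytes, and check the CRC."""
    bits = [int(sum(codeword[i:i + repeat]) * 2 >= repeat)
            for i in range(0, len(codeword), repeat)]
    data = bytes(int("".join(map(str, bits[i:i + 8])), 2)
                 for i in range(0, len(bits), 8))
    message, checksum = data[:-4], data[-4:]
    verified = zlib.crc32(message).to_bytes(4, "big") == checksum
    return message, verified


if __name__ == "__main__":
    codeword = encode(b"embedded message")
    # Flip a few bits to mimic corruption of the embedded message.
    noisy = list(codeword)
    for i in random.sample(range(len(noisy)), k=len(noisy) // 100):
        noisy[i] ^= 1
    recovered, verified = decode(noisy)
    print(recovered, verified)
```

In the paper, the adaptive recovery strategy goes further: because Turbo codes can decode from a partial codeword, the attacker queries only part of the embedded codeword and issues additional queries only when recovery has not yet succeeded, which saves queries to the black-box classifier.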
Related papers
- Undermining Image and Text Classification Algorithms Using Adversarial Attacks [0.0]
Our study addresses the gap by training various machine learning models and using GANs and SMOTE to generate additional data points aimed at attacking text classification models.
Our experiments reveal a significant vulnerability in classification models. Specifically, we observe a 20% decrease in accuracy for the top-performing text classification models post-attack, along with a 30% decrease in facial recognition accuracy.
arXiv Detail & Related papers (2024-11-03T18:44:28Z)
- OrderBkd: Textual backdoor attack through repositioning [0.0]
Third-party datasets and pre-trained machine learning models pose a threat to NLP systems.
Existing backdoor attacks involve poisoning the data samples, for example by inserting tokens or paraphrasing sentences.
Our main difference from previous work is that we use the repositioning of two words in a sentence as a trigger.
arXiv Detail & Related papers (2024-02-12T14:53:37Z)
- Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks [73.53327403684676]
We propose an attack-and-defense framework for studying the task of deleting sensitive information directly from model weights.
We study direct edits to model weights because this approach should guarantee that particular deleted information is never extracted by future prompt attacks.
We show that even state-of-the-art model editing methods such as ROME struggle to truly delete factual information from models like GPT-J, as our whitebox and blackbox attacks can recover "deleted" information from an edited model 38% of the time.
arXiv Detail & Related papers (2023-09-29T17:12:43Z)
- Verifying the Robustness of Automatic Credibility Assessment [79.08422736721764]
Text classification methods have been widely investigated as a way to detect content of low credibility.
In some cases, insignificant changes in the input text can mislead the models.
We introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
arXiv Detail & Related papers (2023-03-14T16:11:47Z)
- Learning to Unlearn: Instance-wise Unlearning for Pre-trained Classifiers [71.70205894168039]
We consider instance-wise unlearning, whose goal is to delete information about a set of instances from a pre-trained model.
We propose two methods that reduce forgetting on the remaining data: 1) utilizing adversarial examples to overcome forgetting at the representation-level and 2) leveraging weight importance metrics to pinpoint network parameters guilty of propagating unwanted information.
arXiv Detail & Related papers (2023-01-27T07:53:50Z)
- PETGEN: Personalized Text Generation Attack on Deep Sequence Embedding-based Classification Models [9.630961791758168]
Malicious users can evade deep detection models by manipulating their behavior.
Here we create a novel adversarial attack model against deep user sequence embedding-based classification models.
In the attack, the adversary generates a new post to fool the classifier.
arXiv Detail & Related papers (2021-09-14T15:48:07Z)
- Backdoor Attack against Speaker Verification [86.43395230456339]
We show that it is possible to inject a hidden backdoor into speaker verification models by poisoning the training data.
We also demonstrate that existing backdoor attacks cannot be directly adopted to attack speaker verification.
arXiv Detail & Related papers (2020-10-22T11:10:08Z)
- Semantic-preserving Reinforcement Learning Attack Against Graph Neural Networks for Malware Detection [6.173795262273582]
We propose a reinforcement learning-based semantics-preserving attack against black-box GNNs for malware detection.
The proposed attack uses reinforcement learning to automatically make these "how to select" decisions.
arXiv Detail & Related papers (2020-09-11T18:30:35Z)
- Anomaly Detection-Based Unknown Face Presentation Attack Detection [74.4918294453537]
Anomaly detection-based spoof attack detection is a recent development in face Presentation Attack Detection.
In this paper, we present a deep-learning solution for anomaly detection-based spoof attack detection.
The proposed approach benefits from the representation learning power of CNNs and learns better features for the face PAD task.
arXiv Detail & Related papers (2020-07-11T21:20:55Z)