Robust and Verifiable Information Embedding Attacks to Deep Neural
Networks via Error-Correcting Codes
- URL: http://arxiv.org/abs/2010.13751v1
- Date: Mon, 26 Oct 2020 17:42:42 GMT
- Title: Robust and Verifiable Information Embedding Attacks to Deep Neural
Networks via Error-Correcting Codes
- Authors: Jinyuan Jia, Binghui Wang, Neil Zhenqiang Gong
- Abstract summary: In the era of deep learning, a user often leverages a third-party machine learning tool to train a deep neural network (DNN) classifier.
In an information embedding attack, an attacker is the provider of a malicious third-party machine learning tool.
In this work, we aim to design information embedding attacks that are verifiable and robust against popular post-processing methods.
- Score: 81.85509264573948
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the era of deep learning, a user often leverages a third-party machine
learning tool to train a deep neural network (DNN) classifier and then deploys
the classifier as an end-user software product or a cloud service. In an
information embedding attack, an attacker is the provider of a malicious
third-party machine learning tool. The attacker embeds a message into the DNN
classifier during training and recovers the message via querying the API of the
black-box classifier after the user deploys it. Information embedding attacks
have attracted growing attention because of various applications such as
watermarking DNN classifiers and compromising user privacy. State-of-the-art
information embedding attacks have two key limitations: 1) they cannot verify
the correctness of the recovered message, and 2) they are not robust against
post-processing of the classifier.
In this work, we aim to design information embedding attacks that are
verifiable and robust against popular post-processing methods. Specifically, we
leverage Cyclic Redundancy Check to verify the correctness of the recovered
message. Moreover, to be robust against post-processing, we leverage Turbo
codes, a type of error-correcting code, to encode the message before embedding
it into the DNN classifier. We propose to recover the message by adaptively
querying the classifier to save queries. Our adaptive recovery strategy
leverages the property of Turbo codes that they support error correction with a
partial code. We evaluate our information embedding attacks using simulated
messages and apply them to three applications, where messages have semantic
interpretations. We consider 8 popular methods to post-process the classifier.
Our results show that our attacks can accurately and verifiably recover the
messages in all considered scenarios, while state-of-the-art attacks cannot
accurately recover the messages in many scenarios.
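The verification-plus-error-correction pipeline described in the abstract can be illustrated with a minimal sketch. The snippet below is not the paper's implementation: it uses Python's built-in CRC-32 in place of the paper's CRC variant, a simple repetition code as a stand-in for Turbo codes, and random bit flips as a stand-in for corruption caused by post-processing; all function and variable names are illustrative.

```python
# Minimal sketch of verifiable, error-corrected message recovery.
# Assumptions: repetition code instead of Turbo codes, CRC-32 instead of
# the paper's CRC variant, random bit flips instead of real post-processing.
import random
import zlib


def encode(message: bytes, repeat: int = 3) -> list[int]:
    """Append a CRC-32 checksum, then repetition-encode every bit."""
    payload = message + zlib.crc32(message).to_bytes(4, "big")
    bits = [(byte >> i) & 1 for byte in payload for i in range(7, -1, -1)]
    return [b for b in bits for _ in range(repeat)]


def decode(codeword: list[int], repeat: int = 3) -> tuple[bytes, bool]:
    """Majority-vote each bit, rebuild the bytes, and check the CRC."""
    bits = [int(sum(codeword[i:i + repeat]) * 2 >= repeat)
            for i in range(0, len(codeword), repeat)]
    data = bytes(int("".join(map(str, bits[i:i + 8])), 2)
                 for i in range(0, len(bits), 8))
    message, checksum = data[:-4], data[-4:]
    verified = zlib.crc32(message).to_bytes(4, "big") == checksum
    return message, verified


if __name__ == "__main__":
    codeword = encode(b"embedded message")
    # Flip a few bits to mimic corruption of the embedded message.
    noisy = list(codeword)
    for i in random.sample(range(len(noisy)), k=len(noisy) // 100):
        noisy[i] ^= 1
    recovered, verified = decode(noisy)
    print(recovered, verified)
```

In the paper, the adaptive recovery strategy goes further: because Turbo codes can decode from a partial codeword, the attacker queries only part of the embedded codeword and issues additional queries only when recovery has not yet succeeded, which saves queries to the black-box classifier.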
Related papers
- Undermining Image and Text Classification Algorithms Using Adversarial Attacks [0.0]
Our study addresses the gap by training various machine learning models and using GANs and SMOTE to generate additional data points aimed at attacking text classification models.
Our experiments reveal a significant vulnerability in classification models. Specifically, we observe a 20% decrease in accuracy for the top-performing text classification models post-attack, along with a 30% decrease in facial recognition accuracy.
arXiv Detail & Related papers (2024-11-03T18:44:28Z)
- OrderBkd: Textual backdoor attack through repositioning [0.0]
Third-party datasets and pre-trained machine learning models pose a threat to NLP systems.
Existing backdoor attacks involve poisoning the data samples, for example by inserting tokens or paraphrasing sentences.
Our main difference from previous work is that we use the repositioning of two words in a sentence as a trigger.
arXiv Detail & Related papers (2024-02-12T14:53:37Z)
- Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks [73.53327403684676]
We propose an attack-and-defense framework for studying the task of deleting sensitive information directly from model weights.
We study direct edits to model weights because this approach should guarantee that particular deleted information is never extracted by future prompt attacks.
We show that even state-of-the-art model editing methods such as ROME struggle to truly delete factual information from models like GPT-J, as our whitebox and blackbox attacks can recover "deleted" information from an edited model 38% of the time.
arXiv Detail & Related papers (2023-09-29T17:12:43Z)
- Verifying the Robustness of Automatic Credibility Assessment [79.08422736721764]
Text classification methods have been widely investigated as a way to detect content of low credibility.
In some cases, insignificant changes in the input text can mislead the models.
We introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
arXiv Detail & Related papers (2023-03-14T16:11:47Z)
- Learning to Unlearn: Instance-wise Unlearning for Pre-trained Classifiers [71.70205894168039]
We consider instance-wise unlearning, whose goal is to delete information about a set of instances from a pre-trained model.
We propose two methods that reduce forgetting on the remaining data: 1) utilizing adversarial examples to overcome forgetting at the representation-level and 2) leveraging weight importance metrics to pinpoint network parameters guilty of propagating unwanted information.
arXiv Detail & Related papers (2023-01-27T07:53:50Z)
- PETGEN: Personalized Text Generation Attack on Deep Sequence Embedding-based Classification Models [9.630961791758168]
Malicious users can evade deep detection models by manipulating their behavior.
Here we create a novel adversarial attack model against deep user sequence embedding-based classification models.
In the attack, the adversary generates a new post to fool the classifier.
arXiv Detail & Related papers (2021-09-14T15:48:07Z)
- Backdoor Attack against Speaker Verification [86.43395230456339]
We show that it is possible to inject a hidden backdoor into speaker verification models by poisoning the training data.
We also demonstrate that existing backdoor attacks cannot be directly adopted to attack speaker verification.
arXiv Detail & Related papers (2020-10-22T11:10:08Z)
- Semantic-preserving Reinforcement Learning Attack Against Graph Neural Networks for Malware Detection [6.173795262273582]
We propose a reinforcement learning-based semantics-preserving attack against black-box GNNs for malware detection.
The proposed attack uses reinforcement learning to automatically make these "how to select" decisions.
arXiv Detail & Related papers (2020-09-11T18:30:35Z)
- Anomaly Detection-Based Unknown Face Presentation Attack Detection [74.4918294453537]
Anomaly detection-based spoof attack detection is a recent development in face Presentation Attack Detection.
In this paper, we present a deep-learning solution for anomaly detection-based spoof attack detection.
The proposed approach benefits from the representation learning power of CNNs and learns better features for the face PAD task.
arXiv Detail & Related papers (2020-07-11T21:20:55Z)