HoneyModels: Machine Learning Honeypots
- URL: http://arxiv.org/abs/2202.10309v1
- Date: Mon, 21 Feb 2022 15:33:17 GMT
- Title: HoneyModels: Machine Learning Honeypots
- Authors: Ahmed Abdou, Ryan Sheatsley, Yohan Beugin, Tyler Shipp, Patrick
McDaniel
- Abstract summary: We study an alternate approach inspired by honeypots to detect adversaries.
Our approach yields learned models with an embedded watermark.
We show that HoneyModels can reveal 69.5% of adversaries attempting to attack a Neural Network.
- Score: 2.0499240875882
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Machine Learning is becoming a pivotal aspect of many systems today, offering
newfound performance on classification and prediction tasks, but this rapid
integration also comes with new unforeseen vulnerabilities. To harden these
systems the ever-growing field of Adversarial Machine Learning has proposed new
attack and defense mechanisms. However, a great asymmetry exists as these
defensive methods can only provide security to certain models and lack
scalability, computational efficiency, and practicality due to overly
restrictive constraints. Moreover, newly introduced attacks can easily bypass
defensive strategies by making subtle alterations. In this paper, we study an
alternate approach inspired by honeypots to detect adversaries. Our approach
yields learned models with an embedded watermark. When an adversary initiates
an interaction with our model, attacks are encouraged to add this predetermined
watermark, facilitating the detection of adversarial examples. We show that
HoneyModels can reveal 69.5% of adversaries attempting to attack a Neural
Network while preserving the original functionality of the model. HoneyModels
offer an alternate direction for securing Machine Learning: they only slightly
affect accuracy while encouraging the creation of watermarked adversarial
samples that are detectable by the HoneyModel yet indistinguishable from
ordinary adversarial examples to the adversary.
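The abstract describes the mechanism only at a high level: a secret watermark is embedded into the learned model so that attacks crafted against it tend to carry that watermark, which the defender can then look for. The sketch below is a minimal, hypothetical illustration of the detection side only, assuming a fixed secret watermark perturbation in input space and a simple correlation test; the watermark shape, reference input, and threshold are illustrative assumptions, not the paper's actual construction.

```python
# Hypothetical sketch of honeypot-style detection: the defender fixes a secret
# watermark perturbation ahead of time and screens incoming inputs for it,
# since attacks against the HoneyModel are steered toward adding it.
import numpy as np

rng = np.random.default_rng(0)

INPUT_SHAPE = (28, 28)                       # e.g. a grayscale image classifier
watermark = rng.normal(size=INPUT_SHAPE)
watermark /= np.linalg.norm(watermark)       # unit-norm secret pattern (assumed)

def watermark_score(x: np.ndarray, x_reference: np.ndarray) -> float:
    """Correlation between the input's deviation from a clean reference
    and the secret watermark direction (a stand-in detection statistic)."""
    delta = (x - x_reference).ravel()
    denom = np.linalg.norm(delta) + 1e-12
    return float(np.dot(delta, watermark.ravel()) / denom)

def looks_adversarial(x: np.ndarray, x_reference: np.ndarray,
                      threshold: float = 0.3) -> bool:
    # Flag inputs whose perturbation aligns strongly with the watermark.
    return watermark_score(x, x_reference) > threshold

# Toy usage: a benign query vs. a query nudged along the watermark direction.
clean = rng.normal(size=INPUT_SHAPE)
benign_query = clean + 0.01 * rng.normal(size=INPUT_SHAPE)
watermarked_query = clean + 0.5 * watermark

print(looks_adversarial(benign_query, clean))        # expected: False
print(looks_adversarial(watermarked_query, clean))   # expected: True
```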
Related papers
- Mitigating Backdoor Attacks using Activation-Guided Model Editing [8.00994004466919]
Backdoor attacks compromise the integrity and reliability of machine learning models.
We propose a novel backdoor mitigation approach based on machine unlearning to counter such attacks.
arXiv Detail & Related papers (2024-07-10T13:43:47Z)
- Reliable Model Watermarking: Defending Against Theft without Compromising on Evasion [15.086451828825398]
Evasion adversaries can readily exploit the shortcuts created by models memorizing watermark samples.
By training the model to accurately recognize watermark samples, unique watermark behaviors are promoted through knowledge injection.
arXiv Detail & Related papers (2024-04-21T03:38:20Z)
- Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning [49.242828934501986]
Multimodal contrastive learning has emerged as a powerful paradigm for building high-quality features.
Backdoor attacks subtly embed malicious behaviors within the model during training.
We introduce an innovative token-based localized forgetting training regime.
arXiv Detail & Related papers (2024-03-24T18:33:15Z)
- Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks [51.51023951695014]
Existing model stealing defenses add deceptive perturbations to the victim's posterior probabilities to mislead the attackers.
This paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses.
In contrast to adding perturbations to model predictions, which harms benign accuracy, we train models to produce uninformative outputs against stealing queries.
arXiv Detail & Related papers (2023-08-02T05:54:01Z)
- Temporal Shuffling for Defending Deep Action Recognition Models against Adversarial Attacks [67.58887471137436]
We develop a novel defense method using temporal shuffling of input videos against adversarial attacks for action recognition models.
To the best of our knowledge, this is the first attempt to design a defense method without additional training for 3D CNN-based video action recognition models.
arXiv Detail & Related papers (2021-12-15T06:57:01Z)
- Towards A Conceptually Simple Defensive Approach for Few-shot classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z)
- "What's in the box?!": Deflecting Adversarial Attacks by Randomly Deploying Adversarially-Disjoint Models [71.91835408379602]
Adversarial examples have long been considered a real threat to machine learning models.
We propose an alternative deployment-based defense paradigm that goes beyond the traditional white-box and black-box threat models.
arXiv Detail & Related papers (2021-02-09T20:07:13Z)
- Practical No-box Adversarial Attacks against DNNs [31.808770437120536]
We investigate no-box adversarial examples, where the attacker cannot access the model's information or training set, nor query the model.
We propose three mechanisms for training with a very small dataset and find that prototypical reconstruction is the most effective.
Our approach significantly diminishes the average prediction accuracy of the system to only 15.40%, which is on par with the attack that transfers adversarial examples from a pre-trained Arcface model.
arXiv Detail & Related papers (2020-12-04T11:10:03Z)
- A Hamiltonian Monte Carlo Method for Probabilistic Adversarial Attack and Learning [122.49765136434353]
We present an effective method, called Hamiltonian Monte Carlo with Accumulated Momentum (HMCAM), aiming to generate a sequence of adversarial examples.
We also propose a new generative method called Contrastive Adversarial Training (CAT), which approaches the equilibrium distribution of adversarial examples.
Both quantitative and qualitative analysis on several natural image datasets and practical systems have confirmed the superiority of the proposed algorithm.
arXiv Detail & Related papers (2020-10-15T16:07:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.