HoneyModels: Machine Learning Honeypots
- URL: http://arxiv.org/abs/2202.10309v1
- Date: Mon, 21 Feb 2022 15:33:17 GMT
- Title: HoneyModels: Machine Learning Honeypots
- Authors: Ahmed Abdou, Ryan Sheatsley, Yohan Beugin, Tyler Shipp, Patrick
McDaniel
- Abstract summary: We study an alternate approach inspired by honeypots to detect adversaries.
Our approach yields learned models with an embedded watermark.
We show that HoneyModels can reveal 69.5% of adversaries attempting to attack a Neural Network.
- Score: 2.0499240875882
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Machine Learning is becoming a pivotal aspect of many systems today, offering
newfound performance on classification and prediction tasks, but this rapid
integration also comes with new unforeseen vulnerabilities. To harden these
systems the ever-growing field of Adversarial Machine Learning has proposed new
attack and defense mechanisms. However, a great asymmetry exists as these
defensive methods can only provide security to certain models and lack
scalability, computational efficiency, and practicality due to overly
restrictive constraints. Moreover, newly introduced attacks can easily bypass
defensive strategies by making subtle alterations. In this paper, we study an
alternate approach inspired by honeypots to detect adversaries. Our approach
yields learned models with an embedded watermark. When an adversary initiates
an interaction with our model, attacks are encouraged to add this predetermined
watermark, facilitating the detection of adversarial examples. We show that
HoneyModels can reveal 69.5% of adversaries attempting to attack a Neural
Network while preserving the original functionality of the model. HoneyModels
offer an alternate direction for securing Machine Learning: they only slightly
affect accuracy while encouraging the creation of watermarked adversarial
samples that are detectable by the HoneyModel yet indistinguishable from
ordinary adversarial examples to the adversary.
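The abstract describes the mechanism only at a high level: a secret watermark is embedded into the learned model so that attacks crafted against it tend to carry that watermark, which the defender can then look for. The sketch below is a minimal, hypothetical illustration of the detection side only, assuming a fixed secret watermark perturbation in input space and a simple correlation test; the watermark shape, reference input, and threshold are illustrative assumptions, not the paper's actual construction.

```python
# Hypothetical sketch of honeypot-style detection: the defender fixes a secret
# watermark perturbation ahead of time and screens incoming inputs for it,
# since attacks against the HoneyModel are steered toward adding it.
import numpy as np

rng = np.random.default_rng(0)

INPUT_SHAPE = (28, 28)                       # e.g. a grayscale image classifier
watermark = rng.normal(size=INPUT_SHAPE)
watermark /= np.linalg.norm(watermark)       # unit-norm secret pattern (assumed)

def watermark_score(x: np.ndarray, x_reference: np.ndarray) -> float:
    """Correlation between the input's deviation from a clean reference
    and the secret watermark direction (a stand-in detection statistic)."""
    delta = (x - x_reference).ravel()
    denom = np.linalg.norm(delta) + 1e-12
    return float(np.dot(delta, watermark.ravel()) / denom)

def looks_adversarial(x: np.ndarray, x_reference: np.ndarray,
                      threshold: float = 0.3) -> bool:
    # Flag inputs whose perturbation aligns strongly with the watermark.
    return watermark_score(x, x_reference) > threshold

# Toy usage: a benign query vs. a query nudged along the watermark direction.
clean = rng.normal(size=INPUT_SHAPE)
benign_query = clean + 0.01 * rng.normal(size=INPUT_SHAPE)
watermarked_query = clean + 0.5 * watermark

print(looks_adversarial(benign_query, clean))        # expected: False
print(looks_adversarial(watermarked_query, clean))   # expected: True
```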
Related papers
- Mitigating Backdoor Attacks using Activation-Guided Model Editing [8.00994004466919]
Backdoor attacks compromise the integrity and reliability of machine learning models.
We propose a novel backdoor mitigation approach based on machine unlearning to counter such attacks.
arXiv Detail & Related papers (2024-07-10T13:43:47Z)
- Reliable Model Watermarking: Defending Against Theft without Compromising on Evasion [15.086451828825398]
Evasion adversaries can readily exploit the shortcuts created by models memorizing watermark samples.
By training the model to accurately recognize watermark samples, unique watermark behaviors are promoted through knowledge injection.
arXiv Detail & Related papers (2024-04-21T03:38:20Z)
- Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning [49.242828934501986]
Multimodal contrastive learning has emerged as a powerful paradigm for building high-quality features.
Backdoor attacks subtly embed malicious behaviors within the model during training.
We introduce an innovative token-based localized forgetting training regime.
arXiv Detail & Related papers (2024-03-24T18:33:15Z)
- Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks [51.51023951695014]
Existing model stealing defenses add deceptive perturbations to the victim's posterior probabilities to mislead the attackers.
This paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses.
In contrast to adding perturbations to model predictions, which harms benign accuracy, we train models to produce uninformative outputs against stealing queries.
arXiv Detail & Related papers (2023-08-02T05:54:01Z)
- Temporal Shuffling for Defending Deep Action Recognition Models against Adversarial Attacks [67.58887471137436]
We develop a novel defense method using temporal shuffling of input videos against adversarial attacks for action recognition models.
To the best of our knowledge, this is the first attempt to design a defense method without additional training for 3D CNN-based video action recognition models.
arXiv Detail & Related papers (2021-12-15T06:57:01Z)
- Towards A Conceptually Simple Defensive Approach for Few-shot classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z)
- "What's in the box?!": Deflecting Adversarial Attacks by Randomly Deploying Adversarially-Disjoint Models [71.91835408379602]
Adversarial examples have long been considered a real threat to machine learning models.
We propose an alternative deployment-based defense paradigm that goes beyond the traditional white-box and black-box threat models.
arXiv Detail & Related papers (2021-02-09T20:07:13Z)
- Practical No-box Adversarial Attacks against DNNs [31.808770437120536]
We investigate no-box adversarial examples, where the attacker cannot access the model's information or training set, nor query the model.
We propose three mechanisms for training with a very small dataset and find that prototypical reconstruction is the most effective.
Our approach significantly diminishes the average prediction accuracy of the system to only 15.40%, which is on par with the attack that transfers adversarial examples from a pre-trained Arcface model.
arXiv Detail & Related papers (2020-12-04T11:10:03Z)
- A Hamiltonian Monte Carlo Method for Probabilistic Adversarial Attack and Learning [122.49765136434353]
We present an effective method, called Hamiltonian Monte Carlo with Accumulated Momentum (HMCAM), aiming to generate a sequence of adversarial examples.
We also propose a new generative method called Contrastive Adversarial Training (CAT), which approaches the equilibrium distribution of adversarial examples.
Both quantitative and qualitative analysis on several natural image datasets and practical systems have confirmed the superiority of the proposed algorithm.
arXiv Detail & Related papers (2020-10-15T16:07:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.