Unsupervised Adversarial Detection without Extra Model: Training Loss Should Change
- URL: http://arxiv.org/abs/2308.03243v1
- Date: Mon, 7 Aug 2023 01:41:21 GMT
- Title: Unsupervised Adversarial Detection without Extra Model: Training Loss Should Change
- Authors: Chien Cheng Chyou, Hung-Ting Su, Winston H. Hsu
- Abstract summary: Traditional approaches to adversarial training and supervised detection rely on prior knowledge of attack types and access to labeled training data.
We propose new training losses to reduce useless features and the corresponding detection method without prior knowledge of adversarial attacks.
The proposed method performs well across all tested attack types, and its false positive rates are even better than those of methods specialized for certain types.
- Score: 24.76524262635603
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adversarial robustness poses a critical challenge in the deployment of deep
learning models for real-world applications. Traditional approaches to
adversarial training and supervised detection rely on prior knowledge of attack
types and access to labeled training data, which is often impractical. Existing
unsupervised adversarial detection methods identify whether the target model
works properly, but they suffer from poor accuracy owing to the use of the
common cross-entropy training loss, which relies on unnecessary features and
strengthens adversarial attacks. We propose new training losses to reduce
useless features and the corresponding detection method without prior knowledge
of adversarial attacks. The detection rate (true positive rate) against all
given white-box attacks is above 93.9% except for attacks without limits
(DF($\infty$)), while the false positive rate is only 2.5%. The proposed
method performs well across all tested attack types, and its false positive
rates are even better than those of methods specialized for certain types.
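The abstract does not spell out the proposed training losses, but the detection flow it implies can be sketched: score each input by how "properly" the model behaves on it, then calibrate a threshold on clean data so the false positive rate stays near a target (such as the 2.5% reported above). The snippet below is a minimal, hypothetical illustration that assumes a PyTorch classifier and uses maximum softmax confidence as a stand-in score; `calibrate_threshold`, `target_fpr`, and the score choice are illustrative, not the paper's method.

```python
# Hypothetical sketch only: the paper's losses and score are not given in
# the abstract. Maximum softmax confidence stands in for the real score.
import torch
import torch.nn.functional as F

@torch.no_grad()
def calibrate_threshold(model, clean_loader, target_fpr=0.025):
    """Pick a threshold so roughly target_fpr of clean inputs get flagged."""
    scores = []
    for x, _ in clean_loader:
        probs = F.softmax(model(x), dim=1)
        scores.append(probs.max(dim=1).values)
    # The target_fpr-quantile of clean confidences becomes the threshold:
    # by construction, about target_fpr of clean inputs fall below it.
    return torch.quantile(torch.cat(scores), target_fpr).item()

@torch.no_grad()
def detect_adversarial(model, x, threshold):
    """Flag inputs whose maximum softmax confidence falls below threshold."""
    probs = F.softmax(model(x), dim=1)
    return probs.max(dim=1).values < threshold
```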
Related papers
- SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z)
- Fortify the Guardian, Not the Treasure: Resilient Adversarial Detectors [0.0]
An adaptive attack is one where the attacker is aware of the defenses and adapts their strategy accordingly.
Our proposed method leverages adversarial training to reinforce the ability to detect attacks, without compromising clean accuracy (see the adversarial-training sketch after this list).
Experimental evaluations on the CIFAR-10 and SVHN datasets demonstrate that our proposed algorithm significantly improves a detector's ability to accurately identify adaptive adversarial attacks.
arXiv Detail & Related papers (2024-04-18T12:13:09Z)
- Can We Trust the Unlabeled Target Data? Towards Backdoor Attack and Defense on Model Adaptation [120.42853706967188]
We explore the potential backdoor attacks on model adaptation launched by well-designed poisoning target data.
We propose a plug-and-play method named MixAdapt that can be combined with existing adaptation algorithms.
arXiv Detail & Related papers (2024-01-11T16:42:10Z)
- Adversarial Training Should Be Cast as a Non-Zero-Sum Game [121.95628660889628]
The two-player zero-sum paradigm of adversarial training has not engendered sufficient levels of robustness.
We show that the surrogate-based relaxation commonly used in adversarial training algorithms voids all guarantees on robustness.
A novel non-zero-sum bilevel formulation of adversarial training yields a framework that matches, and in some cases outperforms, state-of-the-art attacks.
arXiv Detail & Related papers (2023-06-19T16:00:48Z)
- Rethinking Backdoor Data Poisoning Attacks in the Context of Semi-Supervised Learning [5.417264344115724]
Semi-supervised learning methods can train high-accuracy machine learning models with a fraction of the labeled training samples required for traditional supervised learning.
Such methods do not typically involve close review of the unlabeled training samples, making them tempting targets for data poisoning attacks.
We show that simple poisoning attacks that influence the distribution of the poisoned samples' predicted labels are highly effective.
arXiv Detail & Related papers (2022-12-05T20:21:31Z)
- A False Sense of Security? Revisiting the State of Machine Learning-Based Industrial Intrusion Detection [9.924435476552702]
Anomaly-based intrusion detection promises to detect novel or unknown attacks on industrial control systems.
Research focuses on machine learning to train such detectors automatically, achieving detection rates upwards of 99%.
Our results highlight an ineffectiveness in detecting unknown attacks, with detection rates dropping to between 3.2% and 14.7%.
arXiv Detail & Related papers (2022-05-18T20:17:33Z)
- DAD: Data-free Adversarial Defense at Test Time [21.741026088202126]
Deep models are highly susceptible to adversarial attacks.
Privacy has become an important concern, restricting access to only trained models but not the training data.
We propose the novel problem of 'test-time adversarial defense in the absence of training data and even their statistics'.
arXiv Detail & Related papers (2022-04-04T15:16:13Z)
- Adversarial Robustness of Deep Reinforcement Learning based Dynamic Recommender Systems [50.758281304737444]
We propose to explore adversarial examples and attack detection on reinforcement learning-based interactive recommendation systems.
We first craft different types of adversarial examples by adding perturbations to the input and intervening on the causal factors.
Then, we augment recommendation systems by detecting potential attacks with a deep learning-based classifier trained on the crafted data.
arXiv Detail & Related papers (2021-12-02T04:12:24Z)
- How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)
- Anomaly Detection-Based Unknown Face Presentation Attack Detection [74.4918294453537]
Anomaly detection-based spoof attack detection is a recent development in face Presentation Attack Detection.
In this paper, we present a deep-learning solution for anomaly detection-based spoof attack detection.
The proposed approach benefits from the representation learning power of CNNs and learns better features for the fPAD task.
arXiv Detail & Related papers (2020-07-11T21:20:55Z)
- Adversarial Detection and Correction by Matching Prediction Distributions [0.0]
The detector almost completely neutralises powerful attacks like Carlini-Wagner or SLIDE on MNIST and Fashion-MNIST.
We show that our method is still able to detect the adversarial examples in the case of a white-box attack where the attacker has full knowledge of both the model and the defence (a minimal sketch of this matching idea follows the list).
arXiv Detail & Related papers (2020-02-21T15:45:42Z)
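A minimal sketch of the matching-prediction-distributions idea from the 'Adversarial Detection and Correction by Matching Prediction Distributions' entry above: compare the classifier's prediction distribution on an input with its distribution on an autoencoder reconstruction of that input, and flag large divergences. The autoencoder, threshold, and function names below are assumptions for illustration, not the authors' code.

```python
# Illustrative sketch, not the authors' implementation. Assumes a trained
# `classifier` and an `autoencoder` fitted so that reconstructions of
# clean inputs keep the classifier's predictions close to the originals.
import torch
import torch.nn.functional as F

@torch.no_grad()
def kl_score(classifier, autoencoder, x):
    """KL divergence between predictions on x and on its reconstruction."""
    p = F.softmax(classifier(x), dim=1)
    log_q = F.log_softmax(classifier(autoencoder(x)), dim=1)
    # KL(p || q) summed over classes, yielding one score per input.
    return (p * (p.clamp_min(1e-12).log() - log_q)).sum(dim=1)

@torch.no_grad()
def is_adversarial(classifier, autoencoder, x, threshold):
    """Flag inputs whose divergence exceeds a threshold set on clean data."""
    return kl_score(classifier, autoencoder, x) > threshold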
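The 'Fortify the Guardian' entry above refers to adversarially training the detector itself. As a minimal sketch, assuming a PyTorch `classifier` under attack and a binary `detector`, and using FGSM purely as a stand-in attack, one training step might look like the following; the paper's actual attacks, architecture, and schedule are not given here.

```python
# Hypothetical sketch: adversarial training of a binary attack detector.
# FGSM is a stand-in attack; the cited paper's setup may differ.
import torch
import torch.nn.functional as F

def fgsm(classifier, x, y, epsilon):
    """Craft FGSM adversarial examples against the target classifier."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(classifier(x), y).backward()
    return (x + epsilon * x.grad.sign()).clamp(0.0, 1.0).detach()

def detector_train_step(detector, classifier, optimizer, x, y, epsilon=8 / 255):
    """One step: teach the detector to separate clean from attacked inputs."""
    x_adv = fgsm(classifier, x, y, epsilon)
    inputs = torch.cat([x, x_adv])
    labels = torch.cat([
        torch.zeros(len(x), dtype=torch.long, device=x.device),    # clean
        torch.ones(len(x_adv), dtype=torch.long, device=x.device)  # attacked
    ])
    optimizer.zero_grad()
    loss = F.cross_entropy(detector(inputs), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```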
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.