Learning from the Good Ones: Risk Profiling-Based Defenses Against Evasion Attacks on DNNs
- URL: http://arxiv.org/abs/2505.06477v1
- Date: Sat, 10 May 2025 00:33:15 GMT
- Title: Learning from the Good Ones: Risk Profiling-Based Defenses Against Evasion Attacks on DNNs
- Authors: Mohammed Elnawawy, Gargi Mitra, Shahrear Iqbal, Karthik Pattabiraman
- Abstract summary: Safety-critical applications use deep neural networks (DNNs) to make predictions and infer decisions. We propose a novel risk profiling framework that uses a risk-aware strategy to selectively train static defenses. We show that selective training on the less vulnerable patients achieves a recall increase of up to 27.5% with minimal impact on precision.
- Score: 4.837320865223376
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Safety-critical applications such as healthcare and autonomous vehicles use deep neural networks (DNNs) to make predictions and infer decisions. DNNs are susceptible to evasion attacks, where an adversary crafts a malicious data instance to trick the DNN into making wrong decisions at inference time. Existing defenses that protect DNNs against evasion attacks are either static or dynamic. Static defenses are computationally efficient but do not adapt to the evolving threat landscape, while dynamic defenses are adaptable but suffer from increased computational overhead. To combine the best of both worlds, in this paper we propose a novel risk profiling framework that uses a risk-aware strategy to selectively train static defenses on the victim instances whose features make them most resilient to evasion attacks. We hypothesize that training existing defenses on instances that are less vulnerable to the attack enhances the adversarial detection rate by reducing false negatives. We evaluate the efficacy of our risk-aware selective training strategy on a blood glucose management system, demonstrating how training static anomaly detectors indiscriminately may increase the false negative rate, which could be life-threatening in safety-critical applications. Our experiments show that selective training on the less vulnerable patients achieves a recall increase of up to 27.5% with minimal impact on precision compared to indiscriminate training.
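To make the selective-training idea concrete, here is a minimal sketch, assuming a hypothetical per-instance risk profiler and an IsolationForest as the static anomaly detector (neither is prescribed by the abstract): the defense is trained only on the instances the profiler ranks as least vulnerable, mirroring the hypothesis that excluding highly vulnerable instances reduces the detector's false negatives.

```python
# Minimal sketch of risk-aware selective training (illustrative only).
# Assumptions: the risk-scoring heuristic and the IsolationForest detector
# are stand-ins; the paper's actual risk profiler and defense are not shown.
import numpy as np
from sklearn.ensemble import IsolationForest

def risk_scores(X: np.ndarray) -> np.ndarray:
    """Hypothetical risk profile: higher score = more vulnerable instance."""
    return np.linalg.norm(X - X.mean(axis=0), axis=1)

def train_selective_defense(X_benign: np.ndarray, keep_fraction: float = 0.7):
    """Train the static detector only on the least vulnerable instances."""
    scores = risk_scores(X_benign)
    threshold = np.quantile(scores, keep_fraction)
    low_risk = X_benign[scores <= threshold]          # "the good ones"
    detector = IsolationForest(contamination=0.05, random_state=0)
    detector.fit(low_risk)
    return detector

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 8))                    # benign feature vectors
    det = train_selective_defense(X)
    # IsolationForest.predict returns +1 for inliers, -1 for flagged anomalies.
    print(det.predict(rng.normal(size=(5, 8))))
```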
Related papers
- Adversarial Robustness Unhardening via Backdoor Attacks in Federated Learning [12.232863656375098]
Federated learning enables the training of collaborative models without sharing data.
This approach brings forth security challenges, notably poisoning and backdoor attacks.
We introduce Adversarial Robustness Unhardening (ARU), which is employed by a subset of adversarial clients.
arXiv Detail & Related papers (2023-10-17T21:38:41Z)
- Avoid Adversarial Adaption in Federated Learning by Multi-Metric Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources.
FL faces vulnerabilities such as poisoning attacks, undermining model integrity with both untargeted performance degradation and targeted backdoor attacks.
We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously.
MESAS is the first defense robust against strong adaptive adversaries, effective in real-world data scenarios, with an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z)
- Denoising Autoencoder-based Defensive Distillation as an Adversarial Robustness Algorithm [0.0]
Adversarial attacks significantly threaten the robustness of deep neural networks (DNNs).
This work proposes a novel method that combines the defensive distillation mechanism with a denoising autoencoder (DAE); a minimal DAE sketch follows this entry.
arXiv Detail & Related papers (2023-03-28T11:34:54Z)
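As a rough illustration of the DAE component in the entry above, the following sketch trains an autoencoder to reconstruct clean inputs from noise-corrupted copies; the fully connected architecture, Gaussian noise model, and the omission of the distillation step are simplifying assumptions, not that paper's configuration.

```python
# Minimal denoising-autoencoder sketch (illustrative; the paper's actual
# architecture and its defensive-distillation coupling are not reproduced).
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    def __init__(self, dim: int = 784, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(hidden, dim), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train_dae(dae, clean_batches, noise_std=0.1, epochs=5, lr=1e-3):
    """Learn to map noise-corrupted inputs back to their clean versions."""
    opt = torch.optim.Adam(dae.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for x in clean_batches:                      # tensors in [0, 1]
            noisy = (x + noise_std * torch.randn_like(x)).clamp(0, 1)
            opt.zero_grad()
            loss = loss_fn(dae(noisy), x)            # reconstruct clean input
            loss.backward()
            opt.step()
    return dae

# At inference time, inputs would pass through the trained DAE before the
# (distilled) classifier, so small adversarial perturbations are largely
# removed along with the injected noise.
```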
- An Incremental Gray-box Physical Adversarial Attack on Neural Network Training [36.244907785240876]
We propose a gradient-free, gray box, incremental attack that targets the training process of neural networks.
The proposed attack acquires its high-risk property from attacking data structures that are typically unobserved by professionals.
arXiv Detail & Related papers (2023-02-20T09:48:11Z)
- Can Adversarial Training Be Manipulated By Non-Robust Features? [64.73107315313251]
Adversarial training, originally designed to resist test-time adversarial examples, has been shown to be promising in mitigating training-time availability attacks.
We identify a novel threat model named stability attacks, which aims to hinder robust availability by slightly perturbing the training data.
Under this threat, we find that adversarial training using a conventional defense budget $\epsilon$ (the standard objective is recalled below) provably fails to provide test robustness in a simple statistical setting.
arXiv Detail & Related papers (2022-01-31T16:25:25Z)
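For background on the defense budget $\epsilon$ mentioned in the entry above (standard formulation, not taken from that paper), adversarial training solves

$$\min_{\theta}\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\max_{\|\delta\|_{\infty}\le\epsilon} \mathcal{L}\big(f_{\theta}(x+\delta),\,y\big)\Big],$$

i.e., the model parameters $\theta$ are optimized against the worst-case perturbation $\delta$ allowed within the budget.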
- Fixed Points in Cyber Space: Rethinking Optimal Evasion Attacks in the Age of AI-NIDS [70.60975663021952]
We study blackbox adversarial attacks on network classifiers.
We argue that attacker-defender fixed points are themselves general-sum games with complex phase transitions.
We show that a continual learning approach is required to study attacker-defender dynamics.
arXiv Detail & Related papers (2021-11-23T23:42:16Z)
- Towards Evaluating the Robustness of Neural Networks Learned by Transduction [44.189248766285345]
Greedy Model Space Attack (GMSA) is an attack framework that can serve as a new baseline for evaluating transductive-learning based defenses.
We show that GMSA, even with weak instantiations, can break previous transductive-learning based defenses.
arXiv Detail & Related papers (2021-10-27T19:39:50Z)
- Mitigating Gradient-based Adversarial Attacks via Denoising and Compression [7.305019142196582]
Gradient-based adversarial attacks on deep neural networks pose a serious threat.
They can be deployed by adding imperceptible perturbations to the test data of any network.
Denoising and dimensionality reduction are two distinct methods that have been investigated to combat such attacks (a PCA-based sketch of the latter follows this entry).
arXiv Detail & Related papers (2021-04-03T22:57:01Z)
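As a concrete example of the dimensionality-reduction idea in the entry above, the sketch below reconstructs each input from its top principal components before classification, on the common assumption that small perturbations carry much of their energy outside the principal subspace of clean data; the choice of PCA and the component count are illustrative, not that paper's exact configuration.

```python
# Illustrative dimensionality-reduction preprocessing: reconstruct each input
# from its top-k principal components before passing it to the classifier.
# PCA and k=32 are stand-in choices, not the paper's configuration.
import numpy as np
from sklearn.decomposition import PCA

def fit_compressor(X_clean: np.ndarray, n_components: int = 32) -> PCA:
    """Fit the compressor on clean training data only."""
    return PCA(n_components=n_components).fit(X_clean)

def purify(pca: PCA, X: np.ndarray) -> np.ndarray:
    """Project to the low-dimensional subspace and back, discarding the
    residual components that may carry the perturbation."""
    return pca.inverse_transform(pca.transform(X))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(500, 256))
    X_test = rng.normal(size=(10, 256))
    pca = fit_compressor(X_train)
    X_purified = purify(pca, X_test)   # feed X_purified to the classifier
    print(X_purified.shape)
```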
- What Doesn't Kill You Makes You Robust(er): Adversarial Training against Poisons and Backdoors [57.040948169155925]
We extend the adversarial training framework to defend against (training-time) poisoning and backdoor attacks.
Our method desensitizes networks to the effects of poisoning by creating poisons during training and injecting them into training batches (a simplified sketch follows this entry).
We show that this defense withstands adaptive attacks, generalizes to diverse threat models, and incurs a better performance trade-off than previous defenses.
arXiv Detail & Related papers (2021-02-26T17:54:36Z)
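To illustrate the poison-injection idea in the entry above, the sketch below replaces part of each training batch with perturbed copies crafted on the fly; the single-step loss-ascent perturbation is a simplified stand-in assumption, not that paper's actual poison-crafting procedure.

```python
# Illustrative training loop that injects freshly crafted "poisons" into each
# batch. The one-step loss-ascent perturbation is a simplified stand-in for
# the paper's poison-crafting method.
import torch
import torch.nn.functional as F

def craft_poisons(model, x, y, eps=8 / 255):
    """One gradient step on the input that increases the training loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    return (x_adv + eps * grad.sign()).clamp(0, 1).detach()

def train_with_injected_poisons(model, loader, epochs=1, poison_frac=0.25, lr=0.1):
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, y in loader:
            k = max(1, int(poison_frac * x.size(0)))
            x = x.clone()
            x[:k] = craft_poisons(model, x[:k], y[:k])  # inject poisons
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
    return model
```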
- Adversarial defense for automatic speaker verification by cascaded self-supervised learning models [101.42920161993455]
More and more malicious attackers attempt to launch adversarial attacks at automatic speaker verification (ASV) systems.
We propose a standard and attack-agnostic method based on cascaded self-supervised learning models to purify the adversarial perturbations.
Experimental results demonstrate that the proposed method achieves effective defense performance and can successfully counter adversarial attacks.
arXiv Detail & Related papers (2021-02-14T01:56:43Z)
- Guided Adversarial Attack for Evaluating and Enhancing Adversarial Defenses [59.58128343334556]
We introduce a relaxation term into the standard loss that finds more suitable gradient directions, increases attack efficacy, and leads to more efficient adversarial training.
We propose Guided Adversarial Margin Attack (GAMA), which utilizes function mapping of the clean image to guide the generation of adversaries.
We also propose Guided Adversarial Training (GAT), which achieves state-of-the-art performance amongst single-step defenses.
arXiv Detail & Related papers (2020-11-30T16:39:39Z)
- A Data Augmentation-based Defense Method Against Adversarial Attacks in Neural Networks [7.943024117353317]
We develop a lightweight defense method that can efficiently invalidate full white-box adversarial attacks while remaining compatible with real-life constraints.
Our model can withstand an advanced adaptive attack, namely BPDA with 50 rounds, and still helps the target model maintain an accuracy of around 80%, while constraining the attack success rate to almost zero.
arXiv Detail & Related papers (2020-07-30T08:06:53Z)