Unveiling Vulnerabilities in Interpretable Deep Learning Systems with
Query-Efficient Black-box Attacks
- URL: http://arxiv.org/abs/2307.11906v1
- Date: Fri, 21 Jul 2023 21:09:54 GMT
- Title: Unveiling Vulnerabilities in Interpretable Deep Learning Systems with
Query-Efficient Black-box Attacks
- Authors: Eldor Abdukhamidov, Mohammed Abuhamad, Simon S. Woo, Eric Chan-Tin,
Tamer Abuhmed
- Abstract summary: Interpretable Deep Learning Systems (IDLSes) are designed to make deep learning systems more transparent and explainable.
We propose a novel microbial genetic algorithm-based black-box attack against IDLSes that requires no prior knowledge of the target model or its interpretation model.
- Score: 16.13790238416691
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning has been rapidly adopted across many applications, revolutionizing
entire industries, but it is known to be vulnerable to adversarial attacks. Such
attacks pose a serious threat to deep learning-based systems, compromising their
integrity, reliability, and trustworthiness. Interpretable Deep Learning Systems (IDLSes)
are designed to make these systems more transparent and explainable, but they too
have been shown to be susceptible to attacks. In this work, we propose a novel
microbial genetic algorithm-based black-box attack against IDLSes that requires
no prior knowledge of the target model or its interpretation model. The
proposed attack is a query-efficient approach that combines transfer-based and
score-based methods, making it a powerful tool for unveiling IDLS vulnerabilities.
Our experiments show that the attack achieves high success rates using adversarial
examples whose attribution maps are highly similar to those of benign
samples, which makes them difficult to detect even by human analysts. Our results
highlight the need for improved IDLS security to ensure their practical
reliability.
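
To make the description above concrete, the following is a minimal Python sketch of the kind of search loop a microbial genetic algorithm can run under score-based black-box access. It is not the paper's exact attack: `query_model`, the `fitness` function, and the random population seeding are illustrative placeholders (the paper's transfer-based component could seed the population from a surrogate model, and an attack on an IDLS would additionally constrain the attribution maps).

```python
# Minimal sketch of a microbial genetic algorithm for score-based black-box
# perturbation search. Placeholder assumptions: `x` is an input in [0, 1],
# `query_model(x)` returns a vector of class scores (query access only).
import numpy as np

rng = np.random.default_rng(0)

def fitness(x_adv, query_model, target_label):
    # Placeholder fitness: the score the black-box model assigns to the
    # target label for the candidate adversarial example.
    return query_model(x_adv)[target_label]

def microbial_ga_attack(x, query_model, target_label, eps=0.05,
                        pop_size=10, generations=200, mutation_rate=0.01):
    # Population of perturbations bounded by an L-infinity ball of radius eps.
    pop = rng.uniform(-eps, eps, size=(pop_size,) + x.shape)
    for _ in range(generations):
        # Tournament: pick two individuals and query the model for each.
        i, j = rng.choice(pop_size, size=2, replace=False)
        fi = fitness(np.clip(x + pop[i], 0.0, 1.0), query_model, target_label)
        fj = fitness(np.clip(x + pop[j], 0.0, 1.0), query_model, target_label)
        win, lose = (i, j) if fi >= fj else (j, i)
        # Microbial GA step: the loser copies roughly half of the winner's genes...
        mask = rng.random(x.shape) < 0.5
        pop[lose][mask] = pop[win][mask]
        # ...and is then mutated, staying inside the eps-ball.
        mutate = rng.random(x.shape) < mutation_rate
        pop[lose][mutate] += rng.normal(0.0, eps / 2, size=int(mutate.sum()))
        pop[lose] = np.clip(pop[lose], -eps, eps)
    # Return the candidate with the best score under the target model.
    best = max(range(pop_size),
               key=lambda k: fitness(np.clip(x + pop[k], 0.0, 1.0),
                                     query_model, target_label))
    return np.clip(x + pop[best], 0.0, 1.0)
```

In this sketch each generation costs only two model queries (one per tournament contestant), which is one reason tournament-style microbial selection is attractive in query-limited black-box settings.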
Related papers
- EaTVul: ChatGPT-based Evasion Attack Against Software Vulnerability Detection [19.885698402507145]
Adversarial examples can exploit vulnerabilities within deep neural networks.
This study showcases the susceptibility of deep learning models to adversarial attacks, which can achieve a 100% attack success rate.
arXiv Detail & Related papers (2024-07-27T09:04:54Z)
- Leveraging Reinforcement Learning in Red Teaming for Advanced Ransomware Attack Simulations [7.361316528368866]
This paper proposes a novel approach utilizing reinforcement learning (RL) to simulate ransomware attacks.
By training an RL agent in a simulated environment that mirrors real-world networks, the approach quickly learns effective attack strategies.
Experimental results on a 152-host example network confirm the effectiveness of the proposed approach.
arXiv Detail & Related papers (2024-06-25T14:16:40Z)
- Learning diverse attacks on large language models for robust red-teaming and safety tuning [126.32539952157083]
Red-teaming, or identifying prompts that elicit harmful responses, is a critical step in ensuring the safe deployment of large language models.
We show that even with explicit regularization to favor novelty and diversity, existing approaches suffer from mode collapse or fail to generate effective attacks.
We propose to use GFlowNet fine-tuning, followed by a secondary smoothing phase, to train the attacker model to generate diverse and effective attack prompts.
arXiv Detail & Related papers (2024-05-28T19:16:17Z)
- Can We Trust Embodied Agents? Exploring Backdoor Attacks against Embodied LLM-based Decision-Making Systems [27.316115171846953]
Large Language Models (LLMs) have shown significant promise in real-world decision-making tasks for embodied AI.
LLMs are fine-tuned to leverage their inherent common sense and reasoning abilities while being tailored to specific applications.
This fine-tuning process introduces considerable safety and security vulnerabilities, especially in safety-critical cyber-physical systems.
arXiv Detail & Related papers (2024-05-27T17:59:43Z)
- Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning [49.242828934501986]
Multimodal contrastive learning has emerged as a powerful paradigm for building high-quality features.
However, backdoor attacks subtly embed malicious behaviors within the model during training.
We introduce an innovative token-based localized forgetting training regime.
arXiv Detail & Related papers (2024-03-24T18:33:15Z)
- Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models [82.98081731588717]
Integration of large language models with external content exposes applications to indirect prompt injection attacks.
We introduce the first benchmark for indirect prompt injection attacks, named BIPIA, to evaluate the risk of such attacks.
We develop two black-box methods based on prompt learning and a white-box defense method based on fine-tuning with adversarial training.
arXiv Detail & Related papers (2023-12-21T01:08:39Z)
- Untargeted White-box Adversarial Attack with Heuristic Defence Methods in Real-time Deep Learning based Network Intrusion Detection System [0.0]
In Adversarial Machine Learning (AML), malicious actors aim to fool Machine Learning (ML) and Deep Learning (DL) models into producing incorrect predictions.
AML is an emerging research domain, and in-depth study of adversarial attacks has become a necessity.
We implement four powerful adversarial attack techniques, namely the Fast Gradient Sign Method (FGSM), the Jacobian-based Saliency Map Attack (JSMA), Projected Gradient Descent (PGD), and Carlini & Wagner (C&W), in a NIDS; a minimal FGSM sketch is given after this list.
arXiv Detail & Related papers (2023-10-05T06:32:56Z)
- Downlink Power Allocation in Massive MIMO via Deep Learning: Adversarial Attacks and Training [62.77129284830945]
This paper considers a regression problem in a wireless setting and shows that adversarial attacks can break the DL-based approach.
We also analyze the effectiveness of adversarial training as a defensive technique in adversarial settings and show that the robustness of DL-based wireless systems against attacks improves significantly.
arXiv Detail & Related papers (2022-06-14T04:55:11Z)
- RobustSense: Defending Adversarial Attack for Secure Device-Free Human Activity Recognition [37.387265457439476]
We propose a novel learning framework, RobustSense, to defend against common adversarial attacks.
Our method works well on wireless human activity recognition and person identification systems.
arXiv Detail & Related papers (2022-04-04T15:06:03Z)
- Adversarial defense for automatic speaker verification by cascaded self-supervised learning models [101.42920161993455]
More and more malicious attackers attempt to launch adversarial attacks against automatic speaker verification (ASV) systems.
We propose a standard and attack-agnostic method based on cascaded self-supervised learning models to purify the adversarial perturbations.
Experimental results demonstrate that the proposed method achieves effective defense performance and can successfully counter adversarial attacks.
arXiv Detail & Related papers (2021-02-14T01:56:43Z)
- Increasing the Confidence of Deep Neural Networks by Coverage Analysis [71.57324258813674]
This paper presents a lightweight monitoring architecture based on coverage paradigms to enhance the model's robustness against different unsafe inputs.
Experimental results show that the proposed approach is effective in detecting both powerful adversarial examples and out-of-distribution inputs.
arXiv Detail & Related papers (2021-01-28T16:38:26Z)
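
For reference, the Fast Gradient Sign Method named in the NIDS entry above perturbs an input by a small step in the direction of the sign of the loss gradient with respect to that input. The following is a minimal PyTorch sketch under assumed conditions (the model, labels, input range, and epsilon value are illustrative placeholders, not taken from any of the papers above):

```python
# Minimal FGSM sketch (white-box, untargeted). Model, inputs, and labels are
# placeholders; eps controls the L-infinity perturbation budget.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)      # loss on the clean prediction
    loss.backward()                          # gradient of the loss w.r.t. x
    x_adv = x + eps * x.grad.sign()          # step that increases the loss
    return x_adv.clamp(0.0, 1.0).detach()    # keep the input in a valid range
```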
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.