Revealing Vulnerabilities of Neural Networks in Parameter Learning and Defense Against Explanation-Aware Backdoors
- URL: http://arxiv.org/abs/2403.16569v1
- Date: Mon, 25 Mar 2024 09:36:10 GMT
- Title: Revealing Vulnerabilities of Neural Networks in Parameter Learning and Defense Against Explanation-Aware Backdoors
- Authors: Md Abdul Kadir, GowthamKrishna Addluri, Daniel Sonntag
- Abstract summary: Blinding attacks can drastically alter a machine learning algorithm's prediction and explanation.
We leverage statistical analysis to highlight the changes in a CNN's weights following blinding attacks.
We introduce a method specifically designed to limit the effectiveness of such attacks during the evaluation phase.
- Score: 2.1165011830664673
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Explainable Artificial Intelligence (XAI) strategies play a crucial part in increasing the understanding and trustworthiness of neural networks. Nonetheless, these techniques can generate misleading explanations. Blinding attacks can drastically alter a machine learning algorithm's prediction and explanation, providing misleading information by adding visually unnoticeable artifacts to the input while maintaining the model's accuracy. This poses a serious challenge to ensuring the reliability of XAI methods. To address it, we leverage statistical analysis to highlight the changes in a CNN's weights following blinding attacks. We introduce a method specifically designed to limit the effectiveness of such attacks during the evaluation phase, avoiding the need for extra training. The proposed method defends against most modern explanation-aware adversarial attacks, achieving an approximately 99% decrease in the Attack Success Rate (ASR) and a roughly 91% reduction in the Mean Square Error (MSE) between the original explanation and the defended (post-attack) explanation across three distinct types of attacks.
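As an illustration (not the authors' code), the sketch below shows how an evaluation-phase defence of this kind might be scored with the two reported metrics: the Attack Success Rate (ASR) of a blinding attack against the defended model, and the MSE between the explanation of the clean input and the defended post-attack explanation. The helpers `attack`, `explain`, and `defended_model` are hypothetical placeholders for a blinding attack, a saliency method, and the inference-time defence, respectively.

```python
# Minimal evaluation sketch under assumed interfaces; all helper names are
# placeholders, not the paper's implementation.
import torch
import torch.nn.functional as F

def evaluate_defence(model, defended_model, explain, attack, loader, target_label):
    """Return (ASR, mean explanation MSE) over a data loader."""
    hits, total, mse_sum = 0, 0, 0.0
    model.eval()
    defended_model.eval()
    for x, _ in loader:
        # Craft a blinding-style perturbation against the undefended model.
        x_adv = attack(model, x, target_label)
        with torch.no_grad():
            pred = defended_model(x_adv).argmax(dim=1)
        # Count the attack as successful when the attacker's target label wins.
        hits += (pred == target_label).sum().item()
        total += x.size(0)
        # Compare the clean explanation with the defended post-attack explanation.
        e_clean = explain(model, x)
        e_defended = explain(defended_model, x_adv)
        mse_sum += F.mse_loss(e_defended, e_clean, reduction="mean").item() * x.size(0)
    return hits / total, mse_sum / total
```

A stricter ASR definition would count only inputs that the clean model classifies correctly; the simplified version above is sufficient to illustrate how the two metrics are computed.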
Related papers
- Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning.
This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities.
In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z)
- Wasserstein distributional robustness of neural networks [9.79503506460041]
Deep neural networks are known to be vulnerable to adversarial attacks (AA).
For an image recognition task, this means that a small perturbation of the original can result in the image being misclassified.
We recast the problem using techniques of Wasserstein distributionally robust optimization (DRO) and obtain novel contributions; a generic form of the DRO objective is sketched after this list.
arXiv Detail & Related papers (2023-06-16T13:41:24Z)
- Learning to Learn Transferable Attack [77.67399621530052]
Transfer adversarial attack is a non-trivial black-box attack that crafts adversarial perturbations on a surrogate model and then applies them to the victim model.
We propose a Learning to Learn Transferable Attack (LLTA) method, which makes the adversarial perturbations more generalized via learning from both data and model augmentation.
Empirical results on the widely used dataset demonstrate the effectiveness of our attack method, with a 12.85% higher transfer attack success rate than state-of-the-art methods.
arXiv Detail & Related papers (2021-12-10T07:24:21Z)
- Certifiers Make Neural Networks Vulnerable to Availability Attacks [70.69104148250614]
We show for the first time that fallback strategies can be deliberately triggered by an adversary.
In addition to naturally occurring abstains for some inputs and perturbations, the adversary can use training-time attacks to deliberately trigger the fallback.
We design two novel availability attacks, which show the practical relevance of these threats.
arXiv Detail & Related papers (2021-08-25T15:49:10Z)
- Feature Importance Guided Attack: A Model Agnostic Adversarial Attack [0.0]
We present the Feature Importance Guided Attack (FIGA), which generates adversarial evasion samples.
We demonstrate FIGA against eight phishing detection models.
We reduce the F1-score of a phishing detection model from 0.96 to 0.41 on average.
arXiv Detail & Related papers (2021-06-28T15:46:22Z)
- The Feasibility and Inevitability of Stealth Attacks [63.14766152741211]
We study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence systems.
In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself.
arXiv Detail & Related papers (2021-06-26T10:50:07Z)
- Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose adaptive feature alignment (AFA) to generate features of arbitrary attacking strengths; the method is trained to automatically align these features.
arXiv Detail & Related papers (2021-05-31T17:01:05Z)
- Progressive Defense Against Adversarial Attacks for Deep Learning as a Service in Internet of Things [9.753864027359521]
Deep Neural Networks (DNNs) can be easily misled by adding relatively small adversarial perturbations to the input.
We present a defense strategy, progressive defense against adversarial attacks (PDAAA), for efficiently and effectively filtering out adversarial pixel mutations.
The results show that it outperforms the state of the art while reducing the cost of model training by 50% on average.
arXiv Detail & Related papers (2020-10-15T06:40:53Z)
- A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN)-based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against unseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z)
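For reference, the Wasserstein distributional robustness entry above recasts adversarial robustness as a distributionally robust objective; a generic form of that objective, in standard notation not taken from the paper, is

\[
\min_{\theta}\; \sup_{Q \,:\, W_p(Q, P) \le \delta}\; \mathbb{E}_{(x,y)\sim Q}\big[\ell(f_\theta(x), y)\big],
\]

where \(P\) is the data distribution, \(W_p\) is the Wasserstein distance of order \(p\), \(\delta\) is the perturbation budget, and \(\ell\) is the training loss. The inner supremum ranges over all distributions within a Wasserstein ball around \(P\), which subsumes pointwise adversarial perturbations as a special case.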