Certifiers Make Neural Networks Vulnerable to Availability Attacks
- URL: http://arxiv.org/abs/2108.11299v5
- Date: Tue, 3 Oct 2023 13:08:50 GMT
- Title: Certifiers Make Neural Networks Vulnerable to Availability Attacks
- Authors: Tobias Lorenz, Marta Kwiatkowska, Mario Fritz
- Abstract summary: We show for the first time that fallback strategies can be deliberately triggered by an adversary.
In addition to naturally occurring abstains for some inputs and perturbations, the adversary can use training-time attacks to deliberately trigger the fallback.
We design two novel availability attacks, which show the practical relevance of these threats.
- Score: 70.69104148250614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To achieve reliable, robust, and safe AI systems, it is vital to implement
fallback strategies when AI predictions cannot be trusted. Certifiers for
neural networks are a reliable way to check the robustness of these
predictions. They guarantee for some predictions that a certain class of
manipulations or attacks could not have changed the outcome. For the remaining
predictions without guarantees, the method abstains from making a prediction,
and a fallback strategy needs to be invoked, which typically incurs additional
costs, can require a human operator, or even fail to provide any prediction.
While this is a key concept towards safe and secure AI, we show for the first
time that this approach comes with its own security risks, as such fallback
strategies can be deliberately triggered by an adversary. In addition to
naturally occurring abstains for some inputs and perturbations, the adversary
can use training-time attacks to deliberately trigger the fallback with high
probability. This transfers the main system load onto the fallback, reducing
the overall system's integrity and/or availability. We design two novel
availability attacks, which show the practical relevance of these threats. For
example, adding 1% poisoned data during training is sufficient to trigger the
fallback and hence make the model unavailable for up to 100% of all inputs by
inserting the trigger. Our extensive experiments across multiple datasets,
model architectures, and certifiers demonstrate the broad applicability of
these attacks. An initial investigation into potential defenses shows that
current approaches are insufficient to mitigate the issue, highlighting the
need for new, specific solutions.
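This page carries no code; as a rough illustration of the certify-or-abstain mechanism that the paper's availability attacks target, below is a minimal Python sketch in the spirit of randomized smoothing. Every name here (`smoothed_predict`, `predict_with_fallback`, the noise scale, vote count, and threshold, and the toy model) is an illustrative assumption, not the authors' implementation.

```python
# Minimal sketch (not the paper's code) of a certify-or-fallback pipeline.
# A prediction is only returned when the noisy vote is dominant enough;
# otherwise the wrapper abstains and a costly fallback handles the input.
import numpy as np

def smoothed_predict(base_model, x, sigma=0.25, n_samples=100,
                     threshold=0.7, rng=None):
    """Vote over Gaussian-perturbed copies of x; abstain (return None)
    if the top class is not dominant enough -- a stand-in for a formal
    robustness certificate."""
    rng = rng or np.random.default_rng(0)
    noise = rng.normal(scale=sigma, size=(n_samples,) + x.shape)
    votes = np.array([base_model(x + eps) for eps in noise])
    counts = np.bincount(votes, minlength=2)
    top = int(counts.argmax())
    if counts[top] / n_samples >= threshold:
        return top            # confident, "certified-style" prediction
    return None               # abstain -> fallback must take over

def predict_with_fallback(base_model, fallback, x):
    y = smoothed_predict(base_model, x)
    # The paper's availability attacks poison training so that inputs
    # carrying a trigger land in this branch with high probability,
    # shifting the system load onto the fallback.
    return y if y is not None else fallback(x)

# Toy usage: a 1-D "model" that is unstable near 0, so the noisy votes
# split, the wrapper abstains, and the fallback answers instead.
unstable_model = lambda x: int(x.sum() > 0.0)
fallback = lambda x: 0        # e.g. human review or a slower reserve model
print(predict_with_fallback(unstable_model, fallback, np.array([0.01])))
```

In this toy, votes split near the decision boundary, so the wrapper abstains and routes the input to the fallback; the attacks described above aim to make triggered inputs land in exactly that branch, degrading availability rather than accuracy.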
Related papers
- FaultGuard: A Generative Approach to Resilient Fault Prediction in Smart Electrical Grids [53.2306792009435]
FaultGuard is the first framework for fault type and zone classification resilient to adversarial attacks.
We propose a low-complexity fault prediction model and an online adversarial training technique to enhance robustness.
Our model outperforms the state of the art on resilient fault prediction benchmarks, with an accuracy of up to 0.958.
arXiv Detail & Related papers (2024-03-26T08:51:23Z) - Revealing Vulnerabilities of Neural Networks in Parameter Learning and Defense Against Explanation-Aware Backdoors [2.1165011830664673]
Blinding attacks can drastically alter a machine learning algorithm's prediction and explanation.
We leverage statistical analysis to highlight the changes in a CNN's weights following blinding attacks.
We introduce a method specifically designed to limit the effectiveness of such attacks during the evaluation phase.
arXiv Detail & Related papers (2024-03-25T09:36:10Z) - Availability Adversarial Attack and Countermeasures for Deep Learning-based Load Forecasting [1.4112444998191698]
Deep neural networks are prone to adversarial attacks.
This paper proposes availability-based adversarial attacks, which can be more easily implemented by attackers.
An adversarial training algorithm is shown to significantly improve robustness against availability attacks.
arXiv Detail & Related papers (2023-01-04T21:54:32Z) - RobustSense: Defending Adversarial Attack for Secure Device-Free Human Activity Recognition [37.387265457439476]
We propose a novel learning framework, RobustSense, to defend common adversarial attacks.
Our method works well on wireless human activity recognition and person identification systems.
arXiv Detail & Related papers (2022-04-04T15:06:03Z) - On the Effectiveness of Adversarial Training against Backdoor Attacks [111.8963365326168]
A backdoored model always predicts a target class in the presence of a predefined trigger pattern.
In general, adversarial training is believed to defend against backdoor attacks.
We propose a hybrid strategy which provides satisfactory robustness across different backdoor attacks.
arXiv Detail & Related papers (2022-02-22T02:24:46Z) - The Feasibility and Inevitability of Stealth Attacks [63.14766152741211]
We study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence systems.
In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself.
arXiv Detail & Related papers (2021-06-26T10:50:07Z) - Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z) - A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in deep neural network (DNN)-based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against unseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.