Analyzing Adversarial Inputs in Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2402.05284v1
- Date: Wed, 7 Feb 2024 21:58:40 GMT
- Title: Analyzing Adversarial Inputs in Deep Reinforcement Learning
- Authors: Davide Corsi, Guy Amir, Guy Katz, Alessandro Farinelli
- Abstract summary: We present a comprehensive analysis of the characterization of adversarial inputs, through the lens of formal verification.
We introduce a novel metric, the Adversarial Rate, to classify models based on their susceptibility to such perturbations.
Our analysis empirically demonstrates how adversarial inputs can affect the safety of a given DRL system with respect to such perturbations.
- Score: 53.3760591018817
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, Deep Reinforcement Learning (DRL) has become a popular
paradigm in machine learning due to its successful applications to real-world
and complex systems. However, even the state-of-the-art DRL models have been
shown to suffer from reliability concerns -- for example, their susceptibility
to adversarial inputs, i.e., small and abundant input perturbations that can
fool the models into making unpredictable and potentially dangerous decisions.
This drawback limits the deployment of DRL systems in safety-critical contexts,
where even a small error cannot be tolerated. In this work, we present a
comprehensive analysis of the characterization of adversarial inputs, through
the lens of formal verification. Specifically, we introduce a novel metric, the
Adversarial Rate, to classify models based on their susceptibility to such
perturbations, and present a set of tools and algorithms for its computation.
Our analysis empirically demonstrates how adversarial inputs can affect the
safety of a given DRL system with respect to such perturbations. Moreover, we
analyze the behavior of these configurations to suggest several useful
practices and guidelines to help mitigate the vulnerability of trained DRL
networks.
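The abstract does not spell out the exact definition of the Adversarial Rate or the verification toolchain, but it can be read as the fraction of considered input regions in which a bounded perturbation makes the policy behave differently or unsafely. The sketch below is a minimal, attack-based approximation of that idea, assuming a PyTorch policy network, an L-infinity perturbation budget eps, and random search standing in for the formal verifier used in the paper; all names are illustrative.
```python
# Hypothetical sketch: estimating an "adversarial rate" for a trained DRL policy.
# Assumptions (not from the paper): the rate is the fraction of sampled states for
# which some perturbation within an L-infinity ball of radius eps changes the
# greedy action. A random-search attack stands in for a formal verifier.
import torch

def greedy_action(policy: torch.nn.Module, state: torch.Tensor) -> int:
    return int(policy(state.unsqueeze(0)).argmax(dim=1))

def is_adversarial_region(policy, state, eps=0.05, trials=200):
    """Return True if a bounded perturbation flips the action (attack-based proxy)."""
    clean_action = greedy_action(policy, state)
    for _ in range(trials):
        delta = (2 * torch.rand_like(state) - 1) * eps   # uniform noise in [-eps, eps]
        if greedy_action(policy, state + delta) != clean_action:
            return True
    return False

def adversarial_rate(policy, states, eps=0.05):
    """Fraction of sampled states that admit an adversarial perturbation."""
    flagged = sum(is_adversarial_region(policy, s, eps) for s in states)
    return flagged / len(states)
```
An attack-based estimate of this kind only lower-bounds the rate a formal verifier would report, since random search can miss perturbations that a complete solver would find.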
Related papers
- Advanced Persistent Threats (APT) Attribution Using Deep Reinforcement Learning [0.0]
This paper investigates the application of Deep Reinforcement Learning (DRL) for attributing malware to specific Advanced Persistent Threat (APT) groups.
By analysing over 3500 malware samples from 12 distinct APT groups, the study utilises sophisticated tools like Cuckoo to extract data.
The research shows that the DRL model significantly outperforms traditional machine learning approaches, achieving an impressive test accuracy of 89.27%.
arXiv Detail & Related papers (2024-10-15T10:10:33Z)
- Multi-agent Reinforcement Learning-based Network Intrusion Detection System [3.4636217357968904]
Intrusion Detection Systems (IDS) play a crucial role in ensuring the security of computer networks.
We propose a novel multi-agent reinforcement learning (RL) architecture, enabling automatic, efficient, and robust network intrusion detection.
Our solution introduces a resilient architecture designed to accommodate the addition of new attacks and effectively adapt to changes in existing attack patterns.
arXiv Detail & Related papers (2024-07-08T09:18:59Z)
- Tolerance of Reinforcement Learning Controllers against Deviations in Cyber Physical Systems [8.869030580266799]
We introduce a new, expressive notion of tolerance that describes how well a controller is capable of satisfying a desired system requirement.
We propose a novel analysis problem, called the tolerance falsification problem, which involves finding small deviations that result in a violation of the given requirement.
We present a novel, two-layer simulation-based analysis framework and a novel search for finding small tolerance violations.
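The abstract does not detail the two-layer framework or its search, so the following is only a minimal sketch of the inner falsification step under assumed helper functions (`simulate`, `requirement_satisfied`): sample deviations of bounded magnitude, run the controller, and keep the smallest deviation that violates the requirement.
```python
# Hypothetical sketch of a simple tolerance-falsification loop (not the paper's
# two-layer framework): sample environment deviations of bounded magnitude,
# simulate the controller under each, and keep the smallest deviation that
# violates the requirement. `simulate` and `requirement_satisfied` are assumed.
import random

def falsify_tolerance(controller, simulate, requirement_satisfied,
                      max_delta=1.0, samples=1000):
    best = None  # smallest violating deviation found so far
    for _ in range(samples):
        delta = random.uniform(0.0, max_delta)        # deviation magnitude
        trace = simulate(controller, deviation=delta)  # run the CPS under deviation
        if not requirement_satisfied(trace) and (best is None or delta < best):
            best = delta
    return best  # None means no violation found within the budget
```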
arXiv Detail & Related papers (2024-06-24T18:33:45Z)
- Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data can stem from biases in data acquisition rather than from the underlying task.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
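As an illustration of what a hybrid discriminative-generative objective can look like (not the paper's exact nuisance-extended information-bottleneck loss), one can combine a classification term with an autoencoder reconstruction term over a shared encoder:
```python
# Hypothetical sketch of a hybrid discriminative-generative objective: a shared
# encoder feeds both a classifier head and a decoder, and the loss mixes
# cross-entropy with reconstruction error. Illustrative only; not the paper's
# exact objective.
import torch
import torch.nn.functional as F

def hybrid_loss(encoder, classifier, decoder, x, y, beta=1.0):
    z = encoder(x)                              # shared representation
    ce = F.cross_entropy(classifier(z), y)      # discriminative term
    rec = F.mse_loss(decoder(z), x)             # generative (reconstruction) term
    return ce + beta * rec
```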
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
- Improving robustness of jet tagging algorithms with adversarial training [56.79800815519762]
We investigate the vulnerability of flavor tagging algorithms via application of adversarial attacks.
We present an adversarial training strategy that mitigates the impact of such simulated attacks.
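The entry does not name the attack used; a common choice for this kind of adversarial training is an FGSM-style perturbation generated on the fly and mixed into the training step, roughly as sketched below (illustrative PyTorch code, not the paper's implementation):
```python
# Hypothetical sketch of FGSM-style adversarial training: perturb inputs along
# the sign of the loss gradient, then train on the perturbed batch.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.01):
    """Generate an FGSM perturbation of the batch x with budget eps."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).detach()

def adversarial_training_step(model, optimizer, x, y, eps=0.01):
    """One training step on adversarially perturbed inputs."""
    x_adv = fgsm(model, x, y, eps)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return float(loss)
```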
arXiv Detail & Related papers (2022-03-25T19:57:19Z)
- Adversarial Machine Learning In Network Intrusion Detection Domain: A Systematic Review [0.0]
It has been found that deep learning models are vulnerable to data instances that can mislead the model to make incorrect classification decisions.
This survey explores research that employs different aspects of adversarial machine learning in the area of network intrusion detection.
arXiv Detail & Related papers (2021-12-06T19:10:23Z)
- A new interpretable unsupervised anomaly detection method based on residual explanation [47.187609203210705]
We present RXP, a new interpretability method that addresses the limitations of autoencoder-based anomaly detection (AE-based AD) in large-scale systems.
It stands out for its implementation simplicity, low computational cost and deterministic behavior.
In an experiment using data from a real heavy-haul railway line, the proposed method achieved superior performance compared to SHAP.
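The exact RXP algorithm is not described here; in the same spirit, a minimal residual-based explanation for AE-based anomaly detection ranks input features by their per-feature reconstruction error, as in the hypothetical sketch below (a Keras-like `autoencoder.predict` API is assumed):
```python
# Hypothetical sketch of residual-based explanation for autoencoder-based
# anomaly detection (an illustration in the spirit of RXP, not its exact
# algorithm): rank input features by their per-feature reconstruction error.
import numpy as np

def residual_explanation(autoencoder, x: np.ndarray, top_k=5):
    """Return the indices of the top-k features with the largest residuals."""
    reconstruction = autoencoder.predict(x[None, :])[0]   # assumed Keras-like API
    residuals = np.abs(x - reconstruction)                # per-feature error
    return np.argsort(residuals)[::-1][:top_k]
```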
arXiv Detail & Related papers (2021-03-14T15:35:45Z)
- Evaluating the Safety of Deep Reinforcement Learning Models using Semi-Formal Verification [81.32981236437395]
We present a semi-formal verification approach for decision-making tasks based on interval analysis.
Our method obtains results comparable to formal verifiers on standard benchmarks.
Our approach enables efficient evaluation of safety properties for decision-making models in practical applications.
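The paper's semi-formal procedure is not detailed here, but interval analysis over a ReLU network can be illustrated with simple interval bound propagation: push the lower and upper bounds of a state box through each affine layer and ReLU to bound the outputs (e.g., Q-values). The code below is a simplified sketch with illustrative names.
```python
# Hypothetical sketch of interval bound propagation through a ReLU network:
# propagate the lower/upper bounds of an input box through each affine layer,
# applying ReLU on hidden layers, to bound the network's outputs.
import numpy as np

def interval_forward(weights, biases, lower, upper):
    """Propagate an input box [lower, upper] through affine + ReLU layers."""
    n_layers = len(weights)
    for i, (W, b) in enumerate(zip(weights, biases)):
        W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
        new_lower = W_pos @ lower + W_neg @ upper + b
        new_upper = W_pos @ upper + W_neg @ lower + b
        if i < n_layers - 1:  # ReLU on hidden layers only
            new_lower = np.maximum(new_lower, 0)
            new_upper = np.maximum(new_upper, 0)
        lower, upper = new_lower, new_upper
    return lower, upper  # bounds on the outputs over the whole input box
```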
arXiv Detail & Related papers (2020-10-19T11:18:06Z)
- Accurate and Robust Feature Importance Estimation under Distribution Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z)