Evolving Reinforcement Learning Environment to Minimize Learner's
Achievable Reward: An Application on Hardening Active Directory Systems
- URL: http://arxiv.org/abs/2304.03998v1
- Date: Sat, 8 Apr 2023 12:39:40 GMT
- Title: Evolving Reinforcement Learning Environment to Minimize Learner's
Achievable Reward: An Application on Hardening Active Directory Systems
- Authors: Diksha Goel, Aneta Neumann, Frank Neumann, Hung Nguyen, Mingyu Guo
- Abstract summary: We apply Evolutionary Diversity Optimization to generate a diverse population of environments for training.
We demonstrate the effectiveness of our approach by focusing on a specific application, Active Directory.
- Score: 15.36968083280611
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a Stackelberg game between one attacker and one defender in a
configurable environment. The defender picks a specific environment
configuration. The attacker observes the configuration and attacks via
Reinforcement Learning (RL) trained against the observed environment. The
defender's goal is to find the environment with minimum achievable reward for
the attacker. We apply Evolutionary Diversity Optimization (EDO) to generate a
diverse population of environments for training. Environments with clearly high
rewards are killed off and replaced by new offspring to avoid wasting training
time. Diversity not only improves training quality but also fits well with our
RL scenario: RL agents tend to improve gradually, so a slightly worse
environment earlier on may become better later. We demonstrate the
effectiveness of our approach by focusing on a specific application, Active
Directory (AD). AD is the default security management system for Windows domain
networks. An AD environment describes an attack graph, where nodes represent
computers/accounts/etc., and edges represent accesses. The attacker aims to
find the best attack path to reach the highest-privilege node. The defender can
change the graph by removing a limited number of edges (revoking accesses). Our
approach generates better defensive plans than the existing approach and scales
better.
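As a rough illustration of the search loop described above, the following Python sketch evolves edge-blocking plans over a networkx attack graph. It is a minimal sketch under stated assumptions: attacker_reward uses a shortest-path proxy (per-edge success probabilities stored in attribute 'p') in place of the paper's trained RL attacker, and EDO's explicit diversity scoring is omitted; all names and parameters (BUDGET, POP_SIZE, mutate, and so on) are illustrative, not the authors' implementation.

```python
import math
import random
import networkx as nx

BUDGET = 5        # edges the defender may remove (revoked accesses)
POP_SIZE = 16     # environments kept alive per generation
GENERATIONS = 50

def attacker_reward(graph, blocked, entry, target):
    """Proxy for the attacker's achievable reward: the success probability of
    the best attack path, with per-edge probabilities in edge attribute 'p'."""
    g = graph.copy()
    g.remove_edges_from(blocked)
    for _, _, d in g.edges(data=True):
        d["w"] = -math.log(d["p"])  # maximize a product of p's = minimize sum of -log p
    try:
        return math.exp(-nx.shortest_path_length(g, entry, target, weight="w"))
    except nx.NetworkXNoPath:
        return 0.0  # every attack path is cut: best case for the defender

def mutate(blocked, graph):
    """Offspring environment: swap one blocked edge for a random unblocked one."""
    child = set(blocked)
    child.remove(random.choice(tuple(child)))
    child.add(random.choice([e for e in graph.edges if e not in child]))
    return frozenset(child)

def evolve(graph, entry, target):
    """Evolve edge-blocking plans, killing off environments in which the
    attacker still achieves a clearly high reward."""
    edges = list(graph.edges)
    population = [frozenset(random.sample(edges, BUDGET)) for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        ranked = sorted(population,
                        key=lambda b: attacker_reward(graph, b, entry, target))
        survivors = ranked[: POP_SIZE // 2]  # keep the low-reward half
        population = survivors + [mutate(random.choice(survivors), graph)
                                  for _ in range(POP_SIZE - len(survivors))]
    return min(population, key=lambda b: attacker_reward(graph, b, entry, target))
```

In the paper, each candidate environment's reward instead comes from RL training against it, and survivor selection also scores an environment's diversity contribution; that diversity pressure is what lets a slightly worse environment survive long enough to become useful as the RL attacker improves.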
Related papers
- Optimizing Cyber Defense in Dynamic Active Directories through Reinforcement Learning [10.601458163651582]
This paper addresses the absence of effective edge-blocking ACO strategies in dynamic, real-world networks.
It specifically targets the cybersecurity vulnerabilities of organizational Active Directory (AD) systems.
Unlike the existing literature on edge-blocking defenses which considers AD systems as static entities, our study counters this by recognizing their dynamic nature.
arXiv Detail & Related papers (2024-06-28T01:37:46Z)
- Learning diverse attacks on large language models for robust red-teaming and safety tuning [126.32539952157083]
Red-teaming, or identifying prompts that elicit harmful responses, is a critical step in ensuring the safe deployment of large language models.
We show that even with explicit regularization to favor novelty and diversity, existing approaches suffer from mode collapse or fail to generate effective attacks.
We propose to use GFlowNet fine-tuning, followed by a secondary smoothing phase, to train the attacker model to generate diverse and effective attack prompts.
arXiv Detail & Related papers (2024-05-28T19:16:17Z)
- Optimal Attack and Defense for Reinforcement Learning [11.36770403327493]
In adversarial RL, an external attacker has the power to manipulate the victim agent's interaction with the environment.
We study the attacker's problem of designing a stealthy attack that maximizes its own expected reward.
We argue that the optimal defense policy for the victim can be computed as the solution to a Stackelberg game.
arXiv Detail & Related papers (2023-11-30T21:21:47Z)
- Baseline Defenses for Adversarial Attacks Against Aligned Language Models [109.75753454188705]
Recent work shows that text optimizers can produce jailbreaking prompts that bypass moderation and alignment defenses.
We look at three types of defenses: detection (perplexity based), input preprocessing (paraphrase and retokenization), and adversarial training.
We find that the weakness of existing discrete optimizers for text, combined with the relatively high costs of optimization, makes standard adaptive attacks more challenging for LLMs.
arXiv Detail & Related papers (2023-09-01T17:59:44Z)
- Guidance Through Surrogate: Towards a Generic Diagnostic Attack [101.36906370355435]
We develop a guided mechanism to avoid local minima during attack optimization, leading to a novel attack dubbed Guided Projected Gradient Attack (G-PGA).
Our modified attack does not require random restarts, a large number of attack iterations, or a search for an optimal step size.
More than an effective attack, G-PGA can be used as a diagnostic tool to reveal elusive robustness due to gradient masking in adversarial defenses.
arXiv Detail & Related papers (2022-12-30T18:45:23Z)
- Defending Active Directory by Combining Neural Network based Dynamic Program and Evolutionary Diversity Optimisation [14.326083603965278]
We study a Stackelberg game model between one attacker and one defender on an AD attack graph.
The attacker aims to maximize their chance of successfully reaching the destination before getting detected.
The defender's task is to block a constant number of edges to decrease the attacker's chance of success.
arXiv Detail & Related papers (2022-04-07T12:36:11Z)
- LAS-AT: Adversarial Training with Learnable Attack Strategy [82.88724890186094]
"Learnable attack strategy", dubbed LAS-AT, learns to automatically produce attack strategies to improve the model robustness.
Our framework is composed of a target network that uses AEs for training to improve robustness and a strategy network that produces attack strategies to control the AE generation.
arXiv Detail & Related papers (2022-03-13T10:21:26Z)
- Fixed Points in Cyber Space: Rethinking Optimal Evasion Attacks in the Age of AI-NIDS [70.60975663021952]
We study blackbox adversarial attacks on network classifiers.
We argue that attacker-defender fixed points are themselves general-sum games with complex phase transitions.
We show that a continual learning approach is required to study attacker-defender dynamics.
arXiv Detail & Related papers (2021-11-23T23:42:16Z)
- Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design [121.73425076217471]
We propose Unsupervised Environment Design (UED), where developers provide environments with unknown parameters, and these parameters are used to automatically produce a distribution over valid, solvable environments.
We call our technique Protagonist Antagonist Induced Regret Environment Design (PAIRED).
Our experiments demonstrate that PAIRED produces a natural curriculum of increasingly complex environments, and PAIRED agents achieve higher zero-shot transfer performance when tested in highly novel environments.
arXiv Detail & Related papers (2020-12-03T17:37:01Z)
- Policy Teaching in Reinforcement Learning via Environment Poisoning Attacks [33.41280432984183]
We study a security threat to reinforcement learning where an attacker poisons the learning environment to force the agent into executing a target policy chosen by the attacker.
As a victim, we consider RL agents whose objective is to find a policy that maximizes reward in infinite-horizon problem settings (see the sketch after this list).
arXiv Detail & Related papers (2020-11-21T16:54:45Z)
- Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning [33.41280432984183]
We study a security threat to reinforcement learning where an attacker poisons the learning environment to force the agent into executing a target policy.
As a victim, we consider RL agents whose objective is to find a policy that maximizes average reward in undiscounted infinite-horizon problem settings.
arXiv Detail & Related papers (2020-03-28T23:22:28Z)
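The two policy-teaching entries above describe the same training-time threat: the attacker perturbs the environment's rewards so the victim converges to an attacker-chosen policy. The sketch below is a minimal, illustrative version that assumes a tabular reward matrix and a myopic reward-maximizing learner rather than the papers' undiscounted infinite-horizon setting; poison_rewards and the epsilon margin are assumptions, not the papers' cost-minimizing formulation.

```python
import numpy as np

def poison_rewards(R, target_policy, epsilon=1.0):
    """Perturb reward table R[s, a] so the attacker's target action becomes
    the strict best response in every state (simplified: a myopic learner,
    not the papers' average-reward infinite-horizon setting)."""
    R_poisoned = R.copy()
    n_states, n_actions = R.shape
    for s in range(n_states):
        a_target = target_policy[s]
        best_other = max(R[s, a] for a in range(n_actions) if a != a_target)
        # Raise the target action's reward just above every alternative.
        R_poisoned[s, a_target] = max(R[s, a_target], best_other + epsilon)
    return R_poisoned

# Example: 2 states, 3 actions; force action 2 in state 0 and action 0 in state 1.
R = np.array([[1.0, 0.5, 0.2],
              [0.3, 0.9, 0.4]])
print(poison_rewards(R, target_policy=[2, 0]))
```

The papers instead solve for the cheapest such perturbation, which is what makes the attack hard to detect.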
This list is automatically generated from the titles and abstracts of the papers on this site.