Intrusion Prevention through Optimal Stopping
- URL: http://arxiv.org/abs/2111.00289v1
- Date: Sat, 30 Oct 2021 17:03:28 GMT
- Title: Intrusion Prevention through Optimal Stopping
- Authors: Kim Hammar and Rolf Stadler
- Abstract summary: We study automated intrusion prevention using reinforcement learning.
We show that our approach can produce effective defender policies for a practical IT infrastructure of limited size.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We study automated intrusion prevention using reinforcement learning.
Following a novel approach, we formulate the problem of intrusion prevention as
an (optimal) multiple stopping problem. This formulation gives us insight into
the structure of optimal policies, which we show to have threshold properties.
For most practical cases, it is not feasible to obtain an optimal defender
policy using dynamic programming. We therefore develop a reinforcement learning
approach to approximate an optimal policy. Our method for learning and
validating policies includes two systems: a simulation system where defender
policies are incrementally learned and an emulation system where statistics are
produced that drive simulation runs and where learned policies are evaluated.
We show that our approach can produce effective defender policies for a
practical IT infrastructure of limited size. Inspection of the learned policies
confirms that they exhibit threshold properties.
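To make the threshold structure concrete, below is a minimal sketch of what a multiple-stopping defender policy with threshold form could look like. The intrusion "belief" state variable, the threshold values, and all identifiers are illustrative assumptions rather than details from the paper; in the paper the policy is learned with reinforcement learning, not hand-coded.

```python
# Hypothetical sketch of a threshold-based multiple-stopping defender.
# The paper shows that optimal policies have threshold properties; the
# concrete state variable (an intrusion "belief" in [0, 1]) and the
# threshold values below are illustrative assumptions only.
from dataclasses import dataclass, field
from typing import List


@dataclass
class ThresholdStoppingPolicy:
    """Takes the next defensive 'stop' action once the intrusion belief
    exceeds the threshold associated with the number of stops remaining."""

    thresholds: List[float] = field(default_factory=lambda: [0.9, 0.7, 0.5])

    def __post_init__(self) -> None:
        self.stops_remaining = len(self.thresholds)

    def action(self, belief: float) -> str:
        """Return 'stop' (take a defensive action) or 'continue'."""
        if self.stops_remaining == 0:
            return "continue"
        threshold = self.thresholds[len(self.thresholds) - self.stops_remaining]
        if belief >= threshold:
            self.stops_remaining -= 1
            return "stop"
        return "continue"


if __name__ == "__main__":
    policy = ThresholdStoppingPolicy()
    # Made-up trajectory of intrusion beliefs from some monitoring system.
    for t, belief in enumerate([0.1, 0.3, 0.95, 0.6, 0.75, 0.99]):
        print(t, belief, policy.action(belief))
```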
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned, only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
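As a toy illustration of this learn-stochastic/deploy-deterministic practice, the following sketch trains a Gaussian policy with a REINFORCE-style gradient and then deploys its mean; the one-step quadratic reward, the exploration scale `sigma`, and the learning rate are placeholder assumptions, not the paper's setup.

```python
# Toy sketch: learn a *stochastic* Gaussian policy with policy gradients,
# then deploy its *deterministic* mean. The one-step quadratic "environment"
# and the exploration scale sigma are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
target = 2.0        # unknown optimal action of the toy problem
sigma = 0.5         # exploration level (std of the stochastic policy)
mu = 0.0            # learnable mean of the Gaussian policy
lr = 0.05

for step in range(2000):
    a = rng.normal(mu, sigma)               # sample action from stochastic policy
    reward = -(a - target) ** 2             # one-step reward
    grad_log_prob = (a - mu) / sigma ** 2   # d/dmu log N(a | mu, sigma)
    mu += lr * grad_log_prob * reward       # REINFORCE update

# At deployment, only the deterministic version of the policy is used: a = mu.
print("deployed deterministic action:", mu)
print("deployed return:", -(mu - target) ** 2)
```

Raising `sigma` explores more aggressively during learning, but the quantity that ultimately matters is the return of the deployed deterministic action `mu`; tuning that exploration level is the trade-off the summary above refers to.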
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning [53.97273491846883]
We propose DPE: an RL algorithm that blends offline sequence modeling and offline reinforcement learning with Double Policy Estimation.
We validate our method in multiple tasks of OpenAI Gym with D4RL benchmarks.
arXiv Detail & Related papers (2023-08-28T20:46:07Z)
- Enabling Efficient, Reliable Real-World Reinforcement Learning with Approximate Physics-Based Models [10.472792899267365]
We focus on developing efficient and reliable policy optimization strategies for robot learning with real-world data.
In this paper we introduce a novel policy gradient-based policy optimization framework.
We show that our approach can learn precise control strategies reliably and with only minutes of real-world data.
arXiv Detail & Related papers (2023-07-16T22:36:36Z)
- Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization [14.028916306297928]
Reinforcement learning (RL) is a powerful machine learning technique that enables an intelligent agent to learn an optimal policy.
We propose a novel value enhancement method to improve the performance of a given initial policy computed by existing state-of-the-art RL algorithms.
arXiv Detail & Related papers (2023-01-05T18:43:40Z)
- Offline Reinforcement Learning with Closed-Form Policy Improvement Operators [88.54210578912554]
Behavior constrained policy optimization has been demonstrated to be a successful paradigm for tackling Offline Reinforcement Learning.
In this paper, we propose our closed-form policy improvement operators.
We empirically demonstrate their effectiveness over state-of-the-art algorithms on the standard D4RL benchmark.
arXiv Detail & Related papers (2022-11-29T06:29:26Z)
- Learning Security Strategies through Game Play and Optimal Stopping [0.0]
We study automated intrusion prevention using reinforcement learning.
We formulate the interaction between an attacker and a defender as an optimal stopping game.
To obtain the optimal defender strategies, we introduce T-FP, a fictitious self-play algorithm.
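For intuition about the self-play component, below is a minimal sketch of classical fictitious play on a toy zero-sum matrix game; the payoff matrix and the use of exact best responses (rather than the learned stopping strategies that T-FP works with) are simplifying assumptions.

```python
# Minimal fictitious-play sketch on a toy zero-sum matrix game; not the
# T-FP algorithm itself, just the best-response dynamics it builds on.
import numpy as np

# Attacker (rows) vs. defender (columns); entries are payoffs to the row player.
A = np.array([[0.0, 1.0, -1.0],
              [-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0]])   # rock-paper-scissors-like toy game

row_counts = np.ones(3)   # empirical action counts of the row player
col_counts = np.ones(3)   # empirical action counts of the column player

for _ in range(5000):
    row_avg = row_counts / row_counts.sum()
    col_avg = col_counts / col_counts.sum()
    # Each player best-responds to the opponent's empirical average strategy.
    row_br = np.argmax(A @ col_avg)   # row maximizes its expected payoff
    col_br = np.argmin(row_avg @ A)   # column minimizes the row player's payoff
    row_counts[row_br] += 1
    col_counts[col_br] += 1

print("approximate equilibrium strategies:",
      row_counts / row_counts.sum(), col_counts / col_counts.sum())
```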
arXiv Detail & Related papers (2022-05-29T15:30:00Z)
- Attacking and Defending Deep Reinforcement Learning Policies [3.6985039575807246]
We study the robustness of DRL policies to adversarial attacks from the perspective of robust optimization.
We propose a greedy attack algorithm, which tries to minimize the expected return of the policy without interacting with the environment, and a defense algorithm, which performs adversarial training in a max-min form.
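A rough sketch of these two ingredients, under placeholder assumptions (network shapes, a one-step FGSM-style perturbation, and a critic `q_net` standing in for the policy's return estimate), might look like the following.

```python
# Rough sketch (not the paper's algorithm): (1) a greedy observation-space
# attack that perturbs the state to lower the policy's expected return as
# estimated by a critic, without new environment interaction, and (2) a
# max-min adversarial training step on the attacked observations.
import torch
import torch.nn as nn

obs_dim, n_actions, eps = 4, 3, 0.05
policy = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, n_actions))
q_net = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, n_actions))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)


def greedy_attack(obs: torch.Tensor) -> torch.Tensor:
    """Perturb obs within an L-infinity ball of radius eps to lower the
    policy's expected Q-value (a one-step, FGSM-style greedy attack)."""
    obs = obs.clone().requires_grad_(True)
    probs = torch.softmax(policy(obs), dim=-1)
    expected_q = (probs * q_net(obs)).sum()
    expected_q.backward()
    # Step *against* the return: the inner minimization of the max-min objective.
    return (obs - eps * obs.grad.sign()).detach()


def adversarial_training_step(obs_batch: torch.Tensor) -> None:
    """Outer maximization: update the policy to do well on attacked states."""
    attacked = greedy_attack(obs_batch)
    probs = torch.softmax(policy(attacked), dim=-1)
    loss = -(probs * q_net(attacked).detach()).sum(dim=-1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


adversarial_training_step(torch.randn(8, obs_dim))
```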
arXiv Detail & Related papers (2022-05-16T12:47:54Z)
- A Regularized Implicit Policy for Offline Reinforcement Learning [54.7427227775581]
Offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment.
We propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.
Experiments and ablation study on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
arXiv Detail & Related papers (2022-02-19T20:22:04Z)
- Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction [73.77593805292194]
We train a separate exploration policy to maximize an approximate upper confidence bound of the critics in an off-policy actor-critic framework.
To mitigate the off-policy-ness, we adapt the recently introduced DICE framework to learn a distribution correction ratio for off-policy actor-critic training.
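A minimal sketch of the upper-confidence-bound objective for such an exploration policy is given below, assuming an ensemble of two critics and omitting the DICE correction for brevity; the network shapes and the bonus weight `beta` are illustrative.

```python
# Sketch of a UCB exploration objective over a two-critic ensemble; the
# shapes, beta, and the omission of the DICE ratio are simplifications.
import torch
import torch.nn as nn

obs_dim, act_dim, beta = 4, 2, 1.0
explore_actor = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(),
                              nn.Linear(32, act_dim), nn.Tanh())
critics = [nn.Sequential(nn.Linear(obs_dim + act_dim, 32), nn.ReLU(),
                         nn.Linear(32, 1)) for _ in range(2)]
optimizer = torch.optim.Adam(explore_actor.parameters(), lr=3e-4)


def ucb_exploration_loss(obs: torch.Tensor) -> torch.Tensor:
    """Exploration actor maximizes mean(Q_i) + beta * std(Q_i) over the ensemble."""
    act = explore_actor(obs)
    q_values = torch.cat([c(torch.cat([obs, act], dim=-1)) for c in critics], dim=-1)
    ucb = q_values.mean(dim=-1) + beta * q_values.std(dim=-1)
    return -ucb.mean()   # minimize negative UCB == maximize UCB


obs_batch = torch.randn(16, obs_dim)
loss = ucb_exploration_loss(obs_batch)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```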
arXiv Detail & Related papers (2021-10-22T22:07:51Z)
- Learning Intrusion Prevention Policies through Optimal Stopping [0.0]
We study automated intrusion prevention using reinforcement learning.
We formulate the problem of intrusion prevention as an optimal stopping problem.
This formulation gives us insight into the structure of the optimal policies, which turn out to be threshold based.
arXiv Detail & Related papers (2021-06-14T04:45:37Z)
- Preventing Imitation Learning with Adversarial Policy Ensembles [79.81807680370677]
Imitation learning can reproduce policies by observing experts, which poses a problem regarding policy privacy.
How can we protect against external observers cloning our proprietary policies?
We introduce a new reinforcement learning framework, where we train an ensemble of near-optimal policies.
arXiv Detail & Related papers (2020-01-31T01:57:16Z)
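As a toy illustration of the ensemble idea in the last entry, the sketch below executes one randomly drawn member of a small policy ensemble per episode, so that logged trajectories mix behaviors that are hard to clone into a single policy; the two hand-written policies and the integer states are assumptions for illustration only, not the paper's training procedure.

```python
# Toy illustration: drawing one ensemble member per episode yields
# demonstrations with conflicting state-action pairs, which degrades naive
# behavior cloning of a single deterministic policy.
import random

random.seed(0)

# Two policies assumed to be equally good but acting differently.
def policy_a(state: int) -> str:
    return "up" if state % 2 == 0 else "right"

def policy_b(state: int) -> str:
    return "right" if state % 2 == 0 else "up"

ensemble = [policy_a, policy_b]

def run_episode(horizon: int = 4):
    """One ensemble member is drawn per episode and executed throughout."""
    policy = random.choice(ensemble)
    return [(s, policy(s)) for s in range(horizon)]

for episode in range(3):
    print(run_episode())
```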