Learning Intrusion Prevention Policies through Optimal Stopping
- URL: http://arxiv.org/abs/2106.07160v1
- Date: Mon, 14 Jun 2021 04:45:37 GMT
- Title: Learning Intrusion Prevention Policies through Optimal Stopping
- Authors: Kim Hammar and Rolf Stadler
- Abstract summary: We study automated intrusion prevention using reinforcement learning.
We formulate the problem of intrusion prevention as an optimal stopping problem.
This formulation allows us insight into the structure of the optimal policies, which turn out to be threshold based.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We study automated intrusion prevention using reinforcement learning. In a
novel approach, we formulate the problem of intrusion prevention as an optimal
stopping problem. This formulation allows us insight into the structure of the
optimal policies, which turn out to be threshold based. Since the computation
of the optimal defender policy using dynamic programming is not feasible for
practical cases, we approximate the optimal policy through reinforcement
learning in a simulation environment. To define the dynamics of the simulation,
we emulate the target infrastructure and collect measurements. Our evaluations
show that the learned policies are close to optimal and that they indeed can be
expressed using thresholds.
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, convergence (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z) - Acceleration in Policy Optimization [50.323182853069184]
We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight in the policy improvement step via optimistic and adaptive updates.
We define optimism as predictive modelling of the future behavior of a policy, and adaptivity as taking immediate and anticipatory corrective actions to mitigate errors from overshooting predictions or delayed responses to change.
We design an optimistic policy gradient algorithm, adaptive via meta-gradient learning, and empirically highlight several design choices pertaining to acceleration, in an illustrative task.
arXiv Detail & Related papers (2023-06-18T15:50:57Z) - Value Enhancement of Reinforcement Learning via Efficient and Robust
Trust Region Optimization [14.028916306297928]
Reinforcement learning (RL) is a powerful machine learning technique that enables an intelligent agent to learn an optimal policy.
We propose a novel value enhancement method to improve the performance of a given initial policy computed by existing state-of-the-art RL algorithms.
arXiv Detail & Related papers (2023-01-05T18:43:40Z) - Bounded Robustness in Reinforcement Learning via Lexicographic
Objectives [54.00072722686121]
Policy robustness in Reinforcement Learning may not be desirable at any cost.
We study how policies can be maximally robust to arbitrary observational noise.
We propose a robustness-inducing scheme, applicable to any policy algorithm, that trades off expected policy utility for robustness.
arXiv Detail & Related papers (2022-09-30T08:53:18Z) - A Regularized Implicit Policy for Offline Reinforcement Learning [54.7427227775581]
offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment.
We propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.
Experiments and ablation study on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
arXiv Detail & Related papers (2022-02-19T20:22:04Z) - Intrusion Prevention through Optimal Stopping [0.0]
We study automated intrusion prevention using reinforcement learning.
We show that our approach can produce effective defender policies for a practical IT infrastructure of limited size.
arXiv Detail & Related papers (2021-10-30T17:03:28Z) - MPC-based Reinforcement Learning for Economic Problems with Application
to Battery Storage [0.0]
We focus on policy approximations based on Model Predictive Control (MPC)
We observe that the policy gradient method can struggle to produce meaningful steps in the policy parameters when the policy has a (nearly) bang-bang structure.
We propose a homotopy strategy based on the interior-point method, providing a relaxation of the policy during the learning.
arXiv Detail & Related papers (2021-04-06T10:37:14Z) - Reliable Off-policy Evaluation for Reinforcement Learning [53.486680020852724]
In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy.
We propose a novel framework that provides robust and optimistic cumulative reward estimates using one or multiple logged data.
arXiv Detail & Related papers (2020-11-08T23:16:19Z) - First Order Constrained Optimization in Policy Space [19.00289722198614]
We propose a novel approach called First Order Constrained Optimization in Policy Space (FOCOPS)
FOCOPS maximizes an agent's overall reward while ensuring the agent satisfies a set of cost constraints.
We provide empirical evidence that our simple approach achieves better performance on a set of constrained robotics locomotive tasks.
arXiv Detail & Related papers (2020-02-16T05:07:17Z) - Preventing Imitation Learning with Adversarial Policy Ensembles [79.81807680370677]
Imitation learning can reproduce policies by observing experts, which poses a problem regarding policy privacy.
How can we protect against external observers cloning our proprietary policies?
We introduce a new reinforcement learning framework, where we train an ensemble of near-optimal policies.
arXiv Detail & Related papers (2020-01-31T01:57:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.