Knowledge-Informed Auto-Penetration Testing Based on Reinforcement Learning with Reward Machine
- URL: http://arxiv.org/abs/2405.15908v1
- Date: Fri, 24 May 2024 20:05:12 GMT
- Title: Knowledge-Informed Auto-Penetration Testing Based on Reinforcement Learning with Reward Machine
- Authors: Yuanliang Li, Hanzheng Dai, Jun Yan
- Abstract summary: We propose a knowledge-informed AutoPT framework called DRLRM-PT.
We use reward machines (RMs) to encode domain knowledge as guidelines for training a PT policy.
We show that RMs encoding more detailed domain knowledge achieve better PT performance than RMs with simpler knowledge.
- Score: 2.087814874079289
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated penetration testing (AutoPT) based on reinforcement learning (RL) has proven its ability to improve the efficiency of vulnerability identification in information systems. However, RL-based PT encounters several challenges, including poor sampling efficiency, intricate reward specification, and limited interpretability. To address these issues, we propose a knowledge-informed AutoPT framework called DRLRM-PT, which leverages reward machines (RMs) to encode domain knowledge as guidelines for training a PT policy. In our study, we specifically focus on lateral movement as a PT case study and formulate it as a partially observable Markov decision process (POMDP) guided by RMs. We design two RMs based on the MITRE ATT&CK knowledge base for lateral movement. To solve the POMDP and optimize the PT policy, we employ the deep Q-learning algorithm with RM (DQRM). The experimental results demonstrate that the DQRM agent exhibits higher training efficiency in PT compared to agents without knowledge embedding. Moreover, RMs encoding more detailed domain knowledge demonstrate better PT performance than RMs with simpler knowledge.
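To make the reward-machine idea concrete, below is a minimal Python sketch of an RM guiding a lateral-movement PT agent, in the spirit of the DQRM setup described in the abstract; the RM states, event labels, and reward values are illustrative assumptions, not the paper's exact MITRE ATT&CK-derived design.

```python
# Minimal, illustrative reward machine (RM) for a lateral-movement PT task.
# States, event labels, and reward values are assumptions inspired by the
# abstract, not the paper's exact MITRE ATT&CK-derived design.

class RewardMachine:
    def __init__(self):
        # RM states track PT progress:
        # u0 = start, u1 = credentials obtained,
        # u2 = remote service reached, u_acc = target host compromised.
        self.state = "u0"
        # delta: (rm_state, event) -> (next_rm_state, reward)
        self.delta = {
            ("u0", "cred_dumped"):  ("u1", 1.0),     # credential-access style event
            ("u1", "remote_login"): ("u2", 1.0),     # remote-services style event
            ("u2", "target_owned"): ("u_acc", 10.0),
        }

    def step(self, event):
        """Advance on a labeled environment event; return the shaped reward."""
        next_state, reward = self.delta.get((self.state, event), (self.state, 0.0))
        self.state = next_state
        return reward

rm = RewardMachine()
assert rm.step("remote_login") == 0.0   # out-of-order events earn nothing
assert rm.step("cred_dumped") == 1.0    # RM advances to u1
```

In DQRM-style training, the Q-network is conditioned on the pair (environment observation, RM state), so each stage of the task effectively learns its own sub-policy and the sparse final reward is decomposed into intermediate ones.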
Related papers
- Self-Regulation and Requesting Interventions [63.5863047447313]
We propose an offline framework that trains a "helper" policy to request interventions.
We score optimal intervention timing with PRMs and train the helper model on these labeled trajectories.
This offline approach significantly reduces costly intervention calls during training.
arXiv Detail & Related papers (2025-02-07T00:06:17Z)
- ReARTeR: Retrieval-Augmented Reasoning with Trustworthy Process Rewarding [25.329712997545794]
We propose Retrieval-Augmented Reasoning through Trustworthy Process Rewarding (ReARTeR).
ReARTeR enhances RAG systems' reasoning capabilities through post-training and test-time scaling.
Experimental results on multi-step reasoning benchmarks demonstrate significant improvements.
arXiv Detail & Related papers (2025-01-14T05:56:26Z)
- Reward Machine Inference for Robotic Manipulation [1.6135226672466307]
Reward Machines (RMs) enhance RL's capability to train policies over extended time horizons.
We introduce a novel LfD approach for learning RMs directly from visual demonstrations of robotic manipulation tasks.
We validate our method on vision-based manipulation tasks, showing that the inferred RM accurately captures task structure and enables an RL agent to effectively learn an optimal policy.
arXiv Detail & Related papers (2024-12-13T12:32:53Z)
- Free Process Rewards without Process Labels [55.14044050782222]
We show that an *implicit* PRM can be obtained at no additional cost, by simply training an ORM on the cheaper response-level labels.
We show that our implicit PRM, when instantiated with the cross-entropy (CE) loss, is more data-efficient and can keep improving generation models even when trained with only one response per instruction.
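As a hedged illustration of how response-level training can yield step-level rewards, the sketch below assumes the implicit-PRM parameterization in which the reward is the log-probability ratio between the trained model and a frozen reference; the function name and the step segmentation are assumptions for illustration, not the paper's code.

```python
import torch

def implicit_process_rewards(logp_theta, logp_ref, step_ends, beta=1.0):
    """Recover per-step rewards from a model trained only on response-level
    labels, assuming the implicit parameterization
    r(y) = beta * log(pi_theta(y|x) / pi_ref(y|x)).
    logp_theta, logp_ref: 1-D per-token log-probs of the response under the
    trained model and a frozen reference; step_ends: indices of the tokens
    that close each reasoning step (the segmentation is an assumption)."""
    log_ratio = logp_theta - logp_ref            # per-token log-ratio
    prefix = torch.cumsum(log_ratio, dim=-1)     # cumulative prefix values
    q = beta * prefix[step_ends]                 # value at each step boundary
    # Process reward of step t = change in prefix value across that step.
    return torch.diff(q, prepend=torch.zeros(1))
```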
arXiv Detail & Related papers (2024-12-02T21:20:02Z)
- Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning [90.23629291067763]
A promising approach for improving reasoning in large language models is to use process reward models (PRMs).
PRMs provide feedback at each step of a multi-step reasoning trace, potentially improving credit assignment over outcome reward models (ORMs).
To improve a base policy by running search against a PRM or using it as dense rewards for reinforcement learning (RL), we ask: "How should we design process rewards?"
We theoretically characterize the set of good provers and our results show that optimizing process rewards from such provers improves exploration during test-time search and online RL.
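One way to read "process rewards from a prover" as code is sketched below: a step is rewarded by the progress it induces in a prover policy's estimated success probability; the callable and its name are stand-in assumptions, not the paper's implementation.

```python
def progress_process_reward(prefix_steps, candidate_step, prover_value):
    """Score a candidate reasoning step by the progress it induces under a
    prover policy. `prover_value` is an assumed callable estimating the
    probability that the prover finishes correctly from a partial trace
    (e.g., via rollouts); the reward mirrors an advantage Q(s,a) - V(s)."""
    v_before = prover_value(prefix_steps)                      # V(s)
    v_after = prover_value(prefix_steps + [candidate_step])    # Q(s, a)
    return v_after - v_before                                  # measured progress
```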
arXiv Detail & Related papers (2024-10-10T17:31:23Z)
- RRM: Robust Reward Model Training Mitigates Reward Hacking [51.12341734942797]
Reward models (RMs) play a pivotal role in aligning large language models with human preferences.
We introduce a causal framework that learns preferences independent of these artifacts.
Experiments show that our approach successfully filters out undesirable artifacts, yielding a more robust reward model.
arXiv Detail & Related papers (2024-09-20T01:46:07Z)
- Learning Robust Reward Machines from Noisy Labels [46.18428376996514]
PROB-IRM is an approach that learns robust reward machines (RMs) for reinforcement learning (RL) agents from noisy execution traces.
We show that PROB-IRM can learn (potentially imperfect) RMs from noisy traces and exploit them to train an RL agent to solve its tasks successfully.
arXiv Detail & Related papers (2024-08-27T08:41:42Z)
- Prior Constraints-based Reward Model Training for Aligning Large Language Models [58.33118716810208]
This paper proposes a Prior Constraints-based Reward Model (PCRM) training method to mitigate unconstrained scaling of reward scores.
PCRM incorporates prior constraints, specifically, length ratio and cosine similarity between outputs of each comparison pair, during reward model training to regulate optimization magnitude and control score margins.
Experimental results demonstrate that PCRM significantly improves alignment performance by effectively constraining reward score scaling.
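Below is a minimal sketch of what such a prior-constrained pairwise loss could look like, assuming the length ratio and cosine similarity of each comparison pair modulate the target score margin; the exact combination is an assumption, not PCRM's published formula.

```python
import torch
import torch.nn.functional as F

def pcrm_style_loss(score_chosen, score_rejected,
                    len_chosen, len_rejected, cos_sim):
    """Pairwise reward-model loss with prior-derived margins, in the spirit
    of PCRM. All arguments are 1-D float tensors over comparison pairs. How
    the length ratio and cosine similarity set the margin here is an
    assumption, not the paper's formula."""
    # Near-duplicate pairs (similar, same length) get a small target margin;
    # clearly different pairs get a larger one, constraining score scaling.
    length_ratio = torch.minimum(len_chosen, len_rejected) / \
                   torch.maximum(len_chosen, len_rejected)
    margin = (1.0 - cos_sim) + (1.0 - length_ratio)
    return -F.logsigmoid(score_chosen - score_rejected - margin).mean()
```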
arXiv Detail & Related papers (2024-04-01T07:49:11Z)
- Let's reward step by step: Step-Level reward model as the Navigators for Reasoning [64.27898739929734]
Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from PRM to optimize the reasoning pathways explored by LLMs.
To explore the versatility of our approach, we develop a novel method to automatically generate a step-level reward dataset for coding tasks and observe similarly improved performance on code generation tasks.
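A hedged sketch of the step-level greedy search idea follows: generate candidate next steps, score each extended partial trajectory with the PRM, and keep the best; `propose_steps`, `prm_score`, and the `[FINAL]` marker are hypothetical stand-ins, not the paper's interfaces.

```python
def greedy_search_with_prm(problem, propose_steps, prm_score, max_steps=8):
    """Greedy step-level search guided by a PRM. `propose_steps` (samples
    candidate next steps from the LLM) and `prm_score` (scores a partial
    trajectory) are hypothetical wrappers, as is the [FINAL] marker."""
    trajectory = []
    for _ in range(max_steps):
        candidates = propose_steps(problem, trajectory)
        if not candidates:
            break
        # Keep the candidate whose extended trajectory the PRM scores highest.
        best = max(candidates, key=lambda s: prm_score(problem, trajectory + [s]))
        trajectory.append(best)
        if best.strip().endswith("[FINAL]"):
            break
    return trajectory
```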
arXiv Detail & Related papers (2023-10-16T05:21:50Z)
- Reinforced Deep Markov Models With Applications in Automatic Trading [0.0]
We propose a model-based RL approach, coined Reinforced Deep Markov Model (RDMM).
RDMM integrates desirable properties of a reinforcement learning algorithm acting as an automatic trading system.
Tests show that the RDMM is data-efficient and provides financial gains compared to the benchmarks in the optimal execution problem.
arXiv Detail & Related papers (2020-11-09T12:46:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences of their use.