Out of the Cage: How Stochastic Parrots Win in Cyber Security
Environments
- URL: http://arxiv.org/abs/2308.12086v2
- Date: Mon, 28 Aug 2023 09:42:59 GMT
- Title: Out of the Cage: How Stochastic Parrots Win in Cyber Security
Environments
- Authors: Maria Rigaki, Ond\v{r}ej Luk\'a\v{s}, Carlos A. Catania, Sebastian
Garcia
- Abstract summary: Large Language Models (LLMs) have gained widespread popularity across diverse domains.
This paper introduces a novel application of pre-trained LLMs as agents within cybersecurity network environments.
We present an approach wherein pre-trained LLMs are leveraged as attacking agents in two reinforcement learning environments.
- Score: 0.5735035463793008
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large Language Models (LLMs) have gained widespread popularity across diverse
domains involving text generation, summarization, and various natural language
processing tasks. Despite their inherent limitations, LLM-based designs have
shown promising capabilities in planning and navigating open-world scenarios.
This paper introduces a novel application of pre-trained LLMs as agents within
cybersecurity network environments, focusing on their utility for sequential
decision-making processes.
We present an approach wherein pre-trained LLMs are leveraged as attacking
agents in two reinforcement learning environments. Our proposed agents
demonstrate similar or better performance against state-of-the-art agents
trained for thousands of episodes in most scenarios and configurations. In
addition, the best LLM agents perform similarly to human testers of the
environment without any additional training process. This design highlights the
potential of LLMs to efficiently address complex decision-making tasks within
cybersecurity.
Furthermore, we introduce a new network security environment named
NetSecGame. The environment is designed to eventually support complex
multi-agent scenarios within the network security domain. The proposed
environment mimics real network attacks and is designed to be highly modular
and adaptable for various scenarios.
Related papers
- MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation [52.739500459903724]
Large Language Models (LLMs) have demonstrated remarkable planning abilities across various domains, including robotics manipulation and navigation.
We propose a novel multi-agent LLM framework that distributes high-level planning and low-level control code generation across specialized LLM agents.
We evaluate our approach on nine RLBench tasks, including long-horizon tasks, and demonstrate its ability to solve robotics manipulation in a zero-shot setting.
arXiv Detail & Related papers (2024-11-26T17:53:44Z) - DynaSaur: Large Language Agents Beyond Predefined Actions [108.75187263724838]
Existing LLM agent systems typically select actions from a fixed and predefined set at every step.
We propose an LLM agent framework that enables the dynamic creation and composition of actions in an online manner.
Our experiments on the GAIA benchmark demonstrate that this framework offers significantly greater flexibility and outperforms previous methods.
arXiv Detail & Related papers (2024-11-04T02:08:59Z) - Entity-based Reinforcement Learning for Autonomous Cyber Defence [0.22499166814992438]
Key challenge for autonomous cyber defence is ensuring a defensive agent's ability to generalise across diverse network topologies and configurations.
Standard approaches to deep reinforcement learning expect fixed-size observation and action spaces.
In autonomous cyber defence, this makes it hard to develop agents that generalise to environments with network topologies different from those trained on.
arXiv Detail & Related papers (2024-10-23T08:04:12Z) - Hierarchical Multi-agent Reinforcement Learning for Cyber Network Defense [7.967738380932909]
We propose a hierarchical Proximal Policy Optimization (PPO) architecture that decomposes the cyber defense task into specific sub-tasks like network investigation and host recovery.
Our approach involves training sub-policies for each sub-task using PPO enhanced with domain expertise.
These sub-policies are then leveraged by a master defense policy that coordinates their selection to solve complex network defense tasks.
arXiv Detail & Related papers (2024-10-22T18:35:05Z) - Hackphyr: A Local Fine-Tuned LLM Agent for Network Security Environments [0.5735035463793008]
Large Language Models (LLMs) have shown remarkable potential across various domains, including cybersecurity.
We present Hackphyr, a locally fine-tuned LLM to be used as a red-team agent within network security environments.
arXiv Detail & Related papers (2024-09-17T15:28:25Z) - Large Language Models for Base Station Siting: Intelligent Deployment based on Prompt or Agent [62.16747639440893]
Large language models (LLMs) and their associated technologies advance, particularly in the realms of prompt engineering and agent engineering.
This approach entails the strategic use of well-crafted prompts to infuse human experience and knowledge into these sophisticated LLMs.
This integration represents the future paradigm of artificial intelligence (AI) as a service and AI for more ease.
arXiv Detail & Related papers (2024-08-07T08:43:32Z) - Compromising Embodied Agents with Contextual Backdoor Attacks [69.71630408822767]
Large language models (LLMs) have transformed the development of embodied intelligence.
This paper uncovers a significant backdoor security threat within this process.
By poisoning just a few contextual demonstrations, attackers can covertly compromise the contextual environment of a black-box LLM.
arXiv Detail & Related papers (2024-08-06T01:20:12Z) - Large Language Models for Cyber Security: A Systematic Literature Review [14.924782327303765]
We conduct a comprehensive review of the literature on the application of Large Language Models in cybersecurity (LLM4Security)
We observe that LLMs are being applied to a wide range of cybersecurity tasks, including vulnerability detection, malware analysis, network intrusion detection, and phishing detection.
Third, we identify several promising techniques for adapting LLMs to specific cybersecurity domains, such as fine-tuning, transfer learning, and domain-specific pre-training.
arXiv Detail & Related papers (2024-05-08T02:09:17Z) - LLMArena: Assessing Capabilities of Large Language Models in Dynamic
Multi-Agent Environments [35.926581910260076]
We introduce LLMArena, a framework for evaluating the capabilities of large language models in multi-agent dynamic environments.
LLArena employs Trueskill scoring to assess crucial abilities in LLM agents, including spatial reasoning, strategic planning, numerical reasoning, risk assessment, communication, opponent modeling, and team collaboration.
We conduct an extensive experiment and human evaluation among different sizes and types of LLMs, showing that LLMs still have a significant journey ahead in their development towards becoming fully autonomous agents.
arXiv Detail & Related papers (2024-02-26T11:31:48Z) - AgentBench: Evaluating LLMs as Agents [88.45506148281379]
Large Language Models (LLMs) are becoming increasingly smart and autonomous, targeting real-world pragmatic missions beyond traditional NLP tasks.
We present AgentBench, a benchmark that currently consists of 8 distinct environments to assess LLM-as-Agent's reasoning and decision-making abilities.
arXiv Detail & Related papers (2023-08-07T16:08:11Z) - Scenic4RL: Programmatic Modeling and Generation of Reinforcement
Learning Environments [89.04823188871906]
Generation of diverse realistic scenarios is challenging for real-time strategy (RTS) environments.
Most of the existing simulators rely on randomly generating the environments.
We introduce the benefits of adopting an existing formal scenario specification language, SCENIC, to assist researchers.
arXiv Detail & Related papers (2021-06-18T21:49:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.