Related papers: Out of the Cage: How Stochastic Parrots Win in Cyber Security Environments

Out of the Cage: How Stochastic Parrots Win in Cyber Security Environments

URL: http://arxiv.org/abs/2308.12086v2
Date: Mon, 28 Aug 2023 09:42:59 GMT
Title: Out of the Cage: How Stochastic Parrots Win in Cyber Security Environments
Authors: Maria Rigaki, Ond\v{r}ej Luk\'a\v{s}, Carlos A. Catania, Sebastian Garcia
Abstract summary: Large Language Models (LLMs) have gained widespread popularity across diverse domains. This paper introduces a novel application of pre-trained LLMs as agents within cybersecurity network environments. We present an approach wherein pre-trained LLMs are leveraged as attacking agents in two reinforcement learning environments.
Score: 0.5735035463793008
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Large Language Models (LLMs) have gained widespread popularity across diverse domains involving text generation, summarization, and various natural language processing tasks. Despite their inherent limitations, LLM-based designs have shown promising capabilities in planning and navigating open-world scenarios. This paper introduces a novel application of pre-trained LLMs as agents within cybersecurity network environments, focusing on their utility for sequential decision-making processes. We present an approach wherein pre-trained LLMs are leveraged as attacking agents in two reinforcement learning environments. Our proposed agents demonstrate similar or better performance against state-of-the-art agents trained for thousands of episodes in most scenarios and configurations. In addition, the best LLM agents perform similarly to human testers of the environment without any additional training process. This design highlights the potential of LLMs to efficiently address complex decision-making tasks within cybersecurity. Furthermore, we introduce a new network security environment named NetSecGame. The environment is designed to eventually support complex multi-agent scenarios within the network security domain. The proposed environment mimics real network attacks and is designed to be highly modular and adaptable for various scenarios.

Related papers

AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents [54.29555239363013]
We propose a generic black-box fuzzing framework, AgentVigil, to automatically discover and exploit indirect prompt injection vulnerabilities.<n>We evaluate AgentVigil on two public benchmarks, AgentDojo and VWA-adv, where it achieves 71% and 70% success rates against agents based on o3-mini and GPT-4o.<n>We apply our attacks in real-world environments, successfully misleading agents to navigate to arbitrary URLs, including malicious sites.
arXiv Detail & Related papers (2025-05-09T07:40:17Z)
Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks. However, they still struggle with problems requiring multi-step decision-making and environmental feedback. We propose a framework that can automatically learn a reward model from the environment without human annotations.
arXiv Detail & Related papers (2025-02-17T18:49:25Z)
Multi-Agent Collaboration in Incident Response with Large Language Models [0.0]
Incident response (IR) is a critical aspect of cybersecurity, requiring rapid decision-making and coordinated efforts to address cyberattacks effectively. Leveraging large language models (LLMs) as intelligent agents offers a novel approach to enhancing collaboration and efficiency in IR scenarios. This paper explores the application of LLM-based multi-agent collaboration using the Backdoors & Breaches framework.
arXiv Detail & Related papers (2024-12-01T03:12:26Z)
MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation [52.739500459903724]
Large Language Models (LLMs) have demonstrated remarkable planning abilities across various domains, including robotics manipulation and navigation. We propose a novel multi-agent LLM framework that distributes high-level planning and low-level control code generation across specialized LLM agents. We evaluate our approach on nine RLBench tasks, including long-horizon tasks, and demonstrate its ability to solve robotics manipulation in a zero-shot setting.
arXiv Detail & Related papers (2024-11-26T17:53:44Z)
DynaSaur: Large Language Agents Beyond Predefined Actions [108.75187263724838]
Existing LLM agent systems typically select actions from a fixed and predefined set at every step. We propose an LLM agent framework that enables the dynamic creation and composition of actions in an online manner. Our experiments on the GAIA benchmark demonstrate that this framework offers significantly greater flexibility and outperforms previous methods.
arXiv Detail & Related papers (2024-11-04T02:08:59Z)
Entity-based Reinforcement Learning for Autonomous Cyber Defence [0.22499166814992438]
Key challenge for autonomous cyber defence is ensuring a defensive agent's ability to generalise across diverse network topologies and configurations. Standard approaches to deep reinforcement learning expect fixed-size observation and action spaces. In autonomous cyber defence, this makes it hard to develop agents that generalise to environments with network topologies different from those trained on.
arXiv Detail & Related papers (2024-10-23T08:04:12Z)
Hierarchical Multi-agent Reinforcement Learning for Cyber Network Defense [7.967738380932909]
We propose a hierarchical Proximal Policy Optimization (PPO) architecture that decomposes the cyber defense task into specific sub-tasks like network investigation and host recovery. Our approach involves training sub-policies for each sub-task using PPO enhanced with domain expertise. These sub-policies are then leveraged by a master defense policy that coordinates their selection to solve complex network defense tasks.
arXiv Detail & Related papers (2024-10-22T18:35:05Z)
Hackphyr: A Local Fine-Tuned LLM Agent for Network Security Environments [0.5735035463793008]
Large Language Models (LLMs) have shown remarkable potential across various domains, including cybersecurity. We present Hackphyr, a locally fine-tuned LLM to be used as a red-team agent within network security environments.
arXiv Detail & Related papers (2024-09-17T15:28:25Z)
Large Language Models for Base Station Siting: Intelligent Deployment based on Prompt or Agent [62.16747639440893]
Large language models (LLMs) and their associated technologies advance, particularly in the realms of prompt engineering and agent engineering. This approach entails the strategic use of well-crafted prompts to infuse human experience and knowledge into these sophisticated LLMs. This integration represents the future paradigm of artificial intelligence (AI) as a service and AI for more ease.
arXiv Detail & Related papers (2024-08-07T08:43:32Z)
Compromising Embodied Agents with Contextual Backdoor Attacks [69.71630408822767]
Large language models (LLMs) have transformed the development of embodied intelligence. This paper uncovers a significant backdoor security threat within this process. By poisoning just a few contextual demonstrations, attackers can covertly compromise the contextual environment of a black-box LLM.
arXiv Detail & Related papers (2024-08-06T01:20:12Z)
Large Language Models for Cyber Security: A Systematic Literature Review [14.924782327303765]
We conduct a comprehensive review of the literature on the application of Large Language Models in cybersecurity (LLM4Security) We observe that LLMs are being applied to a wide range of cybersecurity tasks, including vulnerability detection, malware analysis, network intrusion detection, and phishing detection. Third, we identify several promising techniques for adapting LLMs to specific cybersecurity domains, such as fine-tuning, transfer learning, and domain-specific pre-training.
arXiv Detail & Related papers (2024-05-08T02:09:17Z)
LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments [35.926581910260076]
We introduce LLMArena, a framework for evaluating the capabilities of large language models in multi-agent dynamic environments. LLArena employs Trueskill scoring to assess crucial abilities in LLM agents, including spatial reasoning, strategic planning, numerical reasoning, risk assessment, communication, opponent modeling, and team collaboration. We conduct an extensive experiment and human evaluation among different sizes and types of LLMs, showing that LLMs still have a significant journey ahead in their development towards becoming fully autonomous agents.
arXiv Detail & Related papers (2024-02-26T11:31:48Z)
AgentBench: Evaluating LLMs as Agents [88.45506148281379]
Large Language Models (LLMs) are becoming increasingly smart and autonomous, targeting real-world pragmatic missions beyond traditional NLP tasks. We present AgentBench, a benchmark that currently consists of 8 distinct environments to assess LLM-as-Agent's reasoning and decision-making abilities.
arXiv Detail & Related papers (2023-08-07T16:08:11Z)
Scenic4RL: Programmatic Modeling and Generation of Reinforcement Learning Environments [89.04823188871906]
Generation of diverse realistic scenarios is challenging for real-time strategy (RTS) environments. Most of the existing simulators rely on randomly generating the environments. We introduce the benefits of adopting an existing formal scenario specification language, SCENIC, to assist researchers.
arXiv Detail & Related papers (2021-06-18T21:49:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.