Related papers: D-CIPHER: Dynamic Collaborative Intelligent Agents with Planning and Heterogeneous Execution for Enhanced Reasoning in Offensive Security

URL: http://arxiv.org/abs/2502.10931v1
Date: Sat, 15 Feb 2025 23:43:18 GMT
Title: D-CIPHER: Dynamic Collaborative Intelligent Agents with Planning and Heterogeneous Execution for Enhanced Reasoning in Offensive Security
Authors: Meet Udeshi, Minghao Shao, Haoran Xi, Nanda Rani, Kimberly Milner, Venkata Sai Charan Putrevu, Brendan Dolan-Gavitt, Sandeep Kumar Shukla, Prashanth Krishnamurthy, Farshad Khorrami, Ramesh Karri, Muhammad Shafique
Abstract summary: Large Language Models (LLMs) have been used in cybersecurity in many ways. Capture the Flag (CTF) challenges serve as benchmarks for assessing the automated task-planning abilities of LLM agents. We introduce the D-CIPHER multi-agent LLM framework for collaborative CTF challenge solving.
Score: 22.86304661035188
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) have been used in cybersecurity in many ways, including their recent use as intelligent agent systems for autonomous security analysis. Capture the Flag (CTF) challenges serve as benchmarks for assessing the automated task-planning abilities of LLM agents across various cybersecurity skill sets. Early attempts to apply LLMs for solving CTF challenges relied on single-agent systems, where feedback was restricted to a single reasoning-action loop. This approach proved inadequate for handling complex CTF tasks. Drawing inspiration from real-world CTF competitions, where teams of experts collaborate, we introduce the D-CIPHER multi-agent LLM framework for collaborative CTF challenge solving. D-CIPHER integrates agents with distinct roles, enabling dynamic feedback loops to enhance reasoning on CTF challenges. It introduces the Planner-Executor agent system, consisting of a Planner agent for overall problem-solving along with multiple heterogeneous Executor agents for individual tasks, facilitating efficient allocation of responsibilities among the LLMs. Additionally, D-CIPHER incorporates an Auto-prompter agent, which improves problem-solving by exploring the challenge environment and generating a highly relevant initial prompt. We evaluate D-CIPHER on CTF benchmarks using multiple LLM models and conduct comprehensive studies to highlight the impact of our enhancements. Our results demonstrate that the multi-agent D-CIPHER system achieves a significant improvement in challenges solved, setting a state-of-the-art performance on three benchmarks: 22.0% on NYU CTF Bench, 22.5% on Cybench, and 44.0% on HackTheBox. D-CIPHER is available at https://github.com/NYU-LLM-CTF/nyuctf_agents as the nyuctf_multiagent package.
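
The Planner-Executor loop described in the abstract can be illustrated with a minimal sketch. This is not the nyuctf_multiagent API: every name here (Task, auto_prompt, solve, toy_planner, toy_executors) is hypothetical, and the agents are plain functions standing in for LLM calls. It only shows the control flow, assuming the planner picks one task per round and each executor returns a text summary that is fed back to the planner.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Task:
    kind: str         # selects which executor handles the task
    description: str  # instruction the planner hands to that executor


# Hypothetical signatures: real agents would wrap LLM calls and tools.
PlannerStep = Callable[[List[str]], Optional[Task]]
Executor = Callable[[Task], str]


def auto_prompt(files: List[str]) -> str:
    """Stand-in for the Auto-prompter: describe the environment as a prompt."""
    return "Challenge files: " + ", ".join(sorted(files))


def solve(
    initial_prompt: str,
    planner_step: PlannerStep,
    executors: Dict[str, Executor],
    max_rounds: int = 5,
) -> Optional[str]:
    """Run the planner/executor feedback loop until a flag appears or rounds run out."""
    history = [initial_prompt]
    for _ in range(max_rounds):
        task = planner_step(history)          # planner decides the next task
        if task is None:                      # planner gives up
            return None
        summary = executors[task.kind](task)  # dispatch to the matching executor
        history.append(summary)               # executor result feeds the next plan
        if summary.startswith("flag{"):
            return summary
    return None


def toy_planner(history: List[str]) -> Optional[Task]:
    """Toy planner (assumption): recon first, then exploit, then stop."""
    if len(history) == 1:
        return Task("recon", "inspect the files named in: " + history[0])
    if len(history) == 2:
        return Task("exploit", "use this finding: " + history[1])
    return None


# Toy executors (assumption): fixed strings instead of real tool use.
toy_executors: Dict[str, Executor] = {
    "recon": lambda task: "note.txt looks interesting",
    "exploit": lambda task: "flag{toy_example}",
}

result = solve(auto_prompt(["note.txt", "app.py"]), toy_planner, toy_executors)
print(result)
```

Keeping the executors in a dictionary keyed by task kind is one simple way to model heterogeneous executors; the actual framework may allocate work quite differently.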

Related papers

Why Do Multi-Agent LLM Systems Fail? [91.39266556855513]
We present MAST (Multi-Agent System Failure taxonomy), the first empirically grounded taxonomy designed to understand MAS failures. We analyze seven popular MAS frameworks across over 200 tasks, involving six expert human annotators. We identify 14 unique failure modes, organized into 3 overarching categories, (i) specification issues, (ii) inter-agent misalignment, and (iii) task verification.
arXiv Detail & Related papers (2025-03-17T19:04:38Z)
Privacy-Enhancing Paradigms within Federated Multi-Agent Systems [47.76990892943637]
LLM-based Multi-Agent Systems (MAS) have proven highly effective in solving complex problems by integrating multiple agents, each performing different roles. In this paper, we introduce the concept of Federated MAS, highlighting the fundamental differences between Federated MAS and traditional FL. We then identify key challenges in developing Federated MAS, including: 1) heterogeneous privacy protocols among agents, 2) structural differences in multi-party conversations, and 3) dynamic conversational network structures. To address these challenges, we propose Embedded Privacy-Enhancing Agents (EPEAgent), an innovative solution that integrates seamlessly into the Retrieval-Augmented Generation phase and the
arXiv Detail & Related papers (2025-03-11T08:38:45Z)
Agent-Oriented Planning in Multi-Agent Systems [54.429028104022066]
We propose AOP, a novel framework for agent-oriented planning in multi-agent systems. In this study, we identify three critical design principles of agent-oriented planning, including solvability, completeness, and non-redundancy. Extensive experiments demonstrate the advancement of AOP in solving real-world problems compared to both single-agent systems and existing planning strategies for multi-agent systems.
arXiv Detail & Related papers (2024-10-03T04:07:51Z)
Interactive Tools Substantially Assist LM Agents in Finding Security Vulnerabilities [46.34031902647788]
We present EnIGMA, an LM agent for autonomously solving Capture The Flag (CTF) challenges. We introduce new tools and interfaces to improve the agent's ability to find and exploit security vulnerabilities. Empirical analysis on 390 CTF challenges demonstrates that these new tools and interfaces substantially improve our agent's performance.
arXiv Detail & Related papers (2024-09-24T15:06:01Z)
Textualized Agent-Style Reasoning for Complex Tasks by Multiple Round LLM Generation [49.27250832754313]
We present AgentCOT, an LLM-based autonomous agent framework. At each step, AgentCOT selects an action and executes it to yield an intermediate result with supporting evidence. We introduce two new strategies to enhance the performance of AgentCOT.
arXiv Detail & Related papers (2024-09-19T02:20:06Z)
On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents [58.79302663733703]
Large language model-based multi-agent systems have shown great abilities across various tasks due to the collaboration of expert agents. However, the impact of clumsy or even malicious agents on the overall performance of the system remains underexplored. This paper investigates the resilience of various system structures under faulty agents.
arXiv Detail & Related papers (2024-08-02T03:25:20Z)
EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms [55.77492625524141]
EvoAgent is a generic method to automatically extend specialized agents to multi-agent systems. We show that EvoAgent can significantly enhance the task-solving capability of LLM-based agents.
arXiv Detail & Related papers (2024-06-20T11:49:23Z)
Adaptive In-conversation Team Building for Language Model Agents [33.03550687362213]
Leveraging multiple large language model (LLM) agents has been shown to be a promising approach for tackling complex tasks. Our new adaptive team-building paradigm offers a flexible solution, realized through a novel agent design named Captain Agent. A comprehensive evaluation across six real-world scenarios demonstrates that Captain Agent significantly outperforms existing multi-agent methods.
arXiv Detail & Related papers (2024-05-29T18:08:37Z)
Smurfs: Leveraging Multiple Proficiency Agents with Context-Efficiency for Tool Planning [14.635361844362794]
Smurfs is a cutting-edge multi-agent framework designed to revolutionize the application of large language models. Smurfs can enhance the model's ability to solve complex tasks at no additional cost.
arXiv Detail & Related papers (2024-05-09T17:49:04Z)
Learning to Use Tools via Cooperative and Interactive Agents [58.77710337157665]
Tool learning empowers large language models (LLMs) as agents to use external tools and extend their utility. We propose ConAgents, a Cooperative and interactive Agents framework, which coordinates three specialized agents for tool selection, tool execution, and action calibration separately. Our experiments on three datasets show that the LLMs, when equipped with ConAgents, outperform baselines with substantial improvement.
arXiv Detail & Related papers (2024-03-05T15:08:16Z)
A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration [55.35849138235116]
We propose automatically selecting a team of agents from candidates to collaborate in a dynamic communication structure toward different tasks and domains. Specifically, we build a framework named Dynamic LLM-Powered Agent Network (DyLAN) for LLM-powered agent collaboration. We demonstrate that DyLAN outperforms strong baselines in code generation, decision-making, general reasoning, and arithmetic reasoning tasks with moderate computational cost.
arXiv Detail & Related papers (2023-10-03T16:05:48Z)
Learning to Solve Complex Tasks by Talking to Agents [39.08818632689814]
Humans often solve complex problems by interacting with existing agents, such as AI assistants, that can solve simpler sub-tasks. Common NLP benchmarks aim for the development of self-sufficient models for every task. We propose a new benchmark called CommaQA that contains three kinds of complex reasoning tasks designed to be solved by "talking" to four agents with different capabilities.
arXiv Detail & Related papers (2021-10-16T10:37:34Z)
FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC). It is a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces. We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.