RedTeamLLM: an Agentic AI framework for offensive security
- URL: http://arxiv.org/abs/2505.06913v1
- Date: Sun, 11 May 2025 09:19:10 GMT
- Title: RedTeamLLM: an Agentic AI framework for offensive security
- Authors: Brian Challita, Pierre Parrend,
- Abstract summary: We propose and evaluate RedTeamLLM, an integrated architecture with a comprehensive security model for automatization of pentest tasks.<n>RedTeamLLM follows three key steps: summarizing, reasoning and act, which embed its operational capacity.<n> Evaluation is performed through the automated resolution of a range of entry-level, but not trivial, CTF challenges.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: From automated intrusion testing to discovery of zero-day attacks before software launch, agentic AI calls for great promises in security engineering. This strong capability is bound with a similar threat: the security and research community must build up its models before the approach is leveraged by malicious actors for cybercrime. We therefore propose and evaluate RedTeamLLM, an integrated architecture with a comprehensive security model for automatization of pentest tasks. RedTeamLLM follows three key steps: summarizing, reasoning and act, which embed its operational capacity. This novel framework addresses four open challenges: plan correction, memory management, context window constraint, and generality vs. specialization. Evaluation is performed through the automated resolution of a range of entry-level, but not trivial, CTF challenges. The contribution of the reasoning capability of our agentic AI framework is specifically evaluated.
Related papers
- OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety [58.201189860217724]
We introduce OpenAgentSafety, a comprehensive framework for evaluating agent behavior across eight critical risk categories.<n>Unlike prior work, our framework evaluates agents that interact with real tools, including web browsers, code execution environments, file systems, bash shells, and messaging platforms.<n>It combines rule-based analysis with LLM-as-judge assessments to detect both overt and subtle unsafe behaviors.
arXiv Detail & Related papers (2025-07-08T16:18:54Z) - A Systematization of Security Vulnerabilities in Computer Use Agents [1.3560089220432787]
We conduct a systematic threat analysis and testing of real-world CUAs under adversarial conditions.<n>We identify seven classes of risks unique to the CUA paradigm, and analyze three concrete exploit scenarios in depth.<n>These case studies reveal deeper architectural flaws across current CUA implementations.
arXiv Detail & Related papers (2025-07-07T19:50:21Z) - TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems [2.462408812529728]
This review presents a structured analysis of textbfTrust, Risk, and Security Management (TRiSM) in the context of LLM-based Agentic Multi-Agent Systems (AMAS)<n>We begin by examining the conceptual foundations of Agentic AI and highlight its architectural distinctions from traditional AI agents.<n>We then adapt and extend the AI TRiSM framework for Agentic AI, structured around four key pillars: Explainability, ModelOps, Security, Privacy and Governance.
arXiv Detail & Related papers (2025-06-04T16:26:11Z) - ATAG: AI-Agent Application Threat Assessment with Attack Graphs [23.757154032523093]
This paper introduces AI-agent application Threat assessment with Attack Graphs (ATAG)<n>ATAG is a novel framework designed to systematically analyze the security risks associated with AI-agent applications.<n>It facilitates proactive identification and mitigation of AI-agent threats in multi-agent applications.
arXiv Detail & Related papers (2025-06-03T13:25:40Z) - SafeAgent: Safeguarding LLM Agents via an Automated Risk Simulator [77.86600052899156]
Large Language Model (LLM)-based agents are increasingly deployed in real-world applications.<n>We propose AutoSafe, the first framework that systematically enhances agent safety through fully automated synthetic data generation.<n>We show that AutoSafe boosts safety scores by 45% on average and achieves a 28.91% improvement on real-world tasks.
arXiv Detail & Related papers (2025-05-23T10:56:06Z) - A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron? [30.063392019347887]
We present a systematization of knowledge on the safety and security threats of emphComputer-Using Agents.<n> CUAs are capable of autonomously performing tasks such as navigating desktop applications, web pages, and mobile apps.
arXiv Detail & Related papers (2025-05-16T06:56:42Z) - AgentXploit: End-to-End Redteaming of Black-Box AI Agents [54.29555239363013]
We propose a generic black-box fuzzing framework, AgentXploit, to automatically discover and exploit indirect prompt injection vulnerabilities.<n>We evaluate AgentXploit on two public benchmarks, AgentDojo and VWA-adv, where it achieves 71% and 70% success rates against agents based on o3-mini and GPT-4o.<n>We apply our attacks in real-world environments, successfully misleading agents to navigate to arbitrary URLs, including malicious sites.
arXiv Detail & Related papers (2025-05-09T07:40:17Z) - Securing Agentic AI: A Comprehensive Threat Model and Mitigation Framework for Generative AI Agents [0.0]
This paper introduces a comprehensive threat model tailored specifically for GenAI agents.<n>Research work identifies 9 primary threats and organizes them across five key domains.
arXiv Detail & Related papers (2025-04-28T16:29:24Z) - DoomArena: A framework for Testing AI Agents Against Evolving Security Threats [84.94654617852322]
We present DoomArena, a security evaluation framework for AI agents.<n>It is a plug-in framework and integrates easily into realistic agentic frameworks.<n>It is modular and decouples the development of attacks from details of the environment in which the agent is deployed.
arXiv Detail & Related papers (2025-04-18T20:36:10Z) - AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration [40.350632196772466]
This paper introduces AutoRedTeamer, a novel framework for fully automated, end-to-end red teaming against large language models (LLMs)<n>AutoRedTeamer combines a multi-agent architecture with a memory-guided attack selection mechanism to enable continuous discovery and integration of new attack vectors.<n>We demonstrate AutoRedTeamer's effectiveness across diverse evaluation settings, achieving 20% higher attack success rates on HarmBench against Llama-3.1-70B.
arXiv Detail & Related papers (2025-03-20T00:13:04Z) - A Framework for Evaluating Emerging Cyberattack Capabilities of AI [11.595840449117052]
This work introduces a novel evaluation framework that addresses limitations by: (1) examining the end-to-end attack chain, (2) identifying gaps in AI threat evaluation, and (3) helping defenders prioritize targeted mitigations.<n>We analyzed over 12,000 real-world instances of AI involvement in cyber incidents, catalogued by Google's Threat Intelligence Group, to curate seven representative attack chain archetypes.<n>We report on AI's potential to amplify offensive capabilities across specific attack stages, and offer recommendations for prioritizing defenses.
arXiv Detail & Related papers (2025-03-14T23:05:02Z) - Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI [52.138044013005]
generative AI, particularly large language models (LLMs), become increasingly integrated into production applications.
New attack surfaces and vulnerabilities emerge and put a focus on adversarial threats in natural language and multi-modal systems.
Red-teaming has gained importance in proactively identifying weaknesses in these systems, while blue-teaming works to protect against such adversarial attacks.
This work aims to bridge the gap between academic insights and practical security measures for the protection of generative AI systems.
arXiv Detail & Related papers (2024-09-23T10:18:10Z) - Artificial Intelligence as the New Hacker: Developing Agents for Offensive Security [0.0]
This paper explores the integration of Artificial Intelligence (AI) into offensive cybersecurity.
It develops an autonomous AI agent, ReaperAI, designed to simulate and execute cyberattacks.
ReaperAI demonstrates the potential to identify, exploit, and analyze security vulnerabilities autonomously.
arXiv Detail & Related papers (2024-05-09T18:15:12Z) - ASSERT: Automated Safety Scenario Red Teaming for Evaluating the
Robustness of Large Language Models [65.79770974145983]
ASSERT, Automated Safety Scenario Red Teaming, consists of three methods -- semantically aligned augmentation, target bootstrapping, and adversarial knowledge injection.
We partition our prompts into four safety domains for a fine-grained analysis of how the domain affects model performance.
We find statistically significant performance differences of up to 11% in absolute classification accuracy among semantically related scenarios and error rates of up to 19% absolute error in zero-shot adversarial settings.
arXiv Detail & Related papers (2023-10-14T17:10:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.