BlueCodeAgent: A Blue Teaming Agent Enabled by Automated Red Teaming for CodeGen AI
- URL: http://arxiv.org/abs/2510.18131v1
- Date: Mon, 20 Oct 2025 22:00:10 GMT
- Title: BlueCodeAgent: A Blue Teaming Agent Enabled by Automated Red Teaming for CodeGen AI
- Authors: Chengquan Guo, Yuzhou Nie, Chulin Xie, Zinan Lin, Wenbo Guo, Bo Li,
- Abstract summary: We propose BlueCodeAgent, an end-to-end blue teaming agent enabled by automated red teaming. Our framework integrates both sides: red teaming generates diverse risky instances, while the blue teaming agent leverages these to detect previously seen and unseen risk scenarios. BlueCodeAgent achieves an average 12.7% F1 score improvement across four datasets in three tasks.
- Score: 19.047693413887107
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As large language models (LLMs) are increasingly used for code generation, concerns over the security risks have grown substantially. Early research has primarily focused on red teaming, which aims to uncover and evaluate vulnerabilities and risks of CodeGen models. However, progress on the blue teaming side remains limited, as developing defense requires effective semantic understanding to differentiate the unsafe from the safe. To fill in this gap, we propose BlueCodeAgent, an end-to-end blue teaming agent enabled by automated red teaming. Our framework integrates both sides: red teaming generates diverse risky instances, while the blue teaming agent leverages these to detect previously seen and unseen risk scenarios through constitution and code analysis with agentic integration for multi-level defense. Our evaluation across three representative code-related tasks--bias instruction detection, malicious instruction detection, and vulnerable code detection--shows that BlueCodeAgent achieves significant gains over the base models and safety prompt-based defenses. In particular, for vulnerable code detection tasks, BlueCodeAgent integrates dynamic analysis to effectively reduce false positives, a challenging problem as base models tend to be over-conservative, misclassifying safe code as unsafe. Overall, BlueCodeAgent achieves an average 12.7% F1 score improvement across four datasets in three tasks, attributed to its ability to summarize actionable constitutions that enhance context-aware risk detection. We demonstrate that the red teaming benefits the blue teaming by continuously identifying new vulnerabilities to enhance defense performance.
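The abstract reports results as F1 and attributes part of the gain on vulnerable code detection to fewer false positives, because base models tend to label safe code as unsafe. As a rough illustration only (this is not the paper's evaluation code), the sketch below computes precision, recall, and F1 for a binary detector with "unsafe" as the positive class. The function name `precision_recall_f1` and all labels are hypothetical. The sketch shows why flagging everything as unsafe keeps recall at 1.0 while capping precision, and therefore F1. The abstract does not say whether its 12.7% figure is an absolute or a relative improvement, so the sketch makes no attempt to reproduce it.

```python
def precision_recall_f1(y_true, y_pred):
    """Binary metrics with 'unsafe' (1) as the positive class and 'safe' as 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # unsafe flagged as unsafe
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # safe misflagged as unsafe
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # unsafe missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical ground truth: 4 unsafe samples followed by 6 safe samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# Hypothetical over-conservative detector: flags every sample as unsafe.
conservative = [1] * 10
# Hypothetical better-calibrated detector: misses one unsafe sample and
# misflags one safe sample.
calibrated = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

for name, pred in [("conservative", conservative), ("calibrated", calibrated)]:
    p, r, f = precision_recall_f1(y_true, pred)
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
# conservative: precision=0.400 recall=1.000 f1=0.571
# calibrated: precision=0.750 recall=0.750 f1=0.750
```

In this toy setting, the calibrated detector gives up some recall but removes five of the six false positives, which raises F1 overall.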
Related papers
- Co-RedTeam: Orchestrated Security Discovery and Exploitation with LLM Agents [57.49020237126194]
Large language models (LLMs) have shown promise in assisting cybersecurity tasks, yet existing approaches struggle with automatic vulnerability discovery and exploitation.
We propose Co-RedTeam, a security-aware multi-agent framework designed to mirror real-world red-teaming.
Co-RedTeam decomposes vulnerability analysis into coordinated discovery and exploitation stages, enabling agents to plan, execute, validate, and refine actions.
(2026-02-02T14:38:45Z)
- RedCodeAgent: Automatic Red-teaming Agent against Diverse Code Agents [70.24175620901538]
Code agents have gained widespread adoption due to their strong code generation capabilities and integration with code interpreters.
Current static safety benchmarks and red-teaming tools are inadequate for identifying emerging real-world risky scenarios.
We propose RedCodeAgent, the first automated red-teaming agent designed to systematically uncover vulnerabilities in diverse code agents.
(2025-10-02T22:59:06Z)
- OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety [58.201189860217724]
We introduce OpenAgentSafety, a comprehensive framework for evaluating agent behavior across eight critical risk categories.
Unlike prior work, our framework evaluates agents that interact with real tools, including web browsers, code execution environments, file systems, bash shells, and messaging platforms.
It combines rule-based analysis with LLM-as-judge assessments to detect both overt and subtle unsafe behaviors.
(2025-07-08T16:18:54Z)
- CoP: Agentic Red-teaming for Large Language Models using Composition of Principles [61.404771120828244]
This paper proposes an agentic workflow to automate and scale the red-teaming process of Large Language Models (LLMs).
Human users provide a set of red-teaming principles as instructions to an AI agent to automatically orchestrate effective red-teaming strategies and generate jailbreak prompts.
When tested against leading LLMs, CoP reveals unprecedented safety risks by finding novel jailbreak prompts and improving the best-known single-turn attack success rate by up to 19.0 times.
(2025-06-01T02:18:41Z)
- AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration [40.350632196772466]
This paper introduces AutoRedTeamer, a novel framework for fully automated, end-to-end red teaming against large language models (LLMs).
AutoRedTeamer combines a multi-agent architecture with a memory-guided attack selection mechanism to enable continuous discovery and integration of new attack vectors.
We demonstrate AutoRedTeamer's effectiveness across diverse evaluation settings, achieving 20% higher attack success rates on HarmBench against Llama-3.1-70B.
(2025-03-20T00:13:04Z)
- RedCode: Risky Code Execution and Generation Benchmark for Code Agents [50.81206098588923]
RedCode is a benchmark for risky code execution and generation.
RedCode-Exec provides challenging prompts that could lead to risky code execution.
RedCode-Gen provides 160 prompts with function signatures and docstrings as input to assess whether code agents will follow instructions.
(2024-11-12T13:30:06Z)
- AdvAgent: Controllable Blackbox Red-teaming on Web Agents [22.682464365220916]
AdvAgent is a black-box red-teaming framework for attacking web agents.
It employs a reinforcement learning-based pipeline to train an adversarial prompter model.
With careful attack design, these prompts effectively exploit agent weaknesses while maintaining stealthiness and controllability.
(2024-10-22T20:18:26Z)
- AutoSafeCoder: A Multi-Agent Framework for Securing LLM Code Generation through Static Analysis and Fuzz Testing [6.334110674473677]
Existing approaches often rely on a single agent for code generation, which struggles to produce secure, vulnerability-free code.
We propose AutoSafeCoder, a multi-agent framework that leverages LLM-driven agents for code generation, vulnerability analysis, and security enhancement through continuous collaboration.
Our contribution focuses on ensuring the safety of multi-agent code generation by integrating dynamic and static testing in an iterative process during code generation.
(2024-09-16T21:15:56Z)
- Dissecting Adversarial Robustness of Multimodal LM Agents [70.2077308846307]
We manually create 200 targeted adversarial tasks and evaluation scripts in a realistic threat model on top of VisualWebArena.
We find that we can successfully break latest agents that use black-box frontier LMs, including those that perform reflection and tree search.
We also use ARE to rigorously evaluate how the robustness changes as new components are added.
(2024-06-18T17:32:48Z)