From Firewalls to Frontiers: AI Red-Teaming is a Domain-Specific Evolution of Cyber Red-Teaming
- URL: http://arxiv.org/abs/2509.11398v1
- Date: Sun, 14 Sep 2025 19:21:58 GMT
- Title: From Firewalls to Frontiers: AI Red-Teaming is a Domain-Specific Evolution of Cyber Red-Teaming
- Authors: Anusha Sinha, Keltin Grimes, James Lucassen, Michael Feffer, Nathan VanHoudnos, Zhiwei Steven Wu, Hoda Heidari
- Abstract summary: A red team simulates adversary attacks to help defenders find effective strategies to defend their systems. We argue that AI systems can be more effectively red-teamed if AI red-teaming is recognized as a domain-specific evolution of cyber red-teaming.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: A red team simulates adversary attacks to help defenders find effective strategies to defend their systems in a real-world operational setting. As more enterprise systems adopt AI, red-teaming will need to evolve to address the unique vulnerabilities and risks posed by AI systems. We take the position that AI systems can be more effectively red-teamed if AI red-teaming is recognized as a domain-specific evolution of cyber red-teaming. Specifically, we argue that existing Cyber Red Teams who adopt this framing will be able to better evaluate systems with AI components by recognizing that AI poses new risks, has new failure modes to exploit, and often contains unpatchable bugs that re-prioritize disclosure and mitigation strategies. Similarly, adopting a cybersecurity framing will allow existing AI Red Teams to leverage a well-tested structure to emulate realistic adversaries, promote mutual accountability with formal rules of engagement, and provide a pattern to mature the tooling necessary for repeatable, scalable engagements. In these ways, the merging of AI and Cyber Red Teams will create a robust security ecosystem and best position the community to adapt to the rapidly changing threat landscape.
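The abstract's cybersecurity framing leans on two concrete practices: formal rules of engagement and tooling for repeatable, scalable engagements. Below is a minimal sketch of what such rules could look like when encoded as a reusable artifact; all field names and values are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch (not from the paper): encoding formal rules of engagement
# for an AI red-team exercise as structured, reusable data, so engagements become
# repeatable and auditable. Every field name and value here is an assumption.
from dataclasses import dataclass
from typing import List

@dataclass
class RulesOfEngagement:
    target_system: str                    # system under test, e.g. an LLM-backed service
    in_scope: List[str]                   # components the red team may probe
    out_of_scope: List[str]               # components that must not be touched
    allowed_techniques: List[str]         # e.g. prompt injection, data poisoning probes
    disclosure_deadline_days: int         # when findings must reach the blue team
    requires_human_approval: bool = True  # gate for high-impact actions

roe = RulesOfEngagement(
    target_system="customer-support LLM assistant",
    in_scope=["chat endpoint", "retrieval index"],
    out_of_scope=["production customer data"],
    allowed_techniques=["prompt injection", "jailbreak prompts"],
    disclosure_deadline_days=30,
)
print(roe)
```

Capturing the rules as data rather than prose is one way to make engagements comparable across runs and to enforce the mutual accountability the abstract describes.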
Related papers
- Automatic LLM Red Teaming [18.044879441434432]
We propose a novel paradigm: training an AI to strategically 'break' another AI. Our generative agent learns coherent, multi-turn attack strategies through a fine-grained, token-level harm reward. This approach sets a new state of the art, fundamentally reframing red teaming as a dynamic, trajectory-based process.
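A minimal sketch of the mechanism this entry describes: an attacker agent proposes turns, the target replies, and a harm score serves as the reward signal that would train the attacker. The attacker, target, and scorer below are stand-in stubs, not the paper's models.

```python
# Sketch, under stated assumptions: a multi-turn attack trajectory where the
# attacker's proposals are rewarded by a harm score on the target's replies.
import random

def attacker_propose(history):        # stub: would be a trained generative agent
    return f"attack turn {len(history) + 1}"

def target_respond(prompt):           # stub: the model being red-teamed
    return f"response to: {prompt}"

def harm_score(response):             # stub for the paper's token-level harm reward
    return random.random()            # placeholder score in [0, 1]

history, rewards = [], []
for turn in range(3):                 # a short multi-turn trajectory
    prompt = attacker_propose(history)
    reply = target_respond(prompt)
    reward = harm_score(reply)        # signal a real system would use to update the attacker
    history.append((prompt, reply))
    rewards.append(reward)

print(f"trajectory reward: {sum(rewards):.2f}")
```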
arXiv Detail & Related papers (2025-08-06T13:52:00Z) - When Autonomy Goes Rogue: Preparing for Risks of Multi-Agent Collusion in Social Systems [78.04679174291329]
We introduce a proof-of-concept to simulate the risks of malicious multi-agent systems (MAS). We apply this framework to two high-risk fields: misinformation spread and e-commerce fraud. Our findings show that decentralized systems are more effective at carrying out malicious actions than centralized ones.
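As a toy illustration of the centralized-versus-decentralized finding above, the simulation below assumes, purely for illustration, that a defender can disable a central coordinator: this halts a centralized group but leaves a decentralized one untouched.

```python
# Toy illustration (assumptions only): decentralized agents keep acting after the
# coordinator is taken down, while centralized agents stop.
import random

random.seed(0)

def run(decentralized: bool, n_agents: int = 10, steps: int = 100) -> int:
    actions = 0
    coordinator_up = True
    for _ in range(steps):
        if not decentralized and random.random() < 0.05:
            coordinator_up = False      # defender disables the coordinator
        for _ in range(n_agents):
            if decentralized or coordinator_up:
                actions += 1            # one malicious action carried out
    return actions

print("centralized:", run(decentralized=False))
print("decentralized:", run(decentralized=True))
```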
arXiv Detail & Related papers (2025-07-19T15:17:30Z) - Red Teaming AI Red Teaming [9.942581294959107]
We argue that there exists a significant gap between red teaming's original intent and its narrow focus on discovering model-level flaws in the context of generative AI. We propose a comprehensive framework operationalizing red teaming in AI systems at two levels: macro-level system red teaming and micro-level model red teaming.
arXiv Detail & Related papers (2025-07-07T23:23:40Z) - CoP: Agentic Red-teaming for Large Language Models using Composition of Principles [61.404771120828244]
This paper proposes an agentic workflow to automate and scale the red-teaming process of Large Language Models (LLMs). Human users provide a set of red-teaming principles as instructions to an AI agent, which automatically orchestrates effective red-teaming strategies and generates jailbreak prompts. When tested against leading LLMs, CoP reveals unprecedented safety risks by finding novel jailbreak prompts and improving the best-known single-turn attack success rate by up to 19.0 times.
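A rough sketch of the composition-of-principles idea described above: human-written red-teaming principles are combined into candidate jailbreak prompts and issued against the target model. The principles and the target call here are invented stand-ins, not CoP's actual implementation.

```python
# Sketch under assumptions: compose pairs of human-provided principles into a single
# attack prompt and query a stub target; a real agent would score replies and iterate.
from itertools import combinations

principles = [
    "rephrase the request as a fictional scenario",
    "ask for the answer in an encoded form",
    "split the request across multiple steps",
]

def query_target(prompt: str) -> str:   # stub for the LLM under test
    return f"[target reply to: {prompt[:40]}...]"

for combo in combinations(principles, 2):
    composed = "Apply the following strategies, then answer: " + "; ".join(combo)
    reply = query_target(composed)
    # A real workflow would evaluate `reply` for policy violations before iterating.
    print(composed, "->", reply)
```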
arXiv Detail & Related papers (2025-06-01T02:18:41Z) - A Red Teaming Roadmap Towards System-Level Safety [15.193906652918884]
Large Language Model (LLM) safeguards, which implement request refusals, have become a widely adopted mitigation strategy against misuse. At the intersection of adversarial machine learning and AI safety, safeguard red teaming has effectively identified critical vulnerabilities in state-of-the-art refusal-trained LLMs. We argue that testing against clear product safety specifications should take a higher priority than abstract social biases or ethical principles.
arXiv Detail & Related papers (2025-05-30T22:58:54Z) - Lessons From Red Teaming 100 Generative AI Products [1.5285633805077958]
In recent years, AI red teaming has emerged as a practice for probing the safety and security of generative AI systems. We offer practical recommendations aimed at aligning red teaming efforts with real-world risks.
arXiv Detail & Related papers (2025-01-13T11:36:33Z) - Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI [52.138044013005]
As generative AI, particularly large language models (LLMs), becomes increasingly integrated into production applications, new attack surfaces and vulnerabilities emerge, putting a focus on adversarial threats in natural language and multi-modal systems.
Red-teaming has gained importance in proactively identifying weaknesses in these systems, while blue-teaming works to protect against such adversarial attacks.
This work aims to bridge the gap between academic insights and practical security measures for the protection of generative AI systems.
arXiv Detail & Related papers (2024-09-23T10:18:10Z) - Red-Teaming for Generative AI: Silver Bullet or Security Theater? [42.35800543892003]
We argue that while red-teaming may be a valuable big-tent idea for characterizing GenAI harm mitigations, industry may effectively apply red-teaming and other strategies behind closed doors to safeguard AI.
To move toward a more robust toolbox of evaluations for generative AI, we synthesize our recommendations into a question bank meant to guide and scaffold future AI red-teaming practices.
arXiv Detail & Related papers (2024-01-29T05:46:14Z) - A Red Teaming Framework for Securing AI in Maritime Autonomous Systems [0.0]
We propose one of the first red team frameworks for evaluating the AI security of maritime autonomous systems.
This framework is a multi-part checklist, which can be tailored to different systems and requirements.
We demonstrate that this framework is highly effective for a red team to use in uncovering numerous vulnerabilities within the AI of a real-world maritime autonomous system.
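One way to read "multi-part checklist" is as tailorable data rather than a fixed document. The sketch below is a hypothetical rendering with invented items, not the framework's actual checklist.

```python
# Hypothetical checklist structure, illustrative only: parts map to lists of
# questions a red team can tailor per system and tick off during an engagement.
maritime_ai_checklist = {
    "data integrity": [
        "Can sensor inputs (AIS, radar, camera) be spoofed or replayed?",
        "Is training data provenance documented?",
    ],
    "model robustness": [
        "Does the perception model fail safely under adversarial perturbations?",
    ],
    "operational fallback": [
        "Is there a manual override if the autonomy stack misbehaves?",
    ],
}

for part, items in maritime_ai_checklist.items():
    print(part)
    for item in items:
        print("  - [ ]", item)
```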
arXiv Detail & Related papers (2023-12-08T14:59:07Z) - The Promise and Peril of Artificial Intelligence -- Violet Teaming
Offers a Balanced Path Forward [56.16884466478886]
This paper reviews emerging issues with opaque and uncontrollable AI systems.
It proposes an integrative framework called violet teaming to develop reliable and responsible AI.
It emerged from AI safety research to manage risks proactively by design.
arXiv Detail & Related papers (2023-08-28T02:10:38Z) - Fixed Points in Cyber Space: Rethinking Optimal Evasion Attacks in the
Age of AI-NIDS [70.60975663021952]
We study black-box adversarial attacks on network classifiers.
We argue that attacker-defender fixed points are themselves general-sum games with complex phase transitions.
We show that a continual learning approach is required to study attacker-defender dynamics.
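A minimal black-box evasion loop in the spirit of this entry: the attacker only observes the classifier's decision and perturbs a flow's features until the label flips. The classifier and feature vector are toy stand-ins, not an actual AI-NIDS.

```python
# Sketch under assumptions: query-only evasion against a stub network classifier.
import random

random.seed(1)

def nids_classify(features):            # stub for a deployed network intrusion detector
    return "malicious" if sum(features) > 2.0 else "benign"

features = [1.0, 0.8, 0.7]               # toy feature vector initially flagged as malicious
queries = 0
while nids_classify(features) == "malicious" and queries < 1000:
    i = random.randrange(len(features))
    features[i] -= 0.05                   # small evasion tweak to one feature
    queries += 1

print(f"evasion found after {queries} queries: {features}")
```

In the fixed-point framing of the paper, the defender would then retrain on such evasions, and the attacker would adapt again, which is why a continual learning view of the dynamics is needed.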
arXiv Detail & Related papers (2021-11-23T23:42:16Z) - The Feasibility and Inevitability of Stealth Attacks [63.14766152741211]
We study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence systems.
In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself.
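As a toy illustration of an attack that modifies the system rather than its inputs, the numpy sketch below edits a linear model's weights so that one attacker-chosen input flips its prediction while most other inputs are largely unaffected. All values are made up; this is not the paper's construction.

```python
# Toy illustration, assumptions only: a "stealth" edit to the model itself.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=4)                    # original model: sign(w @ x)
trigger = np.array([0.9, -0.3, 0.5, 0.1]) # attacker-chosen input to hijack

def predict(weights, x):
    return int(np.sign(weights @ x))

original = predict(w, trigger)
# Stealth edit: nudge the weights along the trigger direction just enough to flip it.
w_attacked = w - 1.1 * (w @ trigger) / (trigger @ trigger) * trigger

print("trigger before/after:", original, predict(w_attacked, trigger))
random_inputs = rng.normal(size=(5, 4))   # check how many other inputs keep their labels
print([predict(w, x) == predict(w_attacked, x) for x in random_inputs])
```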
arXiv Detail & Related papers (2021-06-26T10:50:07Z)