Related papers: Red Teaming AI Red Teaming

Red Teaming AI Red Teaming

URL: http://arxiv.org/abs/2507.05538v1
Date: Mon, 07 Jul 2025 23:23:40 GMT
Title: Red Teaming AI Red Teaming
Authors: Subhabrata Majumdar, Brian Pendleton, Abhishek Gupta,
Abstract summary: We argue that there exists a significant gap between red teaming's original intent and its narrow focus on discovering model-level flaws in the context of generative AI.<n>We propose a comprehensive framework operationalizing red teaming in AI systems at two levels: macro-level system red teaming and micro-level model red teaming.
Score: 9.942581294959107
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Red teaming has evolved from its origins in military applications to become a widely adopted methodology in cybersecurity and AI. In this paper, we take a critical look at the practice of AI red teaming. We argue that despite its current popularity in AI governance, there exists a significant gap between red teaming's original intent as a critical thinking exercise and its narrow focus on discovering model-level flaws in the context of generative AI. Current AI red teaming efforts focus predominantly on individual model vulnerabilities while overlooking the broader sociotechnical systems and emergent behaviors that arise from complex interactions between models, users, and environments. To address this deficiency, we propose a comprehensive framework operationalizing red teaming in AI systems at two levels: macro-level system red teaming spanning the entire AI development lifecycle, and micro-level model red teaming. Drawing on cybersecurity experience and systems theory, we further propose a set of recommendations. In these, we emphasize that effective AI red teaming requires multifunctional teams that examine emergent risks, systemic vulnerabilities, and the interplay between technical and social factors.

Related papers

Automatic LLM Red Teaming [18.044879441434432]
We propose a novel paradigm: training an AI to strategically break' another AI.<n>Our generative agent learns coherent, multi-turn attack strategies through a fine-grained, token-level harm reward.<n>This approach sets a new state-of-the-art, fundamentally reframing red teaming as a dynamic, trajectory-based process.
arXiv Detail & Related papers (2025-08-06T13:52:00Z)
When Autonomy Goes Rogue: Preparing for Risks of Multi-Agent Collusion in Social Systems [78.04679174291329]
We introduce a proof-of-concept to simulate the risks of malicious multi-agent systems (MAS)<n>We apply this framework to two high-risk fields: misinformation spread and e-commerce fraud.<n>Our findings show that decentralized systems are more effective at carrying out malicious actions than centralized ones.
arXiv Detail & Related papers (2025-07-19T15:17:30Z)
CoP: Agentic Red-teaming for Large Language Models using Composition of Principles [61.404771120828244]
This paper proposes an agentic workflow to automate and scale the red-teaming process of Large Language Models (LLMs)<n>Human users provide a set of red-teaming principles as instructions to an AI agent to automatically orchestrate effective red-teaming strategies and generate jailbreak prompts.<n>When tested against leading LLMs, CoP reveals unprecedented safety risks by finding novel jailbreak prompts and improving the best-known single-turn attack success rate by up to 19.0 times.
arXiv Detail & Related papers (2025-06-01T02:18:41Z)
Effective Automation to Support the Human Infrastructure in AI Red Teaming [5.463538170874778]
We argue for a balanced approach that combines human expertise with automated tools to strengthen AI risk assessment.<n>We highlight key challenges in scaling automated red teaming, including considerations around worker proficiency, agency, and context-awareness.
arXiv Detail & Related papers (2025-03-28T03:36:15Z)
Lessons From Red Teaming 100 Generative AI Products [1.5285633805077958]
In recent years, AI red teaming has emerged as a practice for probing the safety and security of generative AI systems.<n>We offer practical recommendations aimed at aligning red teaming efforts with real world risks.
arXiv Detail & Related papers (2025-01-13T11:36:33Z)
AI red-teaming is a sociotechnical challenge: on values, labor, and harms [3.0001147629373195]
"Red-teaming" has quickly become the primary approach to test AI models.<n>We highlight the importance of understanding the values and assumptions behind red-teaming.
arXiv Detail & Related papers (2024-12-12T22:48:19Z)
Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI [52.138044013005]
generative AI, particularly large language models (LLMs), become increasingly integrated into production applications. New attack surfaces and vulnerabilities emerge and put a focus on adversarial threats in natural language and multi-modal systems. Red-teaming has gained importance in proactively identifying weaknesses in these systems, while blue-teaming works to protect against such adversarial attacks. This work aims to bridge the gap between academic insights and practical security measures for the protection of generative AI systems.
arXiv Detail & Related papers (2024-09-23T10:18:10Z)
Red-Teaming for Generative AI: Silver Bullet or Security Theater? [42.35800543892003]
We argue that while red-teaming may be a valuable big-tent idea for characterizing GenAI harm mitigations, industry may effectively apply red-teaming and other strategies behind closed doors to safeguard AI. To move toward a more robust toolbox of evaluations for generative AI, we synthesize our recommendations into a question bank meant to guide and scaffold future AI red-teaming practices.
arXiv Detail & Related papers (2024-01-29T05:46:14Z)
The Promise and Peril of Artificial Intelligence -- Violet Teaming Offers a Balanced Path Forward [56.16884466478886]
This paper reviews emerging issues with opaque and uncontrollable AI systems. It proposes an integrative framework called violet teaming to develop reliable and responsible AI. It emerged from AI safety research to manage risks proactively by design.
arXiv Detail & Related papers (2023-08-28T02:10:38Z)
The Feasibility and Inevitability of Stealth Attacks [63.14766152741211]
We study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence systems. In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself.
arXiv Detail & Related papers (2021-06-26T10:50:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.