Ask What Your Country Can Do For You: Towards a Public Red Teaming Model
- URL: http://arxiv.org/abs/2510.20061v1
- Date: Wed, 22 Oct 2025 22:24:21 GMT
- Title: Ask What Your Country Can Do For You: Towards a Public Red Teaming Model
- Authors: Wm. Matthew Kennedy, Cigdem Patlak, Jayraj Dave, Blake Chambers, Aayush Dhanotiya, Darshini Ramiah, Reva Schwartz, Jack Hagen, Akash Kundu, Mouni Pendharkar, Liam Baisley, Theodora Skeadas, Rumman Chowdhury
- Abstract summary: We propose a cooperative public AI red-teaming exercise. The first in-person public demonstrator exercise was held in conjunction with CAMLIS 2024. We argue that this approach is both capable of delivering meaningful results and scalable to many AI-developing jurisdictions.
- Score: 1.4138385478350077
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: AI systems have the potential to produce both benefits and harms, but without rigorous and ongoing adversarial evaluation, AI actors will struggle to assess the breadth and magnitude of the AI risk surface. Researchers from the field of systems design have developed several effective sociotechnical AI evaluation and red teaming techniques targeting bias, hate speech, mis/disinformation, and other documented harm classes. However, as increasingly sophisticated AI systems are released into high-stakes sectors (such as education, healthcare, and intelligence-gathering), our current evaluation and monitoring methods are proving less and less capable of delivering effective oversight. To actually deliver responsible AI, and to ensure that AI's harms are fully understood and its security vulnerabilities mitigated, pioneering new approaches to close this "responsibility gap" are now more urgent than ever. In this paper, we propose one such approach, the cooperative public AI red-teaming exercise, and discuss early results of its prior pilot implementations. This approach is intertwined with CAMLIS itself: the first in-person public demonstrator exercise was held in conjunction with CAMLIS 2024. We review the operational design and results of this exercise, the National Institute of Standards and Technology's (NIST) prior Assessing the Risks and Impacts of AI (ARIA) pilot exercise, and another similar exercise conducted with the Singapore Infocomm Media Development Authority (IMDA). Ultimately, we argue that this approach is both capable of delivering meaningful results and scalable to many AI-developing jurisdictions.
Related papers
- Improving Methodologies for Agentic Evaluations Across Domains: Leakage of Sensitive Information, Fraud and Cybersecurity Threats [17.766681829762256]
Agent testing remains nascent and is still a developing science. As AI agents begin to be deployed globally, it is important that they handle different languages and cultures accurately and securely. This is the third exercise, building on insights from two earlier joint testing exercises conducted by the Network.
arXiv Detail & Related papers (2026-01-22T06:00:00Z) - Never Compromise to Vulnerabilities: A Comprehensive Survey on AI Governance [211.5823259429128]
We propose a comprehensive framework integrating technical and societal dimensions, structured around three interconnected pillars: Intrinsic Security, Derivative Security, and Social Ethics. We identify three core challenges: (1) the generalization gap, where defenses fail against evolving threats; (2) inadequate evaluation protocols that overlook real-world risks; and (3) fragmented regulations leading to inconsistent oversight. Our framework offers actionable guidance for researchers, engineers, and policymakers to develop AI systems that are not only robust and secure but also ethically aligned and publicly trustworthy.
arXiv Detail & Related papers (2025-08-12T09:42:56Z) - Report on NSF Workshop on Science of Safe AI [75.96202715567088]
New advances in machine learning are leading to new opportunities to develop technology-based solutions to societal problems. To fulfill the promise of AI, we must address how to develop AI-based systems that are not only accurate and performant but also safe and trustworthy. This report is the result of the discussions in the working groups that addressed different aspects of safety at the workshop.
arXiv Detail & Related papers (2025-06-24T18:55:29Z) - Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI [52.138044013005]
As generative AI, particularly large language models (LLMs), becomes increasingly integrated into production applications, new attack surfaces and vulnerabilities emerge, putting a focus on adversarial threats in natural language and multi-modal systems.
Red-teaming has gained importance in proactively identifying weaknesses in these systems, while blue-teaming works to protect against such adversarial attacks.
This work aims to bridge the gap between academic insights and practical security measures for the protection of generative AI systems.
arXiv Detail & Related papers (2024-09-23T10:18:10Z) - Mapping Technical Safety Research at AI Companies: A literature review and incentives analysis [0.0]
This report analyzes the technical research into safe AI development being conducted by three leading AI companies: Anthropic, Google DeepMind, and OpenAI.
We defined safe AI development as developing AI systems that are unlikely to pose large-scale misuse or accident risks.
arXiv Detail & Related papers (2024-09-12T09:34:55Z) - Participatory Approaches in AI Development and Governance: A Principled Approach [9.271573427680087]
This paper forms the first part of a two-part series on participatory governance in AI.
It advances the premise that a participatory approach is beneficial to building and using more responsible, safe, and human-centric AI systems.
arXiv Detail & Related papers (2024-06-03T09:49:42Z) - Particip-AI: A Democratic Surveying Framework for Anticipating Future AI Use Cases, Harms and Benefits [54.648819983899614]
General-purpose AI seems to have lowered the barriers for the public to use AI and harness its power.
We introduce PARTICIP-AI, a framework for laypeople to speculate and assess AI use cases and their impacts.
arXiv Detail & Related papers (2024-03-21T19:12:37Z) - A Red Teaming Framework for Securing AI in Maritime Autonomous Systems [0.0]
We propose one of the first red team frameworks for evaluating the AI security of maritime autonomous systems.
This framework is a multi-part checklist, which can be tailored to different systems and requirements.
We demonstrate that this framework is highly effective in enabling a red team to uncover numerous vulnerabilities within a real-world maritime autonomous system's AI.
arXiv Detail & Related papers (2023-12-08T14:59:07Z) - Assessing AI Impact Assessments: A Classroom Study [14.768235460961876]
Artificial Intelligence Impact Assessments ("AIIAs"), a family of tools that provide structured processes to imagine the possible impacts of a proposed AI system, have become an increasingly popular proposal to govern AI systems.
Recent efforts from government and private-sector organizations have proposed many diverse instantiations of AIIAs, which take a variety of forms ranging from open-ended questionnaires to graded scorecards.
We conduct a classroom study at a large research-intensive university (R1) in an elective course focused on the societal and ethical implications of AI.
We find preliminary evidence that impact assessments can influence participants' perceptions of the potential
arXiv Detail & Related papers (2023-11-19T01:00:59Z) - Managing extreme AI risks amid rapid progress [171.05448842016125]
We describe risks that include large-scale social harms, malicious uses, and irreversible loss of human control over autonomous AI systems.
There is a lack of consensus about how exactly such risks arise, and how to manage them.
Present governance initiatives lack the mechanisms and institutions to prevent misuse and recklessness, and barely address autonomous systems.
arXiv Detail & Related papers (2023-10-26T17:59:06Z) - Fairness in AI and Its Long-Term Implications on Society [68.8204255655161]
We take a closer look at AI fairness and analyze how a lack of AI fairness can lead to the deepening of biases over time.
We discuss how biased models can lead to more negative real-world outcomes for certain groups.
If the issues persist, they could be reinforced by interactions with other risks and have severe implications for society in the form of social unrest.
arXiv Detail & Related papers (2023-04-16T11:22:59Z) - FATE in AI: Towards Algorithmic Inclusivity and Accessibility [0.0]
To prevent algorithmic disparities, principles of fairness, accountability, transparency, and ethics (FATE) in AI are being implemented.
This study examines FATE-related desiderata, particularly transparency and ethics, in areas of the global South that are underserved by AI.
To promote inclusivity, a community-led strategy is proposed to collect and curate representative data for responsible AI design.
arXiv Detail & Related papers (2023-01-03T15:08:10Z)