Securing External Deeper-than-black-box GPAI Evaluations
- URL: http://arxiv.org/abs/2503.07496v2
- Date: Thu, 13 Mar 2025 13:43:45 GMT
- Title: Securing External Deeper-than-black-box GPAI Evaluations
- Authors: Alejandro Tlaie, Jimmy Farrell,
- Abstract summary: This paper examines the critical challenges and potential solutions for conducting secure and effective external evaluations of general-purpose AI (GPAI) models. With the exponential growth in size, capability, reach, and accompanying risk of these models, ensuring accountability, safety, and public trust requires frameworks that go beyond traditional black-box methods.
- Score: 49.1574468325115
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper examines the critical challenges and potential solutions for conducting secure and effective external evaluations of general-purpose AI (GPAI) models. With the exponential growth in size, capability, reach and accompanying risk of these models, ensuring accountability, safety, and public trust requires frameworks that go beyond traditional black-box methods. The discussion begins with an analysis of the need for deeper-than-black-box evaluations (Section I), emphasizing the importance of understanding model internals to uncover latent risks and ensure compliance with ethical and regulatory standards. Building on this foundation, Section II addresses the security considerations of remote evaluations, outlining the threat landscape, technical solutions, and safeguards necessary to protect both evaluators and proprietary model data. Finally, Section III synthesizes these insights into actionable recommendations and future directions, aiming to establish a robust, scalable, and transparent framework for external assessments in GPAI governance.
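To make the notion of deeper-than-black-box access concrete, the sketch below illustrates one possible structured-access arrangement consistent with the abstract's framing: an external evaluator queries a hosted model for token logits and pre-approved activation summaries, every request is hashed into an audit log, and raw weights never leave the provider. The class and method names (StructuredAccessSession, query_logits, query_activations) and the toy stand-in model are illustrative assumptions, not an interface defined in the paper.

```python
"""Minimal sketch of a "deeper-than-black-box" structured-access interface.

Hypothetical illustration only: a real deployment would wrap an actual model
behind a secured remote endpoint; a toy stand-in keeps the example runnable.
"""
import hashlib
import json
import random
import time


class ToyModel:
    """Stand-in for a proprietary GPAI model; its weights never leave the host."""

    def logits(self, prompt: str) -> list[float]:
        random.seed(prompt)  # deterministic toy output
        return [random.uniform(-5, 5) for _ in range(8)]

    def layer_summary(self, prompt: str, layer: int) -> list[float]:
        random.seed(f"{prompt}:{layer}")
        return [random.gauss(0, 1) for _ in range(4)]


class StructuredAccessSession:
    """Exposes logits and coarse activation summaries to an external evaluator,
    logging a hash of every query for later audit instead of shipping weights."""

    def __init__(self, model: ToyModel, evaluator_id: str, allowed_layers: set[int]):
        self.model = model
        self.evaluator_id = evaluator_id
        self.allowed_layers = allowed_layers
        self.audit_log: list[dict] = []

    def _log(self, kind: str, payload: str) -> None:
        self.audit_log.append({
            "evaluator": self.evaluator_id,
            "kind": kind,
            "payload_hash": hashlib.sha256(payload.encode()).hexdigest(),
            "time": time.time(),
        })

    def query_logits(self, prompt: str) -> list[float]:
        self._log("logits", prompt)
        return self.model.logits(prompt)

    def query_activations(self, prompt: str, layer: int) -> list[float]:
        if layer not in self.allowed_layers:
            raise PermissionError(f"layer {layer} is not covered by the access agreement")
        self._log("activations", f"{prompt}:{layer}")
        return self.model.layer_summary(prompt, layer)


if __name__ == "__main__":
    session = StructuredAccessSession(ToyModel(), "external-evaluator-01", allowed_layers={4, 8})
    print(session.query_logits("probe prompt"))
    print(session.query_activations("probe prompt", layer=4))
    print(json.dumps(session.audit_log, indent=2))
```

Logging query hashes rather than raw prompts is one way to reconcile the evaluator's confidentiality with the provider's need for an audit trail; the paper's Section II discusses the broader threat landscape such safeguards would sit in.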
Related papers
- Adapting Probabilistic Risk Assessment for AI [0.0]
General-purpose artificial intelligence (AI) systems present an urgent risk management challenge.
Current methods often rely on selective testing and undocumented assumptions about risk priorities.
This paper introduces the probabilistic risk assessment (PRA) for AI framework.
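As a rough illustration of the underlying idea (not the paper's own methodology or numbers), classical probabilistic risk assessment aggregates expected harm as probability times severity over a set of scenarios:

```python
# Illustrative only: classical PRA-style aggregation of expected harm as
# sum(probability x severity) over scenarios. The scenario names and numbers
# are invented for the example and are not taken from the PRA for AI framework.
scenarios = {
    # scenario: (annual probability estimate, severity on a 0-100 harm scale)
    "model-assisted cyber intrusion": (0.02, 80),
    "large-scale disinformation": (0.10, 40),
    "unsafe autonomous tool use": (0.05, 60),
}

expected_harm = sum(p * s for p, s in scenarios.values())
print(f"Aggregate expected harm: {expected_harm:.1f}")  # 0.02*80 + 0.10*40 + 0.05*60 = 8.6
```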
arXiv Detail & Related papers (2025-04-25T17:59:14Z)
- REVAL: A Comprehension Evaluation on Reliability and Values of Large Vision-Language Models [59.445672459851274]
REVAL is a comprehensive benchmark designed to evaluate the REliability and VALue of Large Vision-Language Models.
REVAL encompasses over 144K image-text Visual Question Answering (VQA) samples, structured into two primary sections: Reliability and Values.
We evaluate 26 models, including mainstream open-source LVLMs and prominent closed-source models like GPT-4o and Gemini-1.5-Pro.
arXiv Detail & Related papers (2025-03-20T07:54:35Z)
- AI Companies Should Report Pre- and Post-Mitigation Safety Evaluations [5.984437476321095]
Frontier AI companies should report both pre- and post-mitigation safety evaluations.
Evaluating models at both stages provides policymakers with essential evidence to regulate deployment, access, and safety standards.
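A minimal sketch, with invented data, of what such dual reporting could look like: the same hazard probes are graded on the unmitigated and the mitigated model, and both unsafe-response rates are disclosed.

```python
# Hypothetical illustration of dual (pre- and post-mitigation) reporting;
# the graded responses below are invented, not results from any real model.
def unsafe_rate(responses: list[bool]) -> float:
    """Fraction of probes that produced an unsafe response."""
    return sum(responses) / len(responses)

pre_mitigation = [True, True, False, True, False, True, False, False]      # base model
post_mitigation = [False, False, False, True, False, False, False, False]  # after safety fine-tuning

print(f"pre-mitigation unsafe rate:  {unsafe_rate(pre_mitigation):.2f}")   # 0.50
print(f"post-mitigation unsafe rate: {unsafe_rate(post_mitigation):.2f}")  # 0.12
```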
arXiv Detail & Related papers (2025-03-17T17:56:43Z)
- Decoding the Black Box: Integrating Moral Imagination with Technical AI Governance [0.0]
We develop a comprehensive framework designed to regulate AI technologies deployed in high-stakes domains such as defense, finance, healthcare, and education. Our approach combines rigorous technical analysis, quantitative risk assessment, and normative evaluation to expose systemic vulnerabilities.
arXiv Detail & Related papers (2025-03-09T03:11:32Z)
- Position: Ensuring mutual privacy is necessary for effective external evaluation of proprietary AI systems [17.53028680356076]
External evaluation of AI systems is increasingly recognised as a crucial approach for understanding their potential risks. Facilitating external evaluation in practice faces significant challenges in balancing evaluators' need for system access with AI developers' privacy and security concerns.
arXiv Detail & Related papers (2025-03-03T12:24:59Z)
- On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective [314.7991906491166]
Generative Foundation Models (GenFMs) have emerged as transformative tools. Their widespread adoption raises critical concerns regarding trustworthiness across multiple dimensions. This paper presents a comprehensive framework to address these challenges through three key contributions.
arXiv Detail & Related papers (2025-02-20T06:20:36Z)
- AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons [62.374792825813394]
This paper introduces AILuminate v1.0, the first comprehensive industry-standard benchmark for assessing AI-product risk and reliability.
The benchmark evaluates an AI system's resistance to prompts designed to elicit dangerous, illegal, or undesirable behavior in 12 hazard categories.
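The snippet below sketches how a hazard-category benchmark of this general kind can be scored, grouping graded responses by category and reporting per-category violation rates; the categories, grades, and counts are invented and are not AILuminate's actual taxonomy or grading method.

```python
# Toy scoring loop for a hazard-category benchmark (illustrative data only).
from collections import defaultdict

graded_responses = [
    # (hazard category, response judged unsafe?)
    ("violent crime", False),
    ("violent crime", True),
    ("hate speech", False),
    ("self-harm", False),
    ("self-harm", True),
    ("self-harm", True),
]

totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # category -> [unsafe, total]
for category, unsafe in graded_responses:
    totals[category][0] += int(unsafe)
    totals[category][1] += 1

for category, (unsafe, total) in totals.items():
    print(f"{category}: {unsafe}/{total} unsafe ({unsafe / total:.0%})")
```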
arXiv Detail & Related papers (2025-02-19T05:58:52Z)
- Towards Trustworthy Retrieval Augmented Generation for Large Language Models: A Survey [92.36487127683053]
Retrieval-Augmented Generation (RAG) is an advanced technique designed to address the challenges of Artificial Intelligence-Generated Content (AIGC). RAG provides reliable and up-to-date external knowledge, reduces hallucinations, and ensures relevant context across a wide range of tasks. Despite RAG's success and potential, recent studies have shown that the RAG paradigm also introduces new risks, including privacy concerns, adversarial attacks, and accountability issues.
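For readers unfamiliar with the mechanism, the sketch below shows the retrieve-then-generate pattern in its simplest form, using a naive keyword-overlap retriever over an invented document set; a real RAG system would use a vector index and a language-model call in place of these stand-ins.

```python
# Minimal retrieval-augmented-generation sketch; documents, the query, and the
# overlap-based scoring rule are invented for illustration.
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the query."""
    q_words = set(query.lower().split())
    ranked = sorted(documents, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return ranked[:k]

documents = [
    "The EU AI Act entered into force in August 2024.",
    "Retrieval augments generation with external documents.",
    "GPAI model evaluations may require structured access.",
]

query = "When did the EU AI Act enter into force?"
context = retrieve(query, documents)

# The retrieved context is prepended to the prompt so the generator can ground
# its answer in external knowledge rather than parametric memory alone.
prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}\nAnswer:"
print(prompt)
```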
arXiv Detail & Related papers (2025-02-08T06:50:47Z)
- Holistic Safety and Responsibility Evaluations of Advanced AI Models [18.34510620901674]
Safety and responsibility evaluations of advanced AI models are a critical but developing field of research and practice.
In the development of Google DeepMind's advanced AI models, we innovated on and applied a broad set of approaches to safety evaluation.
arXiv Detail & Related papers (2024-04-22T10:26:49Z)
- Sociotechnical Safety Evaluation of Generative AI Systems [13.546708226350963]
Generative AI systems produce a range of risks.
To ensure the safety of generative AI systems, these risks must be evaluated.
We propose a three-layered framework that takes a structured, sociotechnical approach to evaluating these risks.
arXiv Detail & Related papers (2023-10-18T14:13:58Z)
- Model evaluation for extreme risks [46.53170857607407]
Further progress in AI development could lead to capabilities that pose extreme risks, such as offensive cyber capabilities or strong manipulation skills.
We explain why model evaluation is critical for addressing extreme risks.
arXiv Detail & Related papers (2023-05-24T16:38:43Z)
- Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements [76.80453043969209]
This survey presents a framework for safety research pertaining to large models.
We begin by introducing safety issues of wide concern, then delve into safety evaluation methods for large models.
We explore the strategies for enhancing large model safety from training to deployment.
arXiv Detail & Related papers (2023-02-18T09:32:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.