Toward Third-Party Assurance of AI Systems: Design Requirements, Prototype, and Early Testing
- URL: http://arxiv.org/abs/2601.22424v1
- Date: Fri, 30 Jan 2026 00:37:12 GMT
- Title: Toward Third-Party Assurance of AI Systems: Design Requirements, Prototype, and Early Testing
- Authors: Rachel M. Kim, Blaine Kuehnert, Alice Lai, Kenneth Holstein, Hoda Heidari, Rayid Ghani
- Abstract summary: We introduce a third-party AI assurance framework that addresses gaps in AI evaluation. We focus on third-party assurance to prevent conflict of interest and ensure credibility and accountability of the process. Our findings show early evidence that our AI assurance framework is sound and comprehensive, usable across different organizational contexts.
- Score: 16.53658640529767
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As Artificial Intelligence (AI) systems proliferate, the need for systematic, transparent, and actionable processes for evaluating them is growing. While many resources exist to support AI evaluation, they have several limitations. Few address both the process of designing, developing, and deploying an AI system and the outcomes it produces. Furthermore, few are end-to-end and operational, give actionable guidance, or present evidence of usability or effectiveness in practice. In this paper, we introduce a third-party AI assurance framework that addresses these gaps. We focus on third-party assurance to prevent conflict of interest and ensure credibility and accountability of the process. We begin by distinguishing assurance from audits in several key dimensions. Then, following design principles, we reflect on the shortcomings of existing resources to identify a set of design requirements for AI assurance. We then construct a prototype of an assurance process that consists of (1) a responsibility assignment matrix to determine the different levels of involvement each stakeholder has at each stage of the AI lifecycle, (2) an interview protocol for each stakeholder of an AI system, (3) a maturity matrix to assess AI systems' adherence to best practices, and (4) a template for an assurance report that draws from more mature assurance practices in business accounting. We conduct early validation of our AI assurance framework by applying the framework to two distinct AI use cases -- a business document tagging tool for downstream processing in a large private firm, and a housing resource allocation tool in a public agency -- and conducting expert validation interviews. Our findings show early evidence that our AI assurance framework is sound and comprehensive, usable across different organizational contexts, and effective at identifying bespoke issues with AI systems.
Related papers
- Responsible AI in Business [0.8213113085481418]
It structures Responsible AI along four focal areas that are central for introducing and operating AI systems in a legally compliant, comprehensible, sustainable, and data-sovereign manner. First, it discusses the EU AI Act as a risk-based regulatory framework, including the distinction between provider and deployer roles. Second, it addresses Explainable AI as a basis for transparency and trust, clarifying key notions such as transparency, interpretability, and explainability. Third, it covers Green AI, emphasizing that AI systems should be evaluated not only by performance but also by energy and resource consumption.
arXiv Detail & Related papers (2026-01-31T08:24:20Z) - AI Deception: Risks, Dynamics, and Controls [153.71048309527225]
This project provides a comprehensive and up-to-date overview of the AI deception field. We identify a formal definition of AI deception, grounded in signaling theory from studies of animal deception. We organize the landscape of AI deception research as a deception cycle, consisting of two key components: deception emergence and deception treatment.
arXiv Detail & Related papers (2025-11-27T16:56:04Z) - Barbarians at the Gate: How AI is Upending Systems Research [58.95406995634148]
We argue that systems research, long focused on designing and evaluating new performance-oriented algorithms, is particularly well-suited for AI-driven solution discovery. We term this approach AI-Driven Research for Systems (ADRS), which iteratively generates, evaluates, and refines solutions. Our results highlight both the disruptive potential and the urgent need to adapt systems research practices in the age of AI.
arXiv Detail & Related papers (2025-10-07T17:49:24Z) - Safe and Certifiable AI Systems: Concepts, Challenges, and Lessons Learned [45.44933002008943]
This white paper presents the TÜV AUSTRIA Trusted AI framework. It is an end-to-end audit catalog and methodology for assessing and certifying machine learning systems. Building on three pillars - Secure Software Development, Functional Requirements, and Ethics & Data Privacy - it translates the high-level obligations of the EU AI Act into specific, testable criteria.
arXiv Detail & Related papers (2025-09-08T17:52:08Z) - Never Compromise to Vulnerabilities: A Comprehensive Survey on AI Governance [211.5823259429128]
We propose a comprehensive framework integrating technical and societal dimensions, structured around three interconnected pillars: Intrinsic Security, Derivative Security, and Social Ethics. We identify three core challenges: (1) the generalization gap, where defenses fail against evolving threats; (2) inadequate evaluation protocols that overlook real-world risks; and (3) fragmented regulations leading to inconsistent oversight. Our framework offers actionable guidance for researchers, engineers, and policymakers to develop AI systems that are not only robust and secure but also ethically aligned and publicly trustworthy.
arXiv Detail & Related papers (2025-08-12T09:42:56Z) - A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems [53.37728204835912]
Most existing AI systems rely on manually crafted configurations that remain static after deployment. Recent research has explored agent evolution techniques that aim to automatically enhance agent systems based on interaction data and environmental feedback. This survey aims to provide researchers and practitioners with a systematic understanding of self-evolving AI agents.
arXiv Detail & Related papers (2025-08-10T16:07:32Z) - TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems [8.683314804719506]
This review presents a structured analysis of Trust, Risk, and Security Management (TRiSM) in the context of Agentic Multi-Agent Systems (AMAS). We begin by examining the conceptual foundations of Agentic AI and highlight its architectural distinctions from traditional AI agents. We then adapt and extend the AI TRiSM framework for Agentic AI, structured around key pillars: Explainability, ModelOps, Security, Privacy, and their lifecycle governance. A risk taxonomy is proposed to capture the unique threats and vulnerabilities of Agentic AI, ranging from coordination failures to
arXiv Detail & Related papers (2025-06-04T16:26:11Z) - A Framework for the Assurance of AI-Enabled Systems [0.0]
This paper proposes a claims-based framework for risk management and assurance of AI systems. The paper's contributions are a framework process for AI assurance, a set of relevant definitions, and a discussion of important considerations in AI assurance.
arXiv Detail & Related papers (2025-04-03T13:44:01Z) - Design of a Quality Management System based on the EU Artificial Intelligence Act [0.0]
The EU AI Act mandates that providers and deployers of high-risk AI systems establish a quality management system (QMS).
This paper introduces a new design concept and prototype for a QMS as a microservice Software as a Service web application.
arXiv Detail & Related papers (2024-08-08T12:14:02Z) - Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems [88.80306881112313]
We will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI.
The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees.
We outline a number of approaches for creating each of these three core components, describe the main technical challenges, and suggest a number of potential solutions to them.
arXiv Detail & Related papers (2024-05-10T17:38:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.