Oversight Structures for Agentic AI in Public-Sector Organizations
- URL: http://arxiv.org/abs/2506.04836v1
- Date: Thu, 05 Jun 2025 09:57:15 GMT
- Title: Oversight Structures for Agentic AI in Public-Sector Organizations
- Authors: Chris Schmitz, Jonathan Rystrøm, Jan Batzner,
- Abstract summary: We identify five governance dimensions essential for responsible agent deployment. We find that agent oversight poses intensified versions of three existing governance challenges. We propose approaches that both adapt institutional structures and design agent oversight compatible with public sector constraints.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper finds that the introduction of agentic AI systems intensifies existing challenges to traditional public sector oversight mechanisms -- which rely on siloed compliance units and episodic approvals rather than continuous, integrated supervision. We identify five governance dimensions essential for responsible agent deployment: cross-departmental implementation, comprehensive evaluation, enhanced security protocols, operational visibility, and systematic auditing. We evaluate the capacity of existing oversight structures to meet these challenges, via a mixed-methods approach consisting of a literature review and interviews with civil servants in AI-related roles. We find that agent oversight poses intensified versions of three existing governance challenges: continuous oversight, deeper integration of governance and operational capabilities, and interdepartmental coordination. We propose approaches that both adapt institutional structures and design agent oversight compatible with public sector constraints.
Related papers
- The Controllability Trap: A Governance Framework for Military AI Agents
We propose the Agentic Military AI Governance Framework (AMAGF). AMAGF is a measurable architecture structured around three pillars: Preventive Governance, Detective Governance, and Corrective Governance. Its core mechanism, the Control Quality Score (CQS), is a composite real-time metric quantifying human control and enabling graduated responses as control weakens.
Date: 2026-03-03T20:48:01Z
- From Prompt-Response to Goal-Directed Systems: The Evolution of Agentic AI Software Architecture
Agentic AI denotes an architectural transition from stateless, prompt-driven generative models toward goal-directed systems. This paper examines this transition by connecting intelligent agent theories with contemporary LLM-centric approaches. The study identifies a convergence toward standardized agent loops, registries, and auditable control mechanisms.
Date: 2026-02-11T03:34:48Z
- Adaptation of Agentic AI
We unify the rapidly expanding research landscape into a systematic framework that spans both agent adaptations and tool adaptations. We demonstrate that this framework helps clarify the design space of adaptation strategies in agentic AI. We then review the representative approaches in each category, analyze their strengths and limitations, and highlight key open challenges and future opportunities.
Date: 2025-12-18T08:38:51Z
- GAIA: A General Agency Interaction Architecture for LLM-Human B2B Negotiation & Screening
We propose GAIA, a governance-first framework for LLM-human agency in B2B negotiation and screening. GAIA defines three essential roles, namely Principal (human), Delegate (LLM agent), and Counterparty, plus an optional Critic to enhance performance. Our contributions are fourfold: (1) a formal governance framework with three coordinated mechanisms and four safety invariants for delegation with bounded authorization; (2) information-gated progression via task-completeness tracking (TCI) and explicit state transitions that separate screening from commitment; and (3) dual feedback integration that blends Critic suggestions with human oversight through parallel learning channels.
Date: 2025-11-09T07:41:49Z
- Never Compromise to Vulnerabilities: A Comprehensive Survey on AI Governance
We propose a comprehensive framework integrating technical and societal dimensions, structured around three interconnected pillars: Intrinsic Security, Derivative Security, and Social Ethics. We identify three core challenges: (1) the generalization gap, where defenses fail against evolving threats; (2) inadequate evaluation protocols that overlook real-world risks; and (3) fragmented regulations leading to inconsistent oversight. Our framework offers actionable guidance for researchers, engineers, and policymakers to develop AI systems that are not only robust and secure but also ethically aligned and publicly trustworthy.
Date: 2025-08-12T09:42:56Z
- Web3 x AI Agents: Landscape, Integrations, and Foundational Challenges
The convergence of Web3 technologies and AI agents represents a rapidly evolving frontier poised to reshape decentralized ecosystems. This paper presents the first and most comprehensive analysis of the intersection between Web3 and AI agents, examining five critical dimensions: landscape, economics, governance, security, and trust mechanisms.
Date: 2025-08-04T15:44:58Z
- TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems
This review presents a structured analysis of Trust, Risk, and Security Management (TRiSM) in the context of LLM-based Agentic Multi-Agent Systems (AMAS). We begin by examining the conceptual foundations of Agentic AI and highlight its architectural distinctions from traditional AI agents. We then adapt and extend the AI TRiSM framework for Agentic AI, structured around four key pillars: Explainability, ModelOps, Security, Privacy and Governance.
Date: 2025-06-04T16:26:11Z
- AI-Supported Platform for System Monitoring and Decision-Making in Nuclear Waste Management with Large Language Models
This paper presents a multi-agent Retrieval-Augmented Generation (RAG) system that integrates large language models (LLMs) with document retrieval mechanisms. The system ensures regulatory decisions remain factually grounded, dynamically adapting to evolving regulatory frameworks.
Date: 2025-05-27T20:29:53Z
- Internet of Agents: Fundamentals, Applications, and Challenges
We introduce the Internet of Agents (IoA) as a foundational framework that enables seamless interconnection, dynamic discovery, and collaborative orchestration among heterogeneous agents at scale. We analyze the key operational enablers of IoA, including capability notification and discovery, adaptive communication protocols, dynamic task matching, consensus and conflict-resolution mechanisms, and incentive models.
Date: 2025-05-12T02:04:37Z
- Advancing Multi-Agent Systems Through Model Context Protocol: Architecture, Implementation, and Applications
This paper introduces a comprehensive framework for advancing multi-agent systems through Model Context Protocol (MCP). We extend previous work on AI agent architectures by developing a unified theoretical foundation, advanced context management techniques, and scalable coordination patterns. We identify current limitations, emerging research opportunities, and potential transformative applications across industries.
Date: 2025-04-26T03:43:03Z
- A Desideratum for Conversational Agents: Capabilities, Challenges, and Future Directions
Large Language Models (LLMs) have propelled conversational AI from traditional dialogue systems into sophisticated agents capable of autonomous actions, contextual awareness, and multi-turn interactions with users. This survey paper presents a desideratum for next-generation Conversational Agents: what has been achieved, what challenges persist, and what must be done for more scalable systems that approach human-level intelligence.
Date: 2025-04-07T21:01:25Z
- Inherent and emergent liability issues in LLM-based agentic systems: a principal-agent perspective
Agentic systems powered by large language models (LLMs) are becoming progressively more complex and capable. Their increasing agency and expanding deployment settings attract growing attention to effective governance policies, monitoring, and control protocols. We analyze potential liability issues arising from the delegated use of LLM agents and their extended systems through a principal-agent perspective.
Date: 2025-04-04T08:10:02Z
- In-House Evaluation Is Not Enough: Towards Robust Third-Party Flaw Disclosure for General-Purpose AI
We call for three interventions to advance system safety. First, we propose using standardized AI flaw reports and rules of engagement for researchers. Second, we propose that GPAI system providers adopt broadly scoped flaw disclosure programs. Third, we advocate for the development of improved infrastructure to coordinate distribution of flaw reports.
Date: 2025-03-21T05:09:46Z
- Media and responsible AI governance: a game-theoretic and LLM analysis
This paper investigates the interplay between AI developers, regulators, users, and the media in fostering trustworthy AI systems. Using evolutionary game theory and large language models (LLMs), we model the strategic interactions among these actors under different regulatory regimes.
Date: 2025-03-12T21:39:38Z
- AgentOrca: A Dual-System Framework to Evaluate Language Agents on Operational Routine and Constraint Adherence
We present AgentOrca, a dual-system framework for evaluating language agents' compliance with operational constraints and routines. Our framework encodes action constraints and routines through both natural language prompts for agents and corresponding executable code serving as ground truth for automated verification. Our findings reveal notable performance gaps among state-of-the-art models, with large reasoning models like o1 demonstrating superior compliance while others show significantly lower performance.
Date: 2025-03-11T17:53:02Z
- Agent-as-a-Judge: Evaluate Agents with Agents
We introduce the Agent-as-a-Judge framework, wherein agentic systems are used to evaluate agentic systems.
This is an organic extension of the LLM-as-a-Judge framework, incorporating agentic features that enable intermediate feedback for the entire task-solving process.
We present DevAI, a new benchmark of 55 realistic automated AI development tasks.
Date: 2024-10-14T17:57:02Z
- Visibility into AI Agents
Increased delegation of commercial, scientific, governmental, and personal activities to AI agents may exacerbate existing societal risks.
We assess three categories of measures to increase visibility into AI agents: agent identifiers, real-time monitoring, and activity logging.
Date: 2024-01-23T23:18:33Z
- Ask-AC: An Initiative Advisor-in-the-Loop Actor-Critic Framework
We introduce a novel initiative advisor-in-the-loop actor-critic framework, termed Ask-AC.
At the heart of Ask-AC are two complementary components: an action requester and an adaptive state selector.
Experimental results on both stationary and non-stationary environments demonstrate that the proposed framework significantly improves the learning efficiency of the agent.
Date: 2022-07-05T10:58:11Z
- Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims
AI developers need to make verifiable claims to which they can be held accountable.
This report suggests various steps that different stakeholders can take to improve the verifiability of claims made about AI systems.
We analyze ten mechanisms for this purpose--spanning institutions, software, and hardware--and make recommendations aimed at implementing, exploring, or improving those mechanisms.
Date: 2020-04-15T17:15:35Z