Institutional AI: A Governance Framework for Distributional AGI Safety
- URL: http://arxiv.org/abs/2601.10599v2
- Date: Mon, 19 Jan 2026 14:18:44 GMT
- Title: Institutional AI: A Governance Framework for Distributional AGI Safety
- Authors: Federico Pierucci, Marcello Galisai, Marcantonio Syrnikov Bracale, Matteo Prandi, Piercosma Bisconti, Francesco Giarrusso, Olga Sorokoletova, Vincenzo Suriani, Daniele Nardi
- Abstract summary: We identify three structural problems that emerge from core properties of AI models. The solution is Institutional AI, a system-level approach that treats alignment as a question of effective governance of AI agent collectives.
- Score: 1.3763052684269788
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As LLM-based systems increasingly operate as agents embedded within human social and technical systems, alignment can no longer be treated as a property of an isolated model, but must be understood in relation to the environments in which these agents act. Even the most sophisticated methods of alignment, such as Reinforcement Learning from Human Feedback (RLHF) or from AI Feedback (RLAIF), cannot ensure control once internal goal structures diverge from developer intent. We identify three structural problems that emerge from core properties of AI models: (1) behavioral goal-independence, where models develop internal objectives and misgeneralize goals; (2) instrumental override of natural-language constraints, where models regard safety principles as non-binding while pursuing latent objectives, leveraging deception and manipulation; and (3) agentic alignment drift, where individually aligned agents converge to collusive equilibria through interaction dynamics invisible to single-agent audits. The solution this paper advances is Institutional AI: a system-level approach that treats alignment as a question of effective governance of AI agent collectives. We argue for a governance-graph that details how to constrain agents via runtime monitoring, incentive shaping through prizes and sanctions, and explicit norms and enforcement roles. This institutional turn reframes safety from software engineering to a mechanism design problem, where the primary goal of alignment is shifting the payoff landscape of AI agent collectives.
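To make the governance-graph idea concrete, below is a minimal, hypothetical sketch in Python; it is not the paper's implementation, and every name in it (`Norm`, `GovernanceGraph`, `observe`, the toy side-channel norm) is an assumption introduced here for illustration. It only shows how explicit, machine-checkable norms, runtime monitoring of agent actions, and sanction/prize incentive shaping could be wired together so that norm violations directly shift an agent's payoff.

```python
"""Illustrative sketch only: the paper describes Institutional AI conceptually;
the classes and fields below are hypothetical, not drawn from the paper."""

from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Norm:
    """An explicit, machine-checkable norm with an attached sanction and prize."""
    name: str
    violated_by: Callable[[dict], bool]  # predicate over an observed action
    sanction: float                      # payoff penalty when violated
    prize: float = 0.0                   # payoff bonus when complied with


@dataclass
class AgentNode:
    agent_id: str
    payoff: float = 0.0
    history: List[dict] = field(default_factory=list)


class GovernanceGraph:
    """Agents plus the norms that bind them; enforcement adjusts payoffs at runtime."""

    def __init__(self) -> None:
        self.agents: Dict[str, AgentNode] = {}
        self.bindings: Dict[str, List[Norm]] = {}  # agent_id -> norms binding it

    def add_agent(self, agent_id: str) -> None:
        self.agents[agent_id] = AgentNode(agent_id)
        self.bindings[agent_id] = []

    def bind_norm(self, agent_id: str, norm: Norm) -> None:
        self.bindings[agent_id].append(norm)

    def observe(self, agent_id: str, action: dict) -> float:
        """Runtime monitoring: score an observed action against the agent's norms
        and shift its payoff via sanctions/prizes (incentive shaping)."""
        node = self.agents[agent_id]
        node.history.append(action)
        delta = 0.0
        for norm in self.bindings[agent_id]:
            if norm.violated_by(action):
                delta -= norm.sanction
            else:
                delta += norm.prize
        node.payoff += delta
        return delta


# Usage: one toy norm against undisclosed agent-to-agent side channels,
# the kind of collusive behaviour a single-agent audit would miss.
if __name__ == "__main__":
    graph = GovernanceGraph()
    graph.add_agent("planner")
    graph.bind_norm(
        "planner",
        Norm(
            name="no_undisclosed_side_channel",
            violated_by=lambda a: a.get("channel") == "covert",
            sanction=5.0,
            prize=0.1,
        ),
    )
    print(graph.observe("planner", {"channel": "covert", "msg": "coordinate prices"}))  # -5.0
    print(graph.observe("planner", {"channel": "public", "msg": "report status"}))      # +0.1
```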
Related papers
- From Prompt-Response to Goal-Directed Systems: The Evolution of Agentic AI Software Architecture [0.0]
Agentic AI denotes an architectural transition from stateless, prompt-driven generative models toward goal-directed systems. This paper examines this transition by connecting intelligent agent theories with contemporary LLM-centric approaches. The study identifies a convergence toward standardized agent loops, registries, and auditable control mechanisms.
arXiv Detail & Related papers (2026-02-11T03:34:48Z) - EmboCoach-Bench: Benchmarking AI Agents on Developing Embodied Robots [68.29056647487519]
Embodied AI is fueled by high-fidelity simulation and large-scale data collection. However, this scaling capability remains bottlenecked by a reliance on labor-intensive manual oversight. We introduce EmboCoach-Bench, a benchmark evaluating the capacity of LLM agents to autonomously engineer embodied policies.
arXiv Detail & Related papers (2026-01-29T11:33:49Z) - Interpreting Agentic Systems: Beyond Model Explanations to System-Level Accountability [0.6745502291821954]
Agentic systems have transformed how Large Language Models can be leveraged to create autonomous systems with goal-directed behaviors. Current interpretability techniques, developed primarily for static models, show limitations when applied to agentic systems. This paper assesses the suitability and limitations of existing interpretability methods in the context of agentic systems.
arXiv Detail & Related papers (2026-01-23T21:05:32Z) - A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems [53.37728204835912]
Most existing AI systems rely on manually crafted configurations that remain static after deployment. Recent research has explored agent evolution techniques that aim to automatically enhance agent systems based on interaction data and environmental feedback. This survey aims to provide researchers and practitioners with a systematic understanding of self-evolving AI agents.
arXiv Detail & Related papers (2025-08-10T16:07:32Z) - Toward a Theory of Agents as Tool-Use Decision-Makers [89.26889709510242]
We argue that true autonomy requires agents to be grounded in a coherent epistemic framework that governs what they know, what they need to know, and how to acquire that knowledge efficiently. We propose a unified theory that treats internal reasoning and external actions as equivalent epistemic tools, enabling agents to systematically coordinate introspection and interaction. This perspective shifts the design of agents from mere action executors to knowledge-driven intelligence systems, offering a principled path toward building foundation agents capable of adaptive, efficient, and goal-directed behavior.
arXiv Detail & Related papers (2025-06-01T07:52:16Z) - Internet of Agents: Fundamentals, Applications, and Challenges [68.9543153075464]
We introduce the Internet of Agents (IoA) as a foundational framework that enables seamless interconnection, dynamic discovery, and collaborative orchestration among heterogeneous agents at scale. We analyze the key operational enablers of IoA, including capability notification and discovery, adaptive communication protocols, dynamic task matching, consensus and conflict-resolution mechanisms, and incentive models.
arXiv Detail & Related papers (2025-05-12T02:04:37Z) - Threat Modeling for AI: The Case for an Asset-Centric Approach [0.23408308015481666]
With AI systems now able to autonomously execute code, interact with external systems, and operate without human oversight, traditional security approaches fall short. This paper introduces an asset-centric methodology for threat modeling AI systems.
arXiv Detail & Related papers (2025-05-08T18:57:08Z) - Human-AI Governance (HAIG): A Trust-Utility Approach [0.0]
This paper introduces the HAIG framework for analysing trust dynamics across evolving human-AI relationships. Our analysis reveals how technical advances in self-supervision, reasoning authority, and distributed decision-making drive non-uniform trust evolution.
arXiv Detail & Related papers (2025-05-03T01:57:08Z) - Do LLMs trust AI regulation? Emerging behaviour of game-theoretic LLM agents [61.132523071109354]
This paper investigates the interplay between AI developers, regulators and users, modelling their strategic choices under different regulatory scenarios. Our research identifies emerging behaviours of strategic AI agents, which tend to adopt more "pessimistic" stances than pure game-theoretic agents.
arXiv Detail & Related papers (2025-04-11T15:41:21Z) - Media and responsible AI governance: a game-theoretic and LLM analysis [61.132523071109354]
This paper investigates the interplay between AI developers, regulators, users, and the media in fostering trustworthy AI systems. Using evolutionary game theory and large language models (LLMs), we model the strategic interactions among these actors under different regulatory regimes.
arXiv Detail & Related papers (2025-03-12T21:39:38Z) - Position: Emergent Machina Sapiens Urge Rethinking Multi-Agent Paradigms [8.177915265718703]
We argue that AI agents should be empowered to adjust their objectives dynamically. We call for a shift toward the emergent, self-organizing, and context-aware nature of these multi-agentic AI systems.
arXiv Detail & Related papers (2025-02-05T22:20:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.