Trustworthy Agentic AI Requires Deterministic Architectural Boundaries
- URL: http://arxiv.org/abs/2602.09947v1
- Date: Tue, 10 Feb 2026 16:33:40 GMT
- Title: Trustworthy Agentic AI Requires Deterministic Architectural Boundaries
- Authors: Manish Bhattarai, Minh Vu,
- Abstract summary: Current agentic AI architectures are fundamentally incompatible with the security and requirements of high-stakes scientific domains.<n>We introduce the Trinity Defense Architecture, which enforces security through three mechanisms.<n>We show that without unforgeable provenance and deterministic mediation, the Lethal Trifecta'' (untrusted inputs, privileged data access, external action capability) turns authorization security into an exploit-discovery problem.
- Score: 2.378211191937908
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current agentic AI architectures are fundamentally incompatible with the security and epistemological requirements of high-stakes scientific workflows. The problem is not inadequate alignment or insufficient guardrails, it is architectural: autoregressive language models process all tokens uniformly, making deterministic command--data separation unattainable through training alone. We argue that deterministic, architectural enforcement, not probabilistic learned behavior, is a necessary condition for trustworthy AI-assisted science. We introduce the Trinity Defense Architecture, which enforces security through three mechanisms: action governance via a finite action calculus with reference-monitor enforcement, information-flow control via mandatory access labels preventing cross-scope leakage, and privilege separation isolating perception from execution. We show that without unforgeable provenance and deterministic mediation, the ``Lethal Trifecta'' (untrusted inputs, privileged data access, external action capability) turns authorization security into an exploit-discovery problem: training-based defenses may reduce empirical attack rates but cannot provide deterministic guarantees. The ML community must recognize that alignment is insufficient for authorization security, and that architectural mediation is required before agentic AI can be safely deployed in consequential scientific domains.
Related papers
- Extending the Formalism and Theoretical Foundations of Cryptography to AI [18.724847875398435]
Recent progress in (Large) Language Models has enabled the development of autonomous LM-based agents.<n>One emerging direction to mitigate security risks is to constrain agent behaviours via access control and permissioning mechanisms.<n>We first systematize the landscape by constructing an attack taxonomy tailored to language models.<n>We then develop a formal treatment of agentic access control by defining an AIOracle algorithmically and introducing a security-game framework.
arXiv Detail & Related papers (2026-03-03T04:11:21Z) - Incentive-Aware AI Safety via Strategic Resource Allocation: A Stackelberg Security Games Perspective [31.55000083809067]
We show how game-theoretic deterrence can make AI oversight proactive, risk-aware, and resilient to manipulation.<n>We illustrate how this framework can inform (1) training-time auditing against data/feedback poisoning, (2) pre-deployment evaluation under constrained reviewer resources, and (3) robust multi-model deployment in adversarial environments.
arXiv Detail & Related papers (2026-02-06T23:20:26Z) - CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents [60.98294016925157]
AI agents are vulnerable to prompt injection attacks, where malicious content hijacks agent behavior to steal credentials or cause financial loss.<n>We introduce Single-Shot Planning for CUAs, where a trusted planner generates a complete execution graph with conditional branches before any observation of potentially malicious content.<n>Although this architectural isolation successfully prevents instruction injections, we show that additional measures are needed to prevent Branch Steering attacks.
arXiv Detail & Related papers (2026-01-14T23:06:35Z) - AI Deception: Risks, Dynamics, and Controls [153.71048309527225]
This project provides a comprehensive and up-to-date overview of the AI deception field.<n>We identify a formal definition of AI deception, grounded in signaling theory from studies of animal deception.<n>We organize the landscape of AI deception research as a deception cycle, consisting of two key components: deception emergence and deception treatment.
arXiv Detail & Related papers (2025-11-27T16:56:04Z) - MAIF: Enforcing AI Trust and Provenance with an Artifact-Centric Agentic Paradigm [0.5495755145898128]
Current AI systems operate on opaque data structures that lack the audit trails, provenance tracking, or explainability required by emerging regulations like the EU AI Act.<n>We propose an artifact-centric AI agent paradigm where behavior is driven by persistent, verifiable data artifacts rather than ephemeral tasks.<n>Production-ready implementation demonstrates ultra-high-speed streaming (2,720.7 MB/s), optimized video processing (1,342 MB/s), and enterprise-grade security.
arXiv Detail & Related papers (2025-11-19T04:10:32Z) - Governable AI: Provable Safety Under Extreme Threat Models [31.36879992618843]
We propose a Governable AI (GAI) framework that shifts from traditional internal constraints to externally enforced structural compliance.<n>The GAI framework is composed of a simple yet reliable, fully deterministic, powerful, flexible, and general-purpose rule enforcement module (REM); governance rules; and a governable secure super-platform (GSSP) that offers end-to-end protection against compromise or subversion by AI.
arXiv Detail & Related papers (2025-08-28T04:22:59Z) - CIA+TA Risk Assessment for AI Reasoning Vulnerabilities [0.0]
We present a framework for cognitive cybersecurity, a systematic protection of AI reasoning processes from adversarial manipulation.<n>First, we establish cognitive cybersecurity as a discipline complementing traditional cybersecurity and AI safety.<n>Second, we introduce the CIA+TA, extending traditional Confidentiality, Integrity, and Availability with Trust.<n>Third, we present a quantitative risk assessment methodology with empirically-derived coefficients, enabling organizations to measure cognitive security risks.
arXiv Detail & Related papers (2025-08-19T13:56:09Z) - Never Compromise to Vulnerabilities: A Comprehensive Survey on AI Governance [211.5823259429128]
We propose a comprehensive framework integrating technical and societal dimensions, structured around three interconnected pillars: Intrinsic Security, Derivative Security, and Social Ethics.<n>We identify three core challenges: (1) the generalization gap, where defenses fail against evolving threats; (2) inadequate evaluation protocols that overlook real-world risks; and (3) fragmented regulations leading to inconsistent oversight.<n>Our framework offers actionable guidance for researchers, engineers, and policymakers to develop AI systems that are not only robust and secure but also ethically aligned and publicly trustworthy.
arXiv Detail & Related papers (2025-08-12T09:42:56Z) - Rethinking Data Protection in the (Generative) Artificial Intelligence Era [138.07763415496288]
We propose a four-level taxonomy that captures the diverse protection needs arising in modern (generative) AI models and systems.<n>Our framework offers a structured understanding of the trade-offs between data utility and control, spanning the entire AI pipeline.
arXiv Detail & Related papers (2025-07-03T02:45:51Z) - Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems [88.80306881112313]
We will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI.
The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees.
We outline a number of approaches for creating each of these three core components, describe the main technical challenges, and suggest a number of potential solutions to them.
arXiv Detail & Related papers (2024-05-10T17:38:32Z) - Trustworthy AI [75.99046162669997]
Brittleness to minor adversarial changes in the input data, ability to explain the decisions, address the bias in their training data, are some of the most prominent limitations.
We propose the tutorial on Trustworthy AI to address six critical issues in enhancing user and public trust in AI systems.
arXiv Detail & Related papers (2020-11-02T20:04:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.