Securing AI Agents with Information-Flow Control
- URL: http://arxiv.org/abs/2505.23643v2
- Date: Wed, 03 Sep 2025 12:30:47 GMT
- Title: Securing AI Agents with Information-Flow Control
- Authors: Manuel Costa, Boris Köpf, Aashish Kolluri, Andrew Paverd, Mark Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, Santiago Zanella-Béguelin,
- Abstract summary: This paper explores the use of information-flow control (IFC) to provide security guarantees for AI agents.<n>We present a formal model to reason about the security and expressiveness of agent planners.<n>We present Fides, a planner that tracks confidentiality and integrity labels, deterministically enforces security policies, and introduces novel primitives for selectively hiding information.
- Score: 16.62563273152804
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As AI agents become increasingly autonomous and capable, ensuring their security against vulnerabilities such as prompt injection becomes critical. This paper explores the use of information-flow control (IFC) to provide security guarantees for AI agents. We present a formal model to reason about the security and expressiveness of agent planners. Using this model, we characterize the class of properties enforceable by dynamic taint-tracking and construct a taxonomy of tasks to evaluate security and utility trade-offs of planner designs. Informed by this exploration, we present Fides, a planner that tracks confidentiality and integrity labels, deterministically enforces security policies, and introduces novel primitives for selectively hiding information. Its evaluation in AgentDojo demonstrates that this approach enables us to complete a broad range of tasks with security guarantees. A tutorial to walk readers through the the concepts introduced in the paper can be found at https://github.com/microsoft/fides
Related papers
- CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents [60.98294016925157]
AI agents are vulnerable to prompt injection attacks, where malicious content hijacks agent behavior to steal credentials or cause financial loss.<n>We introduce Single-Shot Planning for CUAs, where a trusted planner generates a complete execution graph with conditional branches before any observation of potentially malicious content.<n>Although this architectural isolation successfully prevents instruction injections, we show that additional measures are needed to prevent Branch Steering attacks.
arXiv Detail & Related papers (2026-01-14T23:06:35Z) - Towards Verifiably Safe Tool Use for LLM Agents [53.55621104327779]
Large language model (LLM)-based AI agents extend capabilities by enabling access to tools such as data sources, APIs, search engines, code sandboxes, and even other agents.<n>LLMs may invoke unintended tool interactions and introduce risks, such as leaking sensitive data or overwriting critical records.<n>Current approaches to mitigate these risks, such as model-based safeguards, enhance agents' reliability but cannot guarantee system safety.
arXiv Detail & Related papers (2026-01-12T21:31:38Z) - OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows [77.95511352806261]
Computer-using agents powered by Vision-Language Models (VLMs) have demonstrated human-like capabilities in operating digital environments like mobile platforms.<n>We propose OS-Sentinel, a novel hybrid safety detection framework that combines a Formal Verifier for detecting explicit system-level violations with a Contextual Judge for assessing contextual risks and agent actions.
arXiv Detail & Related papers (2025-10-28T13:22:39Z) - OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety [58.201189860217724]
We introduce OpenAgentSafety, a comprehensive framework for evaluating agent behavior across eight critical risk categories.<n>Unlike prior work, our framework evaluates agents that interact with real tools, including web browsers, code execution environments, file systems, bash shells, and messaging platforms.<n>It combines rule-based analysis with LLM-as-judge assessments to detect both overt and subtle unsafe behaviors.
arXiv Detail & Related papers (2025-07-08T16:18:54Z) - LLM Agents Should Employ Security Principles [60.03651084139836]
This paper argues that the well-established design principles in information security should be employed when deploying Large Language Model (LLM) agents at scale.<n>We introduce AgentSandbox, a conceptual framework embedding these security principles to provide safeguards throughout an agent's life-cycle.
arXiv Detail & Related papers (2025-05-29T21:39:08Z) - Automating Safety Enhancement for LLM-based Agents with Synthetic Risk Scenarios [77.86600052899156]
Large Language Model (LLM)-based agents are increasingly deployed in real-world applications.<n>We propose AutoSafe, the first framework that systematically enhances agent safety through fully automated synthetic data generation.<n>We show that AutoSafe boosts safety scores by 45% on average and achieves a 28.91% improvement on real-world tasks.
arXiv Detail & Related papers (2025-05-23T10:56:06Z) - The Hidden Dangers of Browsing AI Agents [0.0]
This paper presents a comprehensive security evaluation of such agents, focusing on systemic vulnerabilities across multiple architectural layers.<n>Our work outlines the first end-to-end threat model for browsing agents and provides actionable guidance for securing their deployment in real-world environments.
arXiv Detail & Related papers (2025-05-19T13:10:29Z) - A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron? [30.063392019347887]
We present a systematization of knowledge on the safety and security threats of emphComputer-Using Agents.<n> CUAs are capable of autonomously performing tasks such as navigating desktop applications, web pages, and mobile apps.
arXiv Detail & Related papers (2025-05-16T06:56:42Z) - Towards Trustworthy GUI Agents: A Survey [64.6445117343499]
This survey examines the trustworthiness of GUI agents in five critical dimensions.<n>We identify major challenges such as vulnerability to adversarial attacks, cascading failure modes in sequential decision-making.<n>As GUI agents become more widespread, establishing robust safety standards and responsible development practices is essential.
arXiv Detail & Related papers (2025-03-30T13:26:00Z) - AgentOps: Enabling Observability of LLM Agents [12.49728300301026]
Large language model (LLM) agents raise significant concerns on AI safety due to their autonomous and non-deterministic behavior.<n>We present a comprehensive taxonomy of AgentOps, identifying the artifacts and associated data that should be traced throughout the entire lifecycle of agents to achieve effective observability.<n>Our taxonomy serves as a reference template for developers to design and implement AgentOps infrastructure that supports monitoring, logging, and analytics.
arXiv Detail & Related papers (2024-11-08T02:31:03Z) - Agent-as-a-Judge: Evaluate Agents with Agents [61.33974108405561]
We introduce the Agent-as-a-Judge framework, wherein agentic systems are used to evaluate agentic systems.
This is an organic extension of the LLM-as-a-Judge framework, incorporating agentic features that enable intermediate feedback for the entire task-solving process.
We present DevAI, a new benchmark of 55 realistic automated AI development tasks.
arXiv Detail & Related papers (2024-10-14T17:57:02Z) - Security of AI Agents [5.468745160706382]
We identify and describe potential vulnerabilities in AI agents in detail from a system security perspective.<n>We introduce defense mechanisms corresponding to each vulnerability with design and experiments to evaluate their viability.<n>This paper contextualizes the security issues in the current development of AI agents and delineates methods to make AI agents safer and more reliable.
arXiv Detail & Related papers (2024-06-12T23:16:45Z) - TrustAgent: Towards Safe and Trustworthy LLM-based Agents [50.33549510615024]
This paper presents an Agent-Constitution-based agent framework, TrustAgent, with a focus on improving the LLM-based agent safety.
The proposed framework ensures strict adherence to the Agent Constitution through three strategic components: pre-planning strategy which injects safety knowledge to the model before plan generation, in-planning strategy which enhances safety during plan generation, and post-planning strategy which ensures safety by post-planning inspection.
arXiv Detail & Related papers (2024-02-02T17:26:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.