Safe, Untrusted, "Proof-Carrying" AI Agents: toward the agentic lakehouse
- URL: http://arxiv.org/abs/2510.09567v1
- Date: Fri, 10 Oct 2025 17:18:36 GMT
- Title: Safe, Untrusted, "Proof-Carrying" AI Agents: toward the agentic lakehouse
- Authors: Jacopo Tagliabue, Ciro Greco
- Abstract summary: API-first, programmable lakehouses provide the right abstractions for safe-by-design, agentic lakehouses. We present a proof-of-concept in which agents repair data pipelines using correctness checks inspired by proof-carrying code.
- Score: 3.6729718095918393
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data lakehouses run sensitive workloads, where AI-driven automation raises concerns about trust, correctness, and governance. We argue that API-first, programmable lakehouses provide the right abstractions for safe-by-design, agentic workflows. Using Bauplan as a case study, we show how data branching and declarative environments extend naturally to agents, enabling reproducibility and observability while reducing the attack surface. We present a proof-of-concept in which agents repair data pipelines using correctness checks inspired by proof-carrying code. Our prototype demonstrates that untrusted AI agents can operate safely on production data and outlines a path toward a fully agentic lakehouse.
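The repair loop the abstract describes maps naturally onto a branch-check-merge pattern. Below is a minimal sketch under assumed names: the `client`, `agent`, and methods such as `create_branch` and `merge_branch` are hypothetical, not Bauplan's actual API. The point it illustrates is that the agent only ever touches an isolated branch, and its fix reaches production data only after every correctness check passes.

```python
# Minimal sketch of a proof-carrying repair loop (hypothetical API, not
# Bauplan's). The agent edits a pipeline on an isolated data branch; the
# change is merged into main only if all checks (the "proof obligations")
# pass, so the agent itself never has to be trusted.

def repair_pipeline(client, agent, pipeline, checks, max_attempts=3):
    branch = client.create_branch(from_ref="main")    # zero-copy sandbox
    error = client.last_error("main")                 # failure that triggered repair
    for _ in range(max_attempts):
        patch = agent.propose_fix(pipeline, error)    # untrusted suggestion
        client.apply_patch(branch, patch)
        run = client.run_pipeline(branch, pipeline)
        if run.ok and all(check(client, branch) for check in checks):
            client.merge_branch(branch, into="main")  # the only path to production
            return True
        error = run.error                             # feed the failure back
    client.delete_branch(branch)                      # failed attempts never leak out
    return False
```

The design choice worth noting is that the merge, not the agent, is the trust boundary: the checks are verified by the platform, which is what makes untrusted agents tolerable on production data.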
Related papers
- Tracking Capabilities for Safer Agents [2.9897366166831265]
Instead of calling tools directly, agents express their intentions as code in a capability-safe language: Scala 3 with capture checking. Scala's type system tracks capabilities statically, providing fine-grained control over what an agent can do. Our experiments show that agents can generate capability-safe code with no significant loss in task performance.
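Capture checking is a Scala 3 feature with no direct Python equivalent, but the underlying idea, that authority is an explicit value rather than ambient, can be sketched dynamically. The following is an illustrative analogue (names are mine, not the paper's): tools demand an unforgeable capability token, so an agent can only perform effects it was explicitly granted.

```python
# Rough runtime analogue of capability-tracked tool use (the paper enforces
# this statically via Scala 3 capture checking). Authority comes from
# holding the token object itself, so it cannot be forged from a string.

class Capability:
    def __init__(self, name):
        self.name = name

READ_FILES = Capability("read_files")
SEND_EMAIL = Capability("send_email")

def read_file(cap, path):
    if cap is not READ_FILES:            # identity check, not a name compare
        raise PermissionError("read_files capability required")
    with open(path) as f:
        return f.read()

def send_email(cap, to, body):
    if cap is not SEND_EMAIL:
        raise PermissionError("send_email capability required")
    print(f"sending to {to}: {body}")    # stand-in for the real side effect

# An agent handed only READ_FILES can read files, but any attempt to
# email fails before the side effect happens:
#   read_file(READ_FILES, "notes.txt")       # ok
#   send_email(READ_FILES, "a@b.co", "hi")   # PermissionError
```

The static version is strictly stronger: Scala's capture checker rejects such a program at compile time, before any agent code runs.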
arXiv Detail & Related papers (2026-03-01T08:39:37Z)
- AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security [126.49733412191416]
Current guardrail models lack agentic risk awareness and transparency in risk diagnosis. We propose a unified three-dimensional taxonomy that categorizes agentic risks by their source (where), failure mode (how), and consequence (what). We introduce a new fine-grained agentic safety benchmark (ATBench) and a Diagnostic Guardrail framework for agent safety and security (AgentDoG).
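The where/how/what taxonomy maps directly onto a small data structure; a minimal sketch, with category values that are illustrative examples rather than the paper's actual labels:

```python
# Illustrative encoding of a three-dimensional agentic-risk taxonomy:
# every risk is classified by source (where), failure mode (how), and
# consequence (what). The example values are assumptions, not ATBench's.

from dataclasses import dataclass

@dataclass(frozen=True)
class AgenticRisk:
    source: str        # where: e.g. "user_input", "tool_output", "memory"
    failure_mode: str  # how:   e.g. "prompt_injection", "unsafe_tool_call"
    consequence: str   # what:  e.g. "data_leak", "financial_loss"

risk = AgenticRisk("tool_output", "prompt_injection", "data_leak")
```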
arXiv Detail & Related papers (2026-01-26T13:45:41Z)
- CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents [60.98294016925157]
AI agents are vulnerable to prompt injection attacks, where malicious content hijacks agent behavior to steal credentials or cause financial loss. We introduce Single-Shot Planning for CUAs, where a trusted planner generates a complete execution graph with conditional branches before any observation of potentially malicious content. Although this architectural isolation successfully prevents instruction injections, we show that additional measures are needed to prevent Branch Steering attacks.
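The core idea, committing to all control flow before any untrusted content is observed, can be shown in a few lines. This is an illustrative sketch, not the paper's implementation; the plan format and action names are assumptions:

```python
# Minimal sketch of single-shot planning: a trusted planner emits a complete
# execution graph, with conditional branches, BEFORE the agent observes any
# untrusted content. Observations may select among pre-declared branches
# but can never add a new action to the plan.

def trusted_plan(task):
    # Runs before any web content is seen; the graph is fixed afterwards.
    return {
        "start": {"action": ("open", "https://example.com/invoices"), "next": "check"},
        "check": {"action": ("read_total",),
                  "branch": lambda total: "pay" if total < 100 else "stop"},
        "pay":   {"action": ("click", "#pay-button"), "next": "stop"},
    }

def execute(graph, do_action):
    node = "start"
    while node in graph:                          # "stop" has no entry: halt
        step = graph[node]
        observation = do_action(*step["action"])  # may be attacker-controlled
        # Injection boundary: content chooses among fixed branches only.
        node = step["branch"](observation) if "branch" in step else step["next"]
```

As the abstract notes, this boundary blocks instruction injection but not Branch Steering: malicious content can still influence which pre-declared branch gets taken.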
arXiv Detail & Related papers (2026-01-14T23:06:35Z)
- Trustworthy AI in the Agentic Lakehouse: from Concurrency to Governance [5.3013727160110085]
We argue that the path to trustworthy agentic AI begins with solving the infrastructure problem first. We propose an agent-first design, Bauplan, that re-implements data and compute isolation in the lakehouse. We conclude by sharing a reference implementation of a self-healing pipeline in Bauplan.
arXiv Detail & Related papers (2025-11-20T14:21:34Z)
- Impatient Users Confuse AI Agents: High-fidelity Simulations of Human Traits for Testing Agents [58.00130492861884]
TraitBasis is a lightweight, model-agnostic method for systematically stress testing AI agents. TraitBasis learns directions in activation space corresponding to steerable user traits. We observe on average a 2%-30% performance degradation on $\tau$-Trait across frontier models.
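Steering with directions in activation space is a known general technique; the sketch below shows that general recipe, not TraitBasis specifically, and the mean-difference estimator is one common choice among several:

```python
# Illustrative activation steering: add a learned direction to a model's
# hidden states at inference time to push generations toward a user trait
# (e.g. impatience). General technique, not TraitBasis itself.

import numpy as np

def steer(hidden_states, trait_direction, strength=2.0):
    """hidden_states: (seq_len, d_model); trait_direction: (d_model,)."""
    unit = trait_direction / np.linalg.norm(trait_direction)
    return hidden_states + strength * unit       # broadcasts over the sequence

def estimate_direction(acts_trait, acts_neutral):
    """One common estimator: mean activation difference between examples
    that exhibit the trait and neutral examples (both (n, d_model))."""
    return acts_trait.mean(axis=0) - acts_neutral.mean(axis=0)
```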
arXiv Detail & Related papers (2025-10-06T05:03:57Z)
- Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain [82.98626829232899]
Fine-tuning AI agents on data from their own interactions introduces a critical security vulnerability within the AI supply chain. We show that adversaries can easily poison the data collection pipeline to embed hard-to-detect backdoors.
arXiv Detail & Related papers (2025-10-03T12:47:21Z)
- Securing AI Agents with Information-Flow Control [16.62563273152804]
This paper explores the use of information-flow control (IFC) to provide security guarantees for AI agents. We present a formal model to reason about the security and expressiveness of agent planners. We present Fides, a planner that tracks confidentiality and integrity labels, deterministically enforces security policies, and introduces novel primitives for selectively hiding information.
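Label tracking of this kind follows a classic IFC recipe; a minimal sketch (illustrative, not the Fides implementation): every value carries confidentiality and integrity labels, labels join when values combine, and a policy gate checks them before anything leaves the system.

```python
# Minimal information-flow labels (illustrative; not Fides itself). Values
# carry (confidentiality, integrity) labels; combining values joins labels,
# and a gate checks them before any externally visible action.

CONF = {"public": 0, "secret": 1}        # higher = more confidential
INTEG = {"untrusted": 0, "trusted": 1}   # higher = more trustworthy

class Labeled:
    def __init__(self, value, conf="public", integ="trusted"):
        self.value, self.conf, self.integ = value, conf, integ

    def combine(self, other, value):
        # Confidentiality joins upward, integrity downward: mixing a secret
        # in makes the result secret; mixing untrusted data in taints it.
        conf = max(self.conf, other.conf, key=CONF.get)
        integ = min(self.integ, other.integ, key=INTEG.get)
        return Labeled(value, conf, integ)

def send_externally(labeled):
    if labeled.conf != "public":
        raise PermissionError("refusing to exfiltrate non-public data")
    if labeled.integ != "trusted":
        raise PermissionError("refusing to act on untrusted data")
    print("sent:", labeled.value)
```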
arXiv Detail & Related papers (2025-05-29T16:50:41Z)
- SafeAgent: Safeguarding LLM Agents via an Automated Risk Simulator [77.86600052899156]
Large Language Model (LLM)-based agents are increasingly deployed in real-world applications. We propose AutoSafe, the first framework that systematically enhances agent safety through fully automated synthetic data generation. We show that AutoSafe boosts safety scores by 45% on average and achieves a 28.91% improvement on real-world tasks.
arXiv Detail & Related papers (2025-05-23T10:56:06Z)
- AgentDAM: Privacy Leakage Evaluation for Autonomous Web Agents [66.29263282311258]
We introduce a new benchmark, AgentDAM, that measures whether AI web-navigation agents follow the privacy principle of "data minimization". Our benchmark simulates realistic web interaction scenarios end-to-end and is adaptable to all existing web navigation agents.
arXiv Detail & Related papers (2025-03-12T19:30:31Z)
- A sketch of an AI control safety case [3.753791609999324]
As LLM agents gain a greater capacity to cause harm, AI developers might increasingly rely on control measures such as monitoring to justify that they are safe. We sketch how developers could construct a "control safety case". This safety case sketch is a step toward more concrete arguments that can be used to show that a dangerously capable LLM agent is safe to deploy.
arXiv Detail & Related papers (2025-01-28T21:52:15Z)
- SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents [58.65256663334316]
We present SafeAgentBench -- the first benchmark for safety-aware task planning of embodied LLM agents in interactive simulation environments. SafeAgentBench includes: (1) an executable, diverse, and high-quality dataset of 750 tasks, rigorously curated to cover 10 potential hazards and 3 task types; (2) SafeAgentEnv, a universal embodied environment with a low-level controller, supporting multi-agent execution with 17 high-level actions for 9 state-of-the-art baselines; and (3) reliable evaluation methods from both execution and semantic perspectives.
arXiv Detail & Related papers (2024-12-17T18:55:58Z)
- AgentOps: Enabling Observability of LLM Agents [12.49728300301026]
Large language model (LLM) agents raise significant concerns about AI safety due to their autonomous and non-deterministic behavior. We present a comprehensive taxonomy of AgentOps, identifying the artifacts and associated data that should be traced throughout the entire lifecycle of agents to achieve effective observability. Our taxonomy serves as a reference template for developers to design and implement AgentOps infrastructure that supports monitoring, logging, and analytics.
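A taxonomy of traceable artifacts typically maps straight onto a log schema. The record below is a minimal sketch with assumed field names, not the paper's actual AgentOps taxonomy:

```python
# Illustrative trace record for agent observability (field names are
# assumptions): each lifecycle step is logged with enough context to
# replay and audit the agent's run.

import json, time, uuid

def trace_event(session_id, step, artifact, payload):
    event = {
        "event_id": str(uuid.uuid4()),
        "session_id": session_id,   # groups the events of one agent run
        "timestamp": time.time(),
        "step": step,               # e.g. "plan", "tool_call", "observation"
        "artifact": artifact,       # e.g. "prompt", "tool_output", "final_answer"
        "payload": payload,
    }
    print(json.dumps(event))        # stand-in for a real log sink
    return event

# trace_event("run-42", "tool_call", "tool_output", {"tool": "sql", "rows": 10})
```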
arXiv Detail & Related papers (2024-11-08T02:31:03Z)
- AGI Agent Safety by Iteratively Improving the Utility Function [0.0]
We present an AGI safety layer that creates a special dedicated input terminal to support the iterative improvement of an AGI agent's utility function. We describe ongoing work on mapping it to a Causal Influence Diagram (CID). We then present the design of a learning agent that wraps the safety layer around either a known machine learning system or a potential future AGI-level learning system.
arXiv Detail & Related papers (2020-07-10T14:30:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.