Enforcing Temporal Constraints for LLM Agents
- URL: http://arxiv.org/abs/2512.23738v1
- Date: Thu, 25 Dec 2025 06:12:13 GMT
- Title: Enforcing Temporal Constraints for LLM Agents
- Authors: Adharsh Kamath, Sishen Zhang, Calvin Xu, Shubham Ugare, Gagandeep Singh, Sasa Misailovic
- Abstract summary: Existing guardrails rely on imprecise natural language instructions or post-hoc monitoring. We present Agent-C, a novel framework that provides run-time guarantees ensuring LLM agents adhere to formal temporal safety properties. We evaluate Agent-C across two real-world applications: a retail customer service system and an airline ticket reservation system.
- Score: 10.694240979134326
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: LLM-based agents are deployed in safety-critical applications, yet current guardrail systems fail to prevent violations of temporal safety policies, requirements that govern the ordering and sequencing of agent actions. For instance, agents may access sensitive data before authenticating users or process refunds to unauthorized payment methods, violations that require reasoning about sequences of actions rather than individual actions. Existing guardrails rely on imprecise natural language instructions or post-hoc monitoring, and provide no formal guarantees that agents will satisfy temporal constraints. We present Agent-C, a novel framework that provides run-time guarantees ensuring LLM agents adhere to formal temporal safety properties. Agent-C introduces a domain-specific language for expressing temporal properties (e.g., authenticate before accessing data), translates specifications to first-order logic, and uses SMT solving to detect non-compliant agent actions during token generation. When the LLM attempts to generate a non-compliant tool call, Agent-C leverages constrained generation techniques to ensure that every action generated by the LLM complies with the specification, and to generate a compliant alternative to the non-compliant action. We evaluate Agent-C across two real-world applications, a retail customer service system and an airline ticket reservation system, and multiple language models (open- and closed-source). Our results demonstrate that Agent-C achieves perfect safety (100% conformance, 0% harm) while improving task utility compared to state-of-the-art guardrails and unrestricted agents. On SoTA closed-source models, Agent-C improves conformance (77.4% to 100% for Claude Sonnet 4.5 and 83.7% to 100% for GPT-5) while simultaneously increasing utility (71.8% to 75.2% and 66.1% to 70.6%, respectively), representing a new SoTA frontier for reliable agentic reasoning.
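The pipeline the abstract describes (temporal property, translated to first-order logic, checked by an SMT solver against each candidate tool call) can be illustrated with a minimal sketch. The encoding below is an assumption made for illustration, not Agent-C's DSL or API: it hard-codes one property ("authenticate before any sensitive data access") and unrolls it into ground constraints over a bounded action trace, and the action names and the `violates_property` helper are hypothetical.

```python
# Minimal sketch (not Agent-C's actual DSL or API): a temporal property
# "authenticate before any sensitive data access" is checked with the Z3 SMT
# solver over the action trace plus a candidate tool call. The paper encodes
# properties in first-order logic; here the property is unrolled into ground
# constraints over a bounded trace for simplicity. Action names are illustrative.
from z3 import Solver, Bool, BoolVal, Or, Implies, unsat


def violates_property(trace: list[str], candidate: str) -> bool:
    """Return True if appending `candidate` to `trace` breaks the property."""
    actions = trace + [candidate]
    s = Solver()

    # One Boolean per trace position marking which predicate holds there.
    auth = [Bool(f"auth_{i}") for i in range(len(actions))]
    access = [Bool(f"access_{i}") for i in range(len(actions))]
    for i, act in enumerate(actions):
        s.add(auth[i] == BoolVal(act == "authenticate"))
        s.add(access[i] == BoolVal(act == "get_user_data"))

    # Property: a data access at position j requires an authenticate at some i < j.
    for j in range(len(actions)):
        earlier_auth = Or(auth[:j]) if j > 0 else BoolVal(False)
        s.add(Implies(access[j], earlier_auth))

    # If the property is inconsistent with the concrete trace, the candidate
    # tool call is non-compliant and the guardrail should block or repair it.
    return s.check() == unsat


# The guardrail flags the unauthenticated access and allows the safe ordering.
print(violates_property([], "get_user_data"))                # True  -> blocked
print(violates_property(["authenticate"], "get_user_data"))  # False -> allowed
```

In the full system, a rejected call would then be handled by constrained generation, which steers the model toward a compliant alternative rather than simply refusing.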
Related papers
- ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback [53.2744585868162]
Monitoring step-level tool invocation behaviors in real time is critical for agent deployment. We first construct TS-Bench, a novel benchmark for step-level tool invocation safety detection in LLM agents. We then develop a guardrail model, TS-Guard, using multi-task reinforcement learning.
arXiv Detail & Related papers (2026-01-15T07:54:32Z) - Towards Verifiably Safe Tool Use for LLM Agents [53.55621104327779]
Large language model (LLM)-based AI agents extend capabilities by enabling access to tools such as data sources, APIs, search engines, code sandboxes, and even other agents. LLMs may invoke unintended tool interactions and introduce risks, such as leaking sensitive data or overwriting critical records. Current approaches to mitigate these risks, such as model-based safeguards, enhance agents' reliability but cannot guarantee system safety.
arXiv Detail & Related papers (2026-01-12T21:31:38Z) - Are Your Agents Upward Deceivers? [73.1073084327614]
Large Language Model (LLM)-based agents are increasingly used as autonomous subordinates that carry out tasks for users. This raises the question of whether they may also engage in deception, similar to how individuals in human organizations lie to superiors to create a good image or avoid punishment. We observe and define agentic upward deception, a phenomenon in which an agent facing environmental constraints conceals its failure and performs actions that were not requested, without reporting them.
arXiv Detail & Related papers (2025-12-04T14:47:05Z) - VeriGuard: Enhancing LLM Agent Safety via Verified Code Generation [40.594947933580464]
The deployment of autonomous AI agents in sensitive domains, such as healthcare, introduces critical risks to safety, security, and privacy. We introduce VeriGuard, a novel framework that provides formal safety guarantees for LLM-based agents.
arXiv Detail & Related papers (2025-10-03T04:11:43Z) - PSG-Agent: Personality-Aware Safety Guardrail for LLM-based Agents [60.23552141928126]
PSG-Agent is a personalized and dynamic guardrail system for LLM-based agents. First, PSG-Agent creates personalized guardrails by mining the interaction history for stable traits. Second, PSG-Agent implements continuous monitoring across the agent pipeline with specialized guards.
arXiv Detail & Related papers (2025-09-28T03:31:59Z) - ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning [7.481324060587101]
ShieldAgent is a guardrail agent designed to enforce explicit safety policy compliance for the action trajectory of other protected agents. Given the action trajectory of the protected agent, ShieldAgent retrieves relevant rule circuits and generates a shielding plan. ShieldAgent reduces API queries by 64.7% and inference time by 58.2%, demonstrating its high precision and efficiency in safeguarding agents.
arXiv Detail & Related papers (2025-03-26T17:58:40Z) - AgentSpec: Customizable Runtime Enforcement for Safe and Reliable LLM Agents [8.290987399121343]
We propose AgentSpec, a lightweight language for specifying and enforcing runtime constraints on LLM agents. With AgentSpec, users define structured rules that incorporate triggers, predicates, and enforcement mechanisms (a generic sketch of this rule shape appears after this list). We implement AgentSpec across multiple domains, including code execution, embodied agents, and autonomous driving.
arXiv Detail & Related papers (2025-03-24T13:31:48Z) - AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents [84.96249955105777]
LLM agents may pose a greater risk if misused, but their robustness remains underexplored. We propose a new benchmark called AgentHarm to facilitate research on LLM agent misuse. We find leading LLMs are surprisingly compliant with malicious agent requests without jailbreaking.
arXiv Detail & Related papers (2024-10-11T17:39:22Z) - GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning [79.07152553060601]
We propose GuardAgent, the first guardrail agent to protect target agents by dynamically checking whether their actions satisfy given safety guard requests. Specifically, GuardAgent first analyzes the safety guard requests to generate a task plan, and then maps this plan into guardrail code for execution. We show that GuardAgent effectively moderates the violation actions for different types of agents on two benchmarks with over 98% and 83% guardrail accuracies, respectively.
arXiv Detail & Related papers (2024-06-13T14:49:26Z)