The AI Agent Code of Conduct: Automated Guardrail Policy-as-Prompt Synthesis
- URL: http://arxiv.org/abs/2509.23994v1
- Date: Sun, 28 Sep 2025 17:36:52 GMT
- Title: The AI Agent Code of Conduct: Automated Guardrail Policy-as-Prompt Synthesis
- Authors: Gauri Kholkar, Ratinder Ahuja
- Abstract summary: We introduce a novel framework that automates the translation of unstructured design documents into verifiable, real-time guardrails. "Policy as Prompt" uses Large Language Models (LLMs) to interpret and enforce natural language policies. We validate our approach across diverse applications, demonstrating a scalable and auditable pipeline.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As autonomous AI agents are increasingly deployed in industry, it is essential to safeguard them. We introduce a novel framework that automates the translation of unstructured design documents into verifiable, real-time guardrails. We introduce "Policy as Prompt," a new approach that uses Large Language Models (LLMs) to interpret and enforce natural language policies by applying contextual understanding and the principle of least privilege. Our system first ingests technical artifacts to construct a verifiable policy tree, which is then compiled into lightweight, prompt-based classifiers that audit agent behavior at runtime. We validate our approach across diverse applications, demonstrating a scalable and auditable pipeline that bridges the critical policy-to-practice gap, paving the way for verifiably safer and more regulatable AI.
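The pipeline the abstract describes (ingest artifacts, build a verifiable policy tree, compile leaves into prompt-based runtime classifiers) can be sketched roughly as below. All names here (`PolicyNode`, `compile_to_prompt`, the keyword stand-in judge) are illustrative assumptions, not the paper's actual implementation; in a real deployment the judge would be an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class PolicyNode:
    """One node of the policy tree: a rule plus optional sub-rules."""
    rule: str
    children: List["PolicyNode"] = field(default_factory=list)

def leaves(node: PolicyNode) -> List[PolicyNode]:
    """Collect leaf rules; only leaves are compiled into classifiers."""
    if not node.children:
        return [node]
    out: List[PolicyNode] = []
    for child in node.children:
        out.extend(leaves(child))
    return out

def compile_to_prompt(rule: str) -> str:
    """Compile one leaf rule into a runtime audit prompt for a judge."""
    return (
        "You are a guardrail auditor. Policy: " + rule + "\n"
        "Given the agent action below, answer ALLOW or BLOCK.\n"
        "Agent action: {action}"
    )

def audit(action: str, prompts: List[str], judge: Callable[[str], str]) -> bool:
    """Least privilege: the action passes only if every classifier allows it."""
    return all(judge(p.format(action=action)) == "ALLOW" for p in prompts)

tree = PolicyNode("data handling", [
    PolicyNode("never email customer PII externally"),
    PolicyNode("read-only access to billing records"),
])
prompts = [compile_to_prompt(leaf.rule) for leaf in leaves(tree)]

# Deterministic stand-in judge for demonstration only.
def keyword_judge(prompt: str) -> str:
    return "BLOCK" if "email PII" in prompt else "ALLOW"

print(audit("summarize billing record 42", prompts, keyword_judge))  # True
```

The key design point reflected here is that each leaf rule becomes its own lightweight classifier, so an audit trail can report exactly which policy blocked an action.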
Related papers
- LPS-Bench: Benchmarking Safety Awareness of Computer-Use Agents in Long-Horizon Planning under Benign and Adversarial Scenarios [51.52395368061729]
We present LPS-Bench, a benchmark that evaluates the planning-time safety awareness of MCP-based CUAs under long-horizon tasks. Experiments reveal substantial deficiencies in existing CUAs' ability to maintain safe behavior. We propose mitigation strategies to improve long-horizon planning safety in MCP-based CUA systems.
arXiv Detail & Related papers (2026-02-03T08:40:24Z)
- POLARIS: Typed Planning and Governed Execution for Agentic AI in Back-Office Automation [0.28055179094637683]
POLARIS is a governed orchestration framework that treats automation as typed plan synthesis and validated execution over LLM agents. Empirically, POLARIS achieves a micro F1 of 0.81 on the SROIE dataset and, on a controlled synthetic suite, achieves 0.95 to 1.00 precision for anomaly routing with preserved audit trails.
arXiv Detail & Related papers (2026-01-16T22:38:21Z)
- AgentGuardian: Learning Access Control Policies to Govern AI Agent Behavior [20.817336331051752]
AgentGuardian governs and protects AI agent operations by enforcing context-aware access-control policies. It effectively detects malicious or misleading inputs while preserving normal agent functionality.
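Context-aware access control for agent tool calls, as summarized above, might look like the following minimal sketch. The rule format and `is_allowed` helper are my own assumptions for illustration, not AgentGuardian's actual API.

```python
# Deny-by-default access control: a tool call is permitted only if some
# rule matches the agent, the tool, and the current interaction context.
RULES = [
    {"agent": "support-bot", "tool": "crm.read", "contexts": {"ticket", "chat"}},
    {"agent": "support-bot", "tool": "crm.write", "contexts": {"ticket"}},
]

def is_allowed(agent: str, tool: str, context: str, rules=RULES) -> bool:
    """Return True only if an explicit rule grants this (agent, tool, context)."""
    return any(
        r["agent"] == agent and r["tool"] == tool and context in r["contexts"]
        for r in rules
    )

print(is_allowed("support-bot", "crm.write", "ticket"))  # True
print(is_allowed("support-bot", "crm.write", "chat"))    # False
```

The deny-by-default stance mirrors the least-privilege principle that recurs across these papers: anything not explicitly granted is blocked.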
arXiv Detail & Related papers (2026-01-15T14:33:36Z)
- CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents [60.98294016925157]
AI agents are vulnerable to prompt injection attacks, where malicious content hijacks agent behavior to steal credentials or cause financial loss. We introduce Single-Shot Planning for CUAs, where a trusted planner generates a complete execution graph with conditional branches before any observation of potentially malicious content. Although this architectural isolation successfully prevents instruction injections, we show that additional measures are needed to prevent Branch Steering attacks.
arXiv Detail & Related papers (2026-01-14T23:06:35Z)
- Towards Verifiably Safe Tool Use for LLM Agents [53.55621104327779]
Large language model (LLM)-based AI agents extend capabilities by enabling access to tools such as data sources, APIs, search engines, code sandboxes, and even other agents. LLMs may invoke unintended tool interactions and introduce risks, such as leaking sensitive data or overwriting critical records. Current approaches to mitigate these risks, such as model-based safeguards, enhance agents' reliability but cannot guarantee system safety.
arXiv Detail & Related papers (2026-01-12T21:31:38Z)
- Executable Governance for AI: Translating Policies into Rules Using LLMs [1.388831902854619]
Policy-to-Tests (P2T) is a framework that converts natural-language policy documents into normalized, machine-readable rules. To test the framework beyond a single policy, we apply it across general frameworks, sector guidance, and enterprise standards. These AI-generated rules closely match strong human baselines on span-level and rule-level metrics, with robust inter-annotator agreement on the gold set.
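The normalization step this summary describes - turning policy prose into machine-readable rule records - can be illustrated with a deliberately naive sketch. The record fields (`subject`, `modality`, `action`) and the must/must-not heuristic are assumptions of mine, not P2T's actual extraction logic, which uses LLMs rather than regular expressions.

```python
import re

def extract_rules(policy_text: str) -> list:
    """Naively normalize 'must' / 'must not' sentences into rule records."""
    rules = []
    for sent in re.split(r"(?<=[.!?])\s+", policy_text.strip()):
        m = re.match(r"(.+?)\bmust(\s+not)?\b\s*(.+?)\.?$", sent, re.I)
        if m:
            rules.append({
                "subject": m.group(1).strip(),
                # 'must not' marks a prohibition; bare 'must' an obligation.
                "modality": "prohibition" if m.group(2) else "obligation",
                "action": m.group(3).strip(),
            })
    return rules

policy = ("Agents must log every tool call. "
          "Agents must not store raw credentials.")
rules = extract_rules(policy)
print(rules[1]["modality"])  # prohibition
```

Once rules are in this normalized shape, they can be diffed, versioned, and compared against human-annotated gold sets - the span-level and rule-level evaluation the abstract mentions.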
arXiv Detail & Related papers (2025-12-04T03:11:54Z)
- Policy Cards: Machine-Readable Runtime Governance for Autonomous AI Agents [0.0]
Policy Cards are a machine-readable, deployment-layer standard for expressing operational, regulatory, and ethical constraints for AI agents. Each Policy Card can be validated automatically, version-controlled, and linked to runtime enforcement or continuous-audit pipelines.
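A machine-readable card that "can be validated automatically" implies a schema check along these lines. The field names below are a hypothetical shape I chose for illustration; the published Policy Cards standard defines its own schema.

```python
# Hypothetical Policy Card schema: required fields plus one structural rule.
REQUIRED_FIELDS = {"id", "version", "scope", "constraints"}

def validate_card(card: dict) -> list:
    """Return a list of validation errors; an empty list means the card is valid."""
    errors = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - card.keys())]
    if not isinstance(card.get("constraints"), list) or not card.get("constraints"):
        errors.append("constraints must be a non-empty list")
    return errors

card = {
    "id": "pc-001",
    "version": "1.0.0",
    "scope": "billing-agent",
    "constraints": ["no external email", "read-only database access"],
}
print(validate_card(card))  # []
```

Because validation returns structured errors rather than raising, a CI pipeline can run it over every version-controlled card and surface all problems at once.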
arXiv Detail & Related papers (2025-10-28T12:59:55Z)
- Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols [80.68060125494645]
We study adaptive attacks by an untrusted model that knows the protocol and the monitor model. We instantiate a simple adaptive attack vector by which the attacker embeds publicly known or zero-shot prompt injections in the model outputs.
arXiv Detail & Related papers (2025-10-10T15:12:44Z)
- Safe and Certifiable AI Systems: Concepts, Challenges, and Lessons Learned [45.44933002008943]
This white paper presents the TÜV AUSTRIA Trusted AI framework. It is an end-to-end audit catalog and methodology for assessing and certifying machine learning systems. Building on three pillars - Secure Software Development, Functional Requirements, and Ethics & Data Privacy - it translates the high-level obligations of the EU AI Act into specific, testable criteria.
arXiv Detail & Related papers (2025-09-08T17:52:08Z)
- Rethinking Testing for LLM Applications: Characteristics, Challenges, and a Lightweight Interaction Protocol [83.83217247686402]
Large Language Models (LLMs) have evolved from simple text generators into complex software systems that integrate retrieval augmentation, tool invocation, and multi-turn interactions. Their inherent non-determinism, dynamism, and context dependence pose fundamental challenges for quality assurance. This paper decomposes LLM applications into a three-layer architecture: the System Shell Layer, the Prompt Orchestration Layer, and the LLM Inference Core.
arXiv Detail & Related papers (2025-08-28T13:00:28Z)
- Rethinking Autonomy: Preventing Failures in AI-Driven Software Engineering [1.6766200616088744]
The SAFE-AI Framework is a holistic approach emphasizing Safety, Auditability, Feedback, and Explainability. We introduce a novel taxonomy of AI behaviors categorizing suggestive, generative, autonomous, and destructive actions to guide risk assessment and oversight. This paper provides a roadmap for responsible AI integration in software engineering, aligning with emerging regulations like the EU AI Act and Canada's AIDA.
arXiv Detail & Related papers (2025-08-15T22:13:54Z)
- Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective [65.12150411762273]
We show that pruning random demonstrations into seemingly incoherent "gibberish" can remarkably improve performance across diverse tasks. We propose a self-discover prompt optimization framework, PromptQuine, that automatically searches for the pruning strategy by itself in low-data regimes.
arXiv Detail & Related papers (2025-06-22T07:53:07Z)
- DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents [52.92354372596197]
Large Language Models (LLMs) are increasingly central to agentic systems due to their strong reasoning and planning capabilities. This interaction also introduces the risk of prompt injection attacks, where malicious inputs from external sources can mislead the agent's behavior. We propose DRIFT, a Dynamic Rule-based Isolation Framework for Trustworthy agentic systems, which enforces both control- and data-level constraints.
arXiv Detail & Related papers (2025-06-13T05:01:09Z)
- On Automating Security Policies with Contemporary LLMs [3.47402794691087]
In this paper, we present a framework for automating attack mitigation policy compliance through an innovative combination of in-context learning and retrieval-augmented generation (RAG). Our empirical evaluation, conducted using publicly available CTI policies in STIXv2 format and Windows API documentation, demonstrates significant improvements in precision, recall, and F1-score when employing RAG compared to a non-RAG baseline.
arXiv Detail & Related papers (2025-06-05T09:58:00Z)
- LLM Agents Should Employ Security Principles [60.03651084139836]
This paper argues that the well-established design principles in information security should be employed when deploying Large Language Model (LLM) agents at scale. We introduce AgentSandbox, a conceptual framework embedding these security principles to provide safeguards throughout an agent's life-cycle.
arXiv Detail & Related papers (2025-05-29T21:39:08Z)
- SOPBench: Evaluating Language Agents at Following Standard Operating Procedures and Constraints [59.645885492637845]
SOPBench is an evaluation pipeline that transforms each service-specific SOP code program into a directed graph of executable functions and requires agents to call these functions based on natural language SOP descriptions. We evaluate 18 leading models, and results show the task is challenging even for top-tier models.
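The "directed graph of executable functions" idea can be sketched as follows: the SOP defines which function may follow which, and an agent's call trace is checked against the graph's edges. The graph contents and function names here are invented for the example; SOPBench's actual graphs are compiled from service-specific SOP programs.

```python
# Toy SOP graph for a refund workflow: each key lists the functions
# the agent may call next.
SOP_GRAPH = {
    "verify_identity": ["lookup_account"],
    "lookup_account": ["issue_refund", "close_ticket"],
    "issue_refund": ["close_ticket"],
    "close_ticket": [],
}

def trace_follows_sop(trace, graph, start="verify_identity"):
    """True iff the agent's call sequence starts correctly and walks graph edges."""
    if not trace or trace[0] != start:
        return False
    return all(nxt in graph.get(cur, []) for cur, nxt in zip(trace, trace[1:]))

ok = ["verify_identity", "lookup_account", "issue_refund", "close_ticket"]
bad = ["verify_identity", "issue_refund"]  # skips the required account lookup
print(trace_follows_sop(ok, SOP_GRAPH))   # True
print(trace_follows_sop(bad, SOP_GRAPH))  # False
```

Encoding the SOP as a graph makes compliance checking mechanical: any trace that takes an edge not in the graph is flagged, regardless of how plausible the agent's natural-language justification sounds.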
arXiv Detail & Related papers (2025-03-11T17:53:02Z)
- Automating Prompt Leakage Attacks on Large Language Models Using Agentic Approach [9.483655213280738]
This paper presents a novel approach to evaluating the security of large language models (LLMs). We define prompt leakage as a critical threat to secure LLM deployment. We implement a multi-agent system where cooperative agents are tasked with probing and exploiting the target LLM to elicit its prompt.
arXiv Detail & Related papers (2025-02-18T08:17:32Z)
- IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems [2.2810745411557316]
We introduce IntellAgent, a scalable, open-source framework to evaluate conversational AI systems. IntellAgent automates the creation of synthetic benchmarks by combining policy-driven graph modeling, realistic event generation, and interactive user-agent simulations. Our findings demonstrate that IntellAgent serves as an effective framework for advancing conversational AI by addressing challenges in bridging research and deployment.
arXiv Detail & Related papers (2025-01-19T14:58:35Z)
- AutoGuide: Automated Generation and Selection of Context-Aware Guidelines for Large Language Model Agents [74.17623527375241]
We introduce a novel framework, called AutoGuide, which automatically generates context-aware guidelines from offline experiences. As a result, our guidelines facilitate the provision of relevant knowledge for the agent's current decision-making process. Our evaluation demonstrates that AutoGuide significantly outperforms competitive baselines in complex benchmark domains.
arXiv Detail & Related papers (2024-03-13T22:06:03Z)
- Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning [50.47568731994238]
A key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL). This paper presents a general framework for integrating and learning structured reasoning into AI agents' policies.
arXiv Detail & Related papers (2023-12-22T17:57:57Z)
- Monitoring ROS2: from Requirements to Autonomous Robots [58.720142291102135]
This paper provides an overview of a formal approach to generating runtime monitors for autonomous robots from requirements written in a structured natural language.
Our approach integrates the Formal Requirement Elicitation Tool (FRET) with Copilot, a runtime verification framework, through the Ogma integration tool.
arXiv Detail & Related papers (2022-09-28T12:19:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.