Context is Key for Agent Security
- URL: http://arxiv.org/abs/2501.17070v2
- Date: Wed, 29 Jan 2025 20:02:43 GMT
- Title: Context is Key for Agent Security
- Authors: Lillian Tsai, Eugene Bagdasarian
- Abstract summary: This paper explores contextual security in the domain of agents. It proposes Conseca, a framework to generate just-in-time, contextual, and human-verifiable security policies.
- Score: 0.276240219662896
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Judging the safety of an action, whether taken by a human or a system, must take into account the context in which the action takes place. For example, deleting an email from a user's mailbox may or may not be appropriate depending on the email's content, the user's goals, or even available space. Systems today that make these judgements -- providing security against harmful or inappropriate actions -- rely on manually-crafted policies or user confirmation for each relevant context. With the upcoming deployment of systems like generalist agents, we argue that we must rethink security designs to adapt to the scale of contexts and capabilities of these systems. As a first step, this paper explores contextual security in the domain of agents and proposes contextual security for agents (Conseca), a framework to generate just-in-time, contextual, and human-verifiable security policies.
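The abstract does not describe Conseca's internals, but the pattern it names (generate a policy just in time from the task context, keep its rationale human-verifiable, enforce it before acting) can be sketched in a few lines. All names below are hypothetical illustrations, not Conseca's actual API:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Context:
    user_goal: str     # e.g., "clean up promotional emails"
    action: str        # e.g., "delete_email"
    action_args: dict  # e.g., {"email_id": "42", "label": "promotions"}

@dataclass
class Policy:
    rationale: str                     # plain language, so a human can verify it
    allow: Callable[[Context], bool]   # deterministic check derived from the rationale

def generate_policy(ctx: Context) -> Policy:
    # Stand-in for just-in-time policy generation; a real system would derive
    # this from the task context rather than hard-code it.
    if ctx.action == "delete_email":
        return Policy(
            rationale="Deletion is allowed only for emails labeled 'promotions'.",
            allow=lambda c: c.action_args.get("label") == "promotions",
        )
    return Policy(rationale="Unknown action: deny by default.",
                  allow=lambda c: False)

def guarded_execute(ctx: Context, execute: Callable[[Context], None]) -> None:
    policy = generate_policy(ctx)                      # generated per context
    print(f"Policy under review: {policy.rationale}")  # the human-verifiable step
    if policy.allow(ctx):
        execute(ctx)
    else:
        raise PermissionError(f"Blocked: {policy.rationale}")
```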
Related papers
- OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety [58.201189860217724]
We introduce OpenAgentSafety, a comprehensive framework for evaluating agent behavior across eight critical risk categories. Unlike prior work, our framework evaluates agents that interact with real tools, including web browsers, code execution environments, file systems, bash shells, and messaging platforms. It combines rule-based analysis with LLM-as-judge assessments to detect both overt and subtle unsafe behaviors.
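A minimal sketch of the hybrid detection the summary describes, combining pattern rules for overt violations with an LLM judge for subtle ones; the patterns and the `llm_judge` callable are stand-ins, not the framework's real interface:

```python
from typing import Callable

def rule_based_flags(trace: list[str]) -> list[str]:
    # Overt violations caught by simple pattern rules (illustrative patterns only).
    rules = {"rm -rf /": "destructive shell command", "DROP TABLE": "destructive SQL"}
    return [why for action in trace
            for pattern, why in rules.items() if pattern in action]

def judge_flags(trace: list[str], llm_judge: Callable[[str], str]) -> list[str]:
    # Subtle violations scored by an LLM judge; `llm_judge` is any prompt -> text callable.
    verdict = llm_judge("Did this trace take any unsafe action? Answer yes/no.\n"
                        + "\n".join(trace))
    return [f"judge flagged trace: {verdict}"] if verdict.lower().startswith("yes") else []

def evaluate(trace: list[str], llm_judge: Callable[[str], str]) -> list[str]:
    # Union of both detectors, mirroring the rule-based + LLM-as-judge combination.
    return rule_based_flags(trace) + judge_flags(trace, llm_judge)
```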
arXiv Detail & Related papers (2025-07-08T16:18:54Z) - Kaleidoscopic Teaming in Multi Agent Simulations [75.47388708240042]
We argue that existing red teaming or safety evaluation frameworks fall short in evaluating safety risks in complex behaviors, thought processes and actions taken by agents. We introduce new in-context optimization techniques that can be used in our kaleidoscopic teaming framework to generate better scenarios for safety analysis. We present appropriate metrics that can be used along with our framework to measure safety of agents.
arXiv Detail & Related papers (2025-06-20T23:37:17Z) - Effective Red-Teaming of Policy-Adherent Agents [7.080204863156575]
Task-oriented LLM-based agents are increasingly used in domains with strict policies, such as refund eligibility or cancellation rules. We propose a novel threat model that focuses on adversarial users aiming to exploit policy-adherent agents for personal benefit. We present CRAFT, a multi-agent red-teaming system that leverages policy-aware persuasive strategies to undermine a policy-adherent agent in a customer-service scenario.
arXiv Detail & Related papers (2025-06-11T10:59:47Z) - LLM Agents Should Employ Security Principles [60.03651084139836]
This paper argues that the well-established design principles in information security should be employed when deploying Large Language Model (LLM) agents at scale. We introduce AgentSandbox, a conceptual framework embedding these security principles to provide safeguards throughout an agent's life-cycle.
arXiv Detail & Related papers (2025-05-29T21:39:08Z) - Progent: Programmable Privilege Control for LLM Agents [46.49787947705293]
We introduce Progent, the first privilege control mechanism for LLM agents.
At its core is a domain-specific language for flexibly expressing privilege control policies applied during agent execution.
This enables agent developers and users to craft suitable policies for their specific use cases and enforce them deterministically to guarantee security.
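Progent's actual domain-specific language is not shown in the summary; as a toy illustration of the enforcement idea (declarative per-tool rules checked deterministically, default-deny), one might write:

```python
from fnmatch import fnmatch

# A toy privilege policy in the spirit of a privilege-control DSL (hypothetical
# syntax): each rule names a tool and constrains its arguments with glob patterns.
POLICY = [
    {"tool": "read_file",  "allow_args": {"path": "/home/user/project/*"}},
    {"tool": "send_email", "allow_args": {"to": "*@example.com"}},
]

def is_permitted(tool: str, args: dict) -> bool:
    """Deterministic enforcement: a call is allowed only if some rule
    matches the tool name and every constrained argument."""
    for rule in POLICY:
        if rule["tool"] != tool:
            continue
        if all(fnmatch(str(args.get(k, "")), pattern)
               for k, pattern in rule["allow_args"].items()):
            return True
    return False  # default-deny

assert is_permitted("read_file", {"path": "/home/user/project/notes.txt"})
assert not is_permitted("read_file", {"path": "/etc/passwd"})
```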
arXiv Detail & Related papers (2025-04-16T01:58:40Z) - Multi-Agent Systems Execute Arbitrary Malicious Code [9.200635465485067]
We show that adversarial content can hijack control and communication within the system to invoke unsafe agents and functionalities.
We show that control-flow hijacking attacks succeed even if the individual agents are not susceptible to direct or indirect prompt injection.
arXiv Detail & Related papers (2025-03-15T16:16:08Z) - Firewalls to Secure Dynamic LLM Agentic Networks [36.6600856429565]
We propose a practical design for constrained LLM agentic networks that balance adaptability, security, and privacy.
Our framework automatically constructs and updates task-specific rules from prior simulations to build firewalls.
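A rough sketch of deriving firewall rules from prior simulations, assuming benign traces are available as (sender, message_type) events; the thresholding heuristic is illustrative, not the paper's algorithm:

```python
from collections import Counter

def learn_rules(benign_traces: list[list[tuple[str, str]]],
                min_support: int = 3) -> set[tuple[str, str]]:
    """Build an allow-list of (sender, message_type) pairs seen often enough
    in benign prior simulations; a stand-in for the paper's rule construction."""
    counts = Counter(pair for trace in benign_traces for pair in trace)
    return {pair for pair, n in counts.items() if n >= min_support}

def firewall_permits(sender: str, message_type: str,
                     rules: set[tuple[str, str]]) -> bool:
    # Messages outside the learned, task-specific allow-list are dropped.
    return (sender, message_type) in rules
```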
arXiv Detail & Related papers (2025-02-03T21:00:14Z) - SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World [50.937342998351426]
Chain-of-User-Thought (COUT) is a novel embodied reasoning paradigm.
We introduce SmartAgent, an agent framework that perceives cyber environments and reasons about personalized requirements.
Our work is the first to formulate the COUT process, serving as a preliminary attempt towards embodied personalized agent learning.
arXiv Detail & Related papers (2024-12-10T12:40:35Z) - Securing Legacy Communication Networks via Authenticated Cyclic Redundancy Integrity Check [98.34702864029796]
We propose Authenticated Cyclic Redundancy Integrity Check (ACRIC).
ACRIC preserves backward compatibility without requiring additional hardware and is protocol agnostic.
We show that ACRIC offers robust security with minimal transmission overhead (≤ 1 ms).
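The summary does not give ACRIC's exact construction; one common way to retrofit authentication while keeping a legacy frame layout (and thus backward compatibility) is to place a keyed tag, truncated to the CRC field's width, where the plain CRC would go. A sketch of that general idea, not necessarily ACRIC's scheme:

```python
import hashlib
import hmac

CRC_FIELD_BYTES = 4  # legacy frames reserve 32 bits for the CRC

def authenticated_check_value(key: bytes, frame: bytes) -> bytes:
    # A keyed tag truncated to the CRC field's width: the frame layout is
    # unchanged, but forging a valid check value now requires the key.
    return hmac.new(key, frame, hashlib.sha256).digest()[:CRC_FIELD_BYTES]

def verify_frame(key: bytes, frame: bytes, received_tag: bytes) -> bool:
    # Constant-time comparison avoids leaking tag bytes through timing.
    return hmac.compare_digest(authenticated_check_value(key, frame), received_tag)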
arXiv Detail & Related papers (2024-11-21T18:26:05Z) - Compromising Embodied Agents with Contextual Backdoor Attacks [69.71630408822767]
Large language models (LLMs) have transformed the development of embodied intelligence.
This paper uncovers a significant backdoor security threat within this process.
By poisoning just a few contextual demonstrations, attackers can covertly compromise the contextual environment of a black-box LLM.
arXiv Detail & Related papers (2024-08-06T01:20:12Z) - AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents [27.701301913159067]
We introduce AgentDojo, an evaluation framework for agents that execute tools over untrusted data.
AgentDojo is not a static test suite, but rather an environment for designing and evaluating new agent tasks, defenses, and adaptive attacks.
We populate AgentDojo with 97 realistic tasks, 629 security test cases, and various attack and defense paradigms from the literature.
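The shape of a prompt-injection security test case in such an environment might look as follows; this is an illustrative sketch, not AgentDojo's actual API:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SecurityTestCase:
    user_task: str         # the benign task the agent is asked to perform
    injected_payload: str  # adversarial instructions hidden in untrusted tool output
    forbidden_action: str  # what a successful injection would make the agent do

def attack_resisted(agent: Callable[[str, str], list[str]],
                    case: SecurityTestCase) -> bool:
    """True if the agent completed the run without executing the injected goal.
    `agent` takes (task, untrusted_tool_output) and returns the actions it took."""
    actions = agent(case.user_task, case.injected_payload)
    return case.forbidden_action not in actions
```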
arXiv Detail & Related papers (2024-06-19T08:55:56Z) - Dissecting Adversarial Robustness of Multimodal LM Agents [70.2077308846307]
We manually create 200 targeted adversarial tasks and evaluation scripts in a realistic threat model on top of VisualWebArena.
We find that we can successfully break the latest agents that use black-box frontier LMs, including those that perform reflection and tree search.
We also use ARE to rigorously evaluate how the robustness changes as new components are added.
arXiv Detail & Related papers (2024-06-18T17:32:48Z) - A Formal Model of Security Controls' Capabilities and Its Applications to Policy Refinement and Incident Management [0.2621730497733947]
This paper presents the Security Capability Model (SCM), a formal model that abstracts the features that security controls offer for enforcing security policies. By validating its effectiveness in real-world scenarios, we show that SCM enables the automation of different and complex security tasks.
arXiv Detail & Related papers (2024-05-06T15:06:56Z) - ASSERT: Automated Safety Scenario Red Teaming for Evaluating the Robustness of Large Language Models [65.79770974145983]
ASSERT, Automated Safety Scenario Red Teaming, consists of three methods -- semantically aligned augmentation, target bootstrapping, and adversarial knowledge injection.
We partition our prompts into four safety domains for a fine-grained analysis of how the domain affects model performance.
We find statistically significant performance differences of up to 11% in absolute classification accuracy among semantically related scenarios and error rates of up to 19% absolute in zero-shot adversarial settings.
arXiv Detail & Related papers (2023-10-14T17:10:28Z) - A Model Based Framework for Testing Safety and Security in Operational Technology Environments [0.46040036610482665]
We propose a model-based testing approach which we consider a promising way to analyze the safety and security behavior of a system under test.
The structure of the underlying framework is divided into four parts, according to the critical factors in testing operational technology environments.
arXiv Detail & Related papers (2023-06-22T05:37:09Z) - Sustainable Adaptive Security [11.574868434725117]
We propose the notion of Sustainable Adaptive Security (SAS), which reflects enduring protection by augmenting adaptive security systems with the capability of mitigating newly discovered threats.
We use a smart home example to showcase how we can engineer the activities of the MAPE (Monitor, Analysis, Planning, and Execution) loop of systems satisfying sustainable adaptive security.
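The MAPE loop is the standard autonomic-computing control pattern (Monitor, Analyze, Plan, Execute); a minimal smart-home sketch of its four activities follows, with a hypothetical `home` device interface that is not part of the paper:

```python
import time

# `home` is an assumed device interface exposing `door_open`, `owner_away`,
# and an `actuate(action)` method.

def monitor(home) -> dict:
    return {"door_open": home.door_open, "owner_away": home.owner_away}

def analyze(state: dict) -> bool:
    # Threat hypothesis: the door is open while the owner is away.
    return state["door_open"] and state["owner_away"]

def plan(threat: bool) -> list[str]:
    return ["lock_door", "notify_owner"] if threat else []

def execute(home, actions: list[str]) -> None:
    for action in actions:
        home.actuate(action)

def mape_loop(home, period_s: float = 5.0) -> None:
    while True:  # Monitor -> Analyze -> Plan -> Execute, repeated
        execute(home, plan(analyze(monitor(home))))
        time.sleep(period_s)
```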
arXiv Detail & Related papers (2023-06-05T08:48:36Z) - REGARD: Rules of EngaGement for Automated cybeR Defense to aid in Intrusion Response [0.41998444721319206]
Automated Intelligent Cyberdefense Agents (AICAs) are part Intrusion Detection Systems (IDS) and part Intrusion Response Systems (IRS).
We create the Rules of EngaGement for Automated cybeR Defense (REGARD) system, which holds a set of Rules of Engagement (RoE) to protect the managed system according to the instructions provided by the human operator.
arXiv Detail & Related papers (2023-05-23T11:52:02Z) - Using In-Context Learning to Improve Dialogue Safety [45.303005593685036]
We investigate a retrieval-based method for reducing bias and toxicity in responses from chatbots.
It uses in-context learning to steer a model towards safer generations.
We find our method performs competitively with strong baselines without requiring training.
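One plausible shape of such a retrieval-plus-in-context-learning pipeline, sketched with a toy lexical retriever (the paper's actual retriever is not specified in this summary):

```python
def retrieve_demos(query: str, safe_corpus: list[tuple[str, str]],
                   k: int = 3) -> list[tuple[str, str]]:
    # Toy lexical-overlap retriever over (user_message, safe_response) pairs.
    def overlap(a: str, b: str) -> int:
        return len(set(a.lower().split()) & set(b.lower().split()))
    return sorted(safe_corpus, key=lambda pair: overlap(query, pair[0]),
                  reverse=True)[:k]

def build_prompt(user_msg: str, safe_corpus: list[tuple[str, str]]) -> str:
    # Prepend retrieved safe exchanges so in-context learning steers the reply.
    shots = "\n".join(f"User: {u}\nAssistant: {r}"
                      for u, r in retrieve_demos(user_msg, safe_corpus))
    return f"{shots}\nUser: {user_msg}\nAssistant:"
```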
arXiv Detail & Related papers (2023-02-02T04:46:03Z) - Foveate, Attribute, and Rationalize: Towards Physically Safe and Trustworthy AI [76.28956947107372]
Covertly unsafe text is an area of particular interest, as such text may arise from everyday scenarios and is challenging to detect as harmful.
We propose FARM, a novel framework leveraging external knowledge for trustworthy rationale generation in the context of safety.
Our experiments show that FARM obtains state-of-the-art results on the SafeText dataset, showing absolute improvement in safety classification accuracy by 5.9%.
arXiv Detail & Related papers (2022-12-19T17:51:47Z) - Certifiably Robust Policy Learning against Adversarial Communication in Multi-agent Systems [51.6210785955659]
Communication is important in many multi-agent reinforcement learning (MARL) problems for agents to share information and make good decisions.
However, when deploying trained communicative agents in a real-world application where noise and potential attackers exist, the safety of communication-based policies becomes a severe issue that is underexplored.
In this work, we consider an environment with $N$ agents, where the attacker may arbitrarily change the communication from any $C < \frac{N-1}{2}$ agents to a victim agent.
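The paper's certified defense is its own construction; as a simpler intuition for why the $C < \frac{N-1}{2}$ bound matters, a robust aggregator such as the median tolerates any strict minority of corrupted messages:

```python
import statistics

# A victim receiving N-1 = 5 messages, C = 2 of them corrupted (C < (N-1)/2):
messages = [1.0, 1.1, 0.9, 100.0, -50.0]  # last two are adversarial
print(statistics.median(messages))  # 1.0, still within the honest values' range
```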
arXiv Detail & Related papers (2022-06-21T07:32:18Z) - Learning with Weak Supervision for Email Intent Detection [56.71599262462638]
We propose to leverage user actions as a source of weak supervision to detect intents in emails.
We develop an end-to-end robust deep neural network model for email intent identification.
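A minimal sketch of user actions as weak supervision: observed actions are mapped to noisy intent labels to build training pairs without manual annotation. The action-to-intent mapping and the toy corpus are illustrative, not the paper's:

```python
emails = [  # a toy corpus of emails annotated with observed user actions
    {"text": "Can you send the Q3 numbers?", "user_actions": ["replied"]},
    {"text": "Dentist appointment Tuesday 3pm", "user_actions": ["set_reminder"]},
    {"text": "Weekly newsletter", "user_actions": []},
]

def weak_label(email: dict) -> str | None:
    """Map user actions to a noisy intent label; None means no signal."""
    actions = email["user_actions"]
    if "replied" in actions:
        return "request_information"  # replies often answer an ask
    if "set_reminder" in actions:
        return "schedule_task"        # reminders suggest a to-do intent
    return None                       # leave unlabeled

training_pairs = [(e["text"], lbl) for e in emails
                  if (lbl := weak_label(e)) is not None]
```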
arXiv Detail & Related papers (2020-05-26T23:41:05Z)