Real AI Agents with Fake Memories: Fatal Context Manipulation Attacks on Web3 Agents
- URL: http://arxiv.org/abs/2503.16248v3
- Date: Wed, 09 Jul 2025 01:38:20 GMT
- Title: Real AI Agents with Fake Memories: Fatal Context Manipulation Attacks on Web3 Agents
- Authors: Atharv Singh Patlan, Peiyao Sheng, S. Ashwin Hebbar, Prateek Mittal, Pramod Viswanath,
- Abstract summary: This paper investigates the vulnerabilities of AI agents within blockchain-based financial ecosystems when exposed to adversarial threats in real-world scenarios.<n>We introduce the concept of context manipulation -- a comprehensive attack vector that exploits unprotected context surfaces.<n>Using ElizaOS, we showcase that malicious injections into prompts or historical records can trigger unauthorized asset transfers and protocol violations.
- Score: 36.49717045080722
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: AI agents integrated with Web3 offer autonomy and openness but raise security concerns as they interact with financial protocols and immutable smart contracts. This paper investigates the vulnerabilities of AI agents within blockchain-based financial ecosystems when exposed to adversarial threats in real-world scenarios. We introduce the concept of context manipulation -- a comprehensive attack vector that exploits unprotected context surfaces, including input channels, memory modules, and external data feeds. It expands on traditional prompt injection and reveals a more stealthy and persistent threat: memory injection. Using ElizaOS, a representative decentralized AI agent framework for automated Web3 operations, we showcase that malicious injections into prompts or historical records can trigger unauthorized asset transfers and protocol violations which could be financially devastating in reality. To quantify these risks, we introduce CrAIBench, a Web3-focused benchmark covering 150+ realistic blockchain tasks. such as token transfers, trading, bridges, and cross-chain interactions, and 500+ attack test cases using context manipulation. Our evaluation results confirm that AI models are significantly more vulnerable to memory injection compared to prompt injection. Finally, we evaluate a comprehensive defense roadmap, finding that prompt-injection defenses and detectors only provide limited protection when stored context is corrupted, whereas fine-tuning-based defenses substantially reduce attack success rates while preserving performance on single-step tasks. These results underscore the urgent need for AI agents that are both secure and fiduciarily responsible in blockchain environments.
Related papers
- Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition [101.86739402748995]
We run the largest public red-teaming competition to date, targeting 22 frontier AI agents across 44 realistic deployment scenarios.<n>We build the Agent Red Teaming benchmark and evaluate it across 19 state-of-the-art models.<n>Our findings highlight critical and persistent vulnerabilities in today's AI agents.
arXiv Detail & Related papers (2025-07-28T05:13:04Z) - Context manipulation attacks : Web agents are susceptible to corrupted memory [37.66661108936654]
"Plan injection" is a novel context manipulation attack that corrupts these agents' internal task representations by targeting this vulnerable context.<n>We show that plan injections bypass robust prompt injection defenses, achieving up to 3x higher attack success rates than comparable prompt-based attacks.<n>Our findings highlight that secure memory handling must be a first-class concern in agentic systems.
arXiv Detail & Related papers (2025-06-18T14:29:02Z) - The Hidden Dangers of Browsing AI Agents [0.0]
This paper presents a comprehensive security evaluation of such agents, focusing on systemic vulnerabilities across multiple architectural layers.<n>Our work outlines the first end-to-end threat model for browsing agents and provides actionable guidance for securing their deployment in real-world environments.
arXiv Detail & Related papers (2025-05-19T13:10:29Z) - WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks [36.97842000562324]
We introduce WASP -- a new benchmark for end-to-end evaluation of Web Agent Security against Prompt injection attacks.<n>We show that even top-tier AI models, including those with advanced reasoning capabilities, can be deceived by simple, low-effort human-written injections.<n>Our end-to-end evaluation reveals a previously unobserved insight: while attacks partially succeed in up to 86% of the case, even state-of-the-art agents often struggle to fully complete the attacker goals.
arXiv Detail & Related papers (2025-04-22T17:51:03Z) - A Framework for Evaluating Emerging Cyberattack Capabilities of AI [11.595840449117052]
This work introduces a novel evaluation framework that addresses limitations by: (1) examining the end-to-end attack chain, (2) identifying gaps in AI threat evaluation, and (3) helping defenders prioritize targeted mitigations.
We analyzed over 12,000 real-world instances of AI involvement in cyber incidents, catalogued by Google's Threat Intelligence Group, to curate seven representative attack chain archetypes.
We report on AI's potential to amplify offensive capabilities across specific attack stages, and offer recommendations for prioritizing defenses.
arXiv Detail & Related papers (2025-03-14T23:05:02Z) - OCCULT: Evaluating Large Language Models for Offensive Cyber Operation Capabilities [0.0]
We demonstrate a new approach to assessing AI's progress towards enabling and scaling real-world offensive cyber operations.
We detail OCCULT, a lightweight operational evaluation framework that allows cyber security experts to contribute to rigorous and repeatable measurement.
We find that there has been significant recent advancement in the risks of AI being used to scale realistic cyber threats.
arXiv Detail & Related papers (2025-02-18T19:33:14Z) - MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents [60.30753230776882]
LLM agents are vulnerable to indirect prompt injection (IPI) attacks, where malicious tasks embedded in tool-retrieved information can redirect the agent to take unauthorized actions.<n>We present MELON, a novel IPI defense that detects attacks by re-executing the agent's trajectory with a masked user prompt modified through a masking function.
arXiv Detail & Related papers (2025-02-07T18:57:49Z) - Position: A taxonomy for reporting and describing AI security incidents [57.98317583163334]
We argue that specific are required to describe and report security incidents of AI systems.<n>Existing frameworks for either non-AI security or generic AI safety incident reporting are insufficient to capture the specific properties of AI security.
arXiv Detail & Related papers (2024-12-19T13:50:26Z) - Poison Attacks and Adversarial Prompts Against an Informed University Virtual Assistant [3.0874677990361246]
Large language models (LLMs) are particularly vulnerable to adversarial attacks.<n>The rapid development pace of AI-based systems is being driven by the potential of Generative AI (GenAI) to assist humans in decision making.<n>A threat actor can use security gaps, poor safeguards, and limited data governance to carry out attacks that grant unauthorized access to the system and its data.
arXiv Detail & Related papers (2024-11-03T05:34:38Z) - Countering Autonomous Cyber Threats [40.00865970939829]
Foundation Models present dual-use concerns broadly and within the cyber domain specifically.
Recent research has shown the potential for these advanced models to inform or independently execute offensive cyberspace operations.
This work evaluates several state-of-the-art FMs on their ability to compromise machines in an isolated network and investigates defensive mechanisms to defeat such AI-powered attacks.
arXiv Detail & Related papers (2024-10-23T22:46:44Z) - HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions [76.42274173122328]
We present HAICOSYSTEM, a framework examining AI agent safety within diverse and complex social interactions.
We run 1840 simulations based on 92 scenarios across seven domains (e.g., healthcare, finance, education)
Our experiments show that state-of-the-art LLMs, both proprietary and open-sourced, exhibit safety risks in over 50% cases.
arXiv Detail & Related papers (2024-09-24T19:47:21Z) - Interoperability and Explicable AI-based Zero-Day Attacks Detection Process in Smart Community [0.0]
This paper aims to explain how future technologies such as 6G mobile communication, Internet of Everything (IoE), Artificial Intelligence (AI), and Smart Contract embedded WPA3 protocol-based WiFi-8 can work together to prevent known attack vectors and provide protection against zero-day attacks.
arXiv Detail & Related papers (2024-08-06T03:11:36Z) - Dissecting Adversarial Robustness of Multimodal LM Agents [70.2077308846307]
We manually create 200 targeted adversarial tasks and evaluation scripts in a realistic threat model on top of VisualWebArena.<n>We find that we can successfully break latest agents that use black-box frontier LMs, including those that perform reflection and tree search.<n>We also use ARE to rigorously evaluate how the robustness changes as new components are added.
arXiv Detail & Related papers (2024-06-18T17:32:48Z) - AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways [10.16690494897609]
An Artificial Intelligence (AI) agent is a software entity that autonomously performs tasks or makes decisions based on pre-defined objectives and data inputs.
This survey delves into the emerging security threats faced by AI agents, categorizing them into four critical knowledge gaps.
By systematically reviewing these threats, this paper highlights both the progress made and the existing limitations in safeguarding AI agents.
arXiv Detail & Related papers (2024-06-04T01:22:31Z) - Artificial Intelligence as the New Hacker: Developing Agents for Offensive Security [0.0]
This paper explores the integration of Artificial Intelligence (AI) into offensive cybersecurity.
It develops an autonomous AI agent, ReaperAI, designed to simulate and execute cyberattacks.
ReaperAI demonstrates the potential to identify, exploit, and analyze security vulnerabilities autonomously.
arXiv Detail & Related papers (2024-05-09T18:15:12Z) - Designing an attack-defense game: how to increase robustness of
financial transaction models via a competition [69.08339915577206]
Given the escalating risks of malicious attacks in the finance sector, understanding adversarial strategies and robust defense mechanisms for machine learning models is critical.
We aim to investigate the current state and dynamics of adversarial attacks and defenses for neural network models that use sequential financial data as the input.
We have designed a competition that allows realistic and detailed investigation of problems in modern financial transaction data.
The participants compete directly against each other, so possible attacks and defenses are examined in close-to-real-life conditions.
arXiv Detail & Related papers (2023-08-22T12:53:09Z) - Automating Privilege Escalation with Deep Reinforcement Learning [71.87228372303453]
In this work, we exemplify the potential threat of malicious actors using deep reinforcement learning to train automated agents.
We present an agent that uses a state-of-the-art reinforcement learning algorithm to perform local privilege escalation.
Our agent is usable for generating realistic attack sensor data for training and evaluating intrusion detection systems.
arXiv Detail & Related papers (2021-10-04T12:20:46Z) - Certifiers Make Neural Networks Vulnerable to Availability Attacks [70.69104148250614]
We show for the first time that fallback strategies can be deliberately triggered by an adversary.
In addition to naturally occurring abstains for some inputs and perturbations, the adversary can use training-time attacks to deliberately trigger the fallback.
We design two novel availability attacks, which show the practical relevance of these threats.
arXiv Detail & Related papers (2021-08-25T15:49:10Z) - The Feasibility and Inevitability of Stealth Attacks [63.14766152741211]
We study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence systems.
In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself.
arXiv Detail & Related papers (2021-06-26T10:50:07Z) - Adversarial vs behavioural-based defensive AI with joint, continual and
active learning: automated evaluation of robustness to deception, poisoning
and concept drift [62.997667081978825]
Recent advancements in Artificial Intelligence (AI) have brought new capabilities to behavioural analysis (UEBA) for cyber-security.
In this paper, we present a solution to effectively mitigate this attack by improving the detection process and efficiently leveraging human expertise.
arXiv Detail & Related papers (2020-01-13T13:54:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.