Here Comes The AI Worm: Unleashing Zero-click Worms that Target GenAI-Powered Applications
- URL: http://arxiv.org/abs/2403.02817v1
- Date: Tue, 5 Mar 2024 09:37:13 GMT
- Title: Here Comes The AI Worm: Unleashing Zero-click Worms that Target GenAI-Powered Applications
- Authors: Stav Cohen, Ron Bitton, Ben Nassi,
- Abstract summary: Morris II is the first worm designed to target GenAI ecosystems through the use of adversarial self-replicating prompts.
We demonstrate the application of Morris II against GenAIpowered email assistants in two use cases.
- Score: 6.904930679944526
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In the past year, numerous companies have incorporated Generative AI (GenAI) capabilities into new and existing applications, forming interconnected Generative AI (GenAI) ecosystems consisting of semi/fully autonomous agents powered by GenAI services. While ongoing research highlighted risks associated with the GenAI layer of agents (e.g., dialog poisoning, membership inference, prompt leaking, jailbreaking), a critical question emerges: Can attackers develop malware to exploit the GenAI component of an agent and launch cyber-attacks on the entire GenAI ecosystem? This paper introduces Morris II, the first worm designed to target GenAI ecosystems through the use of adversarial self-replicating prompts. The study demonstrates that attackers can insert such prompts into inputs that, when processed by GenAI models, prompt the model to replicate the input as output (replication), engaging in malicious activities (payload). Additionally, these inputs compel the agent to deliver them (propagate) to new agents by exploiting the connectivity within the GenAI ecosystem. We demonstrate the application of Morris II against GenAIpowered email assistants in two use cases (spamming and exfiltrating personal data), under two settings (black-box and white-box accesses), using two types of input data (text and images). The worm is tested against three different GenAI models (Gemini Pro, ChatGPT 4.0, and LLaVA), and various factors (e.g., propagation rate, replication, malicious activity) influencing the performance of the worm are evaluated.
Related papers
- Unleashing Worms and Extracting Data: Escalating the Outcome of Attacks against RAG-based Inference in Scale and Severity Using Jailbreaking [6.904930679944526]
We show that with the ability to jailbreak a GenAI model, attackers can escalate the outcome of attacks against RAG-based applications.
In the first part of the paper, we show that attackers can escalate RAG membership inference attacks to RAG documents extraction attacks.
In the second part of the paper, we show that attackers can escalate the scale of RAG data poisoning attacks from compromising a single application to compromising the entire GenAI ecosystem.
arXiv Detail & Related papers (2024-09-12T13:50:22Z) - AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases [73.04652687616286]
We propose AgentPoison, the first backdoor attack targeting generic and RAG-based LLM agents by poisoning their long-term memory or RAG knowledge base.
Unlike conventional backdoor attacks, AgentPoison requires no additional model training or fine-tuning.
On each agent, AgentPoison achieves an average attack success rate higher than 80% with minimal impact on benign performance.
arXiv Detail & Related papers (2024-07-17T17:59:47Z) - BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models [57.5404308854535]
Safety backdoor attacks in large language models (LLMs) enable the stealthy triggering of unsafe behaviors while evading detection during normal interactions.
We present BEEAR, a mitigation approach leveraging the insight that backdoor triggers induce relatively uniform drifts in the model's embedding space.
Our bi-level optimization method identifies universal embedding perturbations that elicit unwanted behaviors and adjusts the model parameters to reinforce safe behaviors against these perturbations.
arXiv Detail & Related papers (2024-06-24T19:29:47Z) - NoiSec: Harnessing Noise for Security against Adversarial and Backdoor Attacks [24.583175914095783]
Malicious data manipulation attacks against machine learning jeopardize its reliability in safety-critical applications.
NoiSec is a reconstruction-based detector that disentangles the noise from the test input, extracts the underlying features from the noise, and leverages them to recognize systematic malicious manipulation.
NoiSec maintains a high detection performance, keeping the false positive rate within only 1%.
arXiv Detail & Related papers (2024-06-18T21:44:51Z) - Efficient Trigger Word Insertion [9.257916713112945]
Our main objective is to reduce the number of poisoned samples while still achieving a satisfactory Attack Success Rate (ASR) in text backdoor attacks.
We propose an efficient trigger word insertion strategy in terms of trigger word optimization and poisoned sample selection.
Our approach achieves an ASR of over 90% with only 10 poisoned samples in the dirty-label setting and requires merely 1.5% of the training data in the clean-label setting.
arXiv Detail & Related papers (2023-11-23T12:15:56Z) - Malicious Agent Detection for Robust Multi-Agent Collaborative Perception [52.261231738242266]
Multi-agent collaborative (MAC) perception is more vulnerable to adversarial attacks than single-agent perception.
We propose Malicious Agent Detection (MADE), a reactive defense specific to MAC perception.
We conduct comprehensive evaluations on a benchmark 3D dataset V2X-sim and a real-road dataset DAIR-V2X.
arXiv Detail & Related papers (2023-10-18T11:36:42Z) - Improved Activation Clipping for Universal Backdoor Mitigation and
Test-Time Detection [27.62279831135902]
Deep neural networks are vulnerable toTrojan attacks, where an attacker poisons the training set with backdoor triggers.
Recent work shows that backdoor poisoning induces over-fitting (abnormally large activations) in the attacked model.
We devise a new such approach, choosing the activation bounds to explicitly limit classification margins.
arXiv Detail & Related papers (2023-08-08T22:47:39Z) - IMBERT: Making BERT Immune to Insertion-based Backdoor Attacks [45.81957796169348]
Backdoor attacks are an insidious security threat against machine learning models.
We introduce IMBERT, which uses either gradients or self-attention scores derived from victim models to self-defend against backdoor attacks.
Our empirical studies demonstrate that IMBERT can effectively identify up to 98.5% of inserted triggers.
arXiv Detail & Related papers (2023-05-25T22:08:57Z) - Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning [95.60856995067083]
This work is among the first to perform adversarial defense for ASV without knowing the specific attack algorithms.
We propose to perform adversarial defense from two perspectives: 1) adversarial perturbation purification and 2) adversarial perturbation detection.
Experimental results show that our detection module effectively shields the ASV by detecting adversarial samples with an accuracy of around 80%.
arXiv Detail & Related papers (2021-06-01T07:10:54Z) - Combating Adversaries with Anti-Adversaries [118.70141983415445]
In particular, our layer generates an input perturbation in the opposite direction of the adversarial one.
We verify the effectiveness of our approach by combining our layer with both nominally and robustly trained models.
Our anti-adversary layer significantly enhances model robustness while coming at no cost on clean accuracy.
arXiv Detail & Related papers (2021-03-26T09:36:59Z) - Adversarial vs behavioural-based defensive AI with joint, continual and
active learning: automated evaluation of robustness to deception, poisoning
and concept drift [62.997667081978825]
Recent advancements in Artificial Intelligence (AI) have brought new capabilities to behavioural analysis (UEBA) for cyber-security.
In this paper, we present a solution to effectively mitigate this attack by improving the detection process and efficiently leveraging human expertise.
arXiv Detail & Related papers (2020-01-13T13:54:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.