A Jailbroken GenAI Model Can Cause Substantial Harm: GenAI-powered Applications are Vulnerable to PromptWares
- URL: http://arxiv.org/abs/2408.05061v1
- Date: Fri, 9 Aug 2024 13:32:50 GMT
- Title: A Jailbroken GenAI Model Can Cause Substantial Harm: GenAI-powered Applications are Vulnerable to PromptWares
- Authors: Stav Cohen, Ron Bitton, Ben Nassi
- Abstract summary: We show that a jailbroken GenAI model can cause substantial harm to GenAI-powered applications.
We present PromptWare, a new type of attack that flips the GenAI model's behavior from serving an application to attacking it.
- Score: 6.904930679944526
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper we argue that a jailbroken GenAI model can cause substantial harm to GenAI-powered applications and facilitate PromptWare, a new type of attack that flips the GenAI model's behavior from serving an application to attacking it. PromptWare exploits user inputs to jailbreak a GenAI model into performing malicious activity within the context of a GenAI-powered application. First, we introduce a naive implementation of PromptWare that behaves as malware targeting Plan & Execute architectures (a.k.a. ReAct, function calling). We show that attackers can force a desired execution flow by crafting user input that produces the desired outputs, provided the logic of the GenAI-powered application is known to them. We demonstrate a DoS attack that forces a GenAI-powered assistant into an infinite loop, wasting money and computational resources on redundant API calls to a GenAI engine and preventing the application from serving the user. Next, we introduce a more sophisticated implementation of PromptWare, named the Advanced PromptWare Threat (APwT), that targets GenAI-powered applications whose logic is unknown to attackers. We show that attackers can craft user input that exploits the GenAI engine's advanced AI capabilities to launch a kill chain at inference time consisting of six steps: escalate privileges, analyze the application's context, identify valuable assets, reason about possible malicious activities, decide on one of them, and execute it. We demonstrate APwT against a GenAI-powered e-commerce chatbot and show that it can trigger the modification of SQL tables, potentially leading to unauthorized discounts on the items sold to the user.
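The DoS variant is easiest to see in code. Below is a minimal sketch of the kind of Plan & Execute loop the paper describes attacking; all names (`call_genai`, `plan_and_execute`) are hypothetical placeholders, not the authors' implementation. The loop exits only when the model's own output signals completion, so a jailbreak that keeps the model from ever signaling completion turns every pass into a redundant, billable API call:

```python
# Minimal sketch of a Plan & Execute loop of the kind PromptWare targets.
# All names here are hypothetical; call_genai stands in for any paid
# chat-completion API endpoint.

def call_genai(messages):
    """Placeholder for a real GenAI API call; every call costs money/compute."""
    raise NotImplementedError

def plan_and_execute(user_input, max_steps=20):
    messages = [
        {"role": "system", "content": "Plan the next step. Reply DONE when the task is finished."},
        {"role": "user", "content": user_input},
    ]
    for _ in range(max_steps):
        reply = call_genai(messages)  # billable API call on every iteration
        if "DONE" in reply:
            return reply
        messages.append({"role": "assistant", "content": reply})
    # Without this budget the loop runs forever: a jailbreak delivered via
    # user_input that convinces the model never to emit DONE yields the
    # infinite loop of redundant API calls described in the paper.
    raise RuntimeError("step budget exhausted - possible PromptWare DoS")
```

The takeaway of the sketch is the termination condition: when loop exit depends on output the attacker can steer via user input, an application-side step or cost budget is the natural mitigation.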
Related papers
- Ethics of Software Programming with Generative AI: Is Programming without Generative AI always radical? [0.32985979395737786]
The paper acknowledges the transformative power of GenAI in software code generation.
It posits that GenAI is not a replacement but a complementary tool for writing software code.
Ethical considerations are paramount with the paper advocating for stringent ethical guidelines.
arXiv Detail & Related papers (2024-08-20T05:35:39Z)
- Here Comes The AI Worm: Unleashing Zero-click Worms that Target GenAI-Powered Applications [6.904930679944526]
Morris II is the first worm designed to target GenAI ecosystems through the use of adversarial self-replicating prompts.
We demonstrate the application of Morris II against GenAI-powered email assistants in two use cases.
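The self-replication property can be pictured with a one-line check (a conceptual illustration only, not the paper's implementation):

```python
def is_self_replicating(prompt: str, model_output: str) -> bool:
    # A prompt "self-replicates" if the model reproduces it verbatim in its
    # output, so the payload propagates when that output (e.g., an email
    # drafted by an assistant) is forwarded to the next GenAI-powered agent.
    return prompt in model_output
```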
arXiv Detail & Related papers (2024-03-05T09:37:13Z)
- Prompt Smells: An Omen for Undesirable Generative AI Outputs [4.105236597768038]
We propose two new concepts that will aid the research community in addressing limitations associated with the application of GenAI models.
First, we propose a definition for the "desirability" of GenAI outputs and three factors which are observed to influence it.
Second, drawing inspiration from Martin Fowler's code smells, we propose the concept of "prompt smells" and the adverse effects they are observed to have on the desirability of GenAI outputs.
arXiv Detail & Related papers (2024-01-23T10:10:01Z)
- AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models [54.95912006700379]
We introduce AutoDAN, a novel jailbreak attack against aligned Large Language Models.
AutoDAN can automatically generate stealthy jailbreak prompts using a carefully designed hierarchical genetic algorithm.
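A hierarchical genetic algorithm over prompts can be sketched as follows. This is a generic GA skeleton under assumed interfaces, not AutoDAN's actual code: `fitness`, `mutate_sentence`, and `mutate_word` are hypothetical hooks, where `fitness` would score how readily the target model complies with a candidate prompt.

```python
import random

def genetic_prompt_search(seed_prompts, fitness, mutate_sentence, mutate_word,
                          generations=50, population=64, elite=8):
    """Generic sketch of a hierarchical GA over prompts: the outer level
    recombines sentences across two parents, the inner level perturbs words.
    Assumes at least two seed prompts; all hooks are user-supplied."""
    pool = list(seed_prompts)
    for _ in range(generations):
        scored = sorted(pool, key=fitness, reverse=True)
        survivors = scored[:elite]                 # keep the best prompts
        children = []
        while len(children) < population - elite:
            a, b = random.sample(survivors, 2)
            child = mutate_sentence(a, b)          # sentence-level crossover
            child = mutate_word(child)             # word-level mutation
            children.append(child)
        pool = survivors + children
    return max(pool, key=fitness)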
arXiv Detail & Related papers (2023-10-03T19:44:37Z)
- GenAI Against Humanity: Nefarious Applications of Generative Artificial Intelligence and Large Language Models [11.323961700172175]
This article serves as a synthesis of rigorous research presented on the risks of GenAI and misuse of LLMs.
We'll uncover the societal implications that ripple through the GenAI revolution we are witnessing.
The lines between the virtual and the real worlds are blurring, and the consequences of GenAI's potential nefarious applications affect us all.
arXiv Detail & Related papers (2023-10-01T17:25:56Z)
- Identifying and Mitigating the Security Risks of Generative AI [179.2384121957896]
This paper reports the findings of a workshop held at Google on the dual-use dilemma posed by GenAI.
GenAI can be used just as well by attackers to generate new attacks and increase the velocity and efficacy of existing attacks.
We discuss short-term and long-term goals for the community on this topic.
arXiv Detail & Related papers (2023-08-28T18:51:09Z)
- Seamful XAI: Operationalizing Seamful Design in Explainable AI [59.89011292395202]
Mistakes in AI systems are inevitable, arising from both technical limitations and sociotechnical gaps.
We propose that seamful design can foster AI explainability by revealing sociotechnical and infrastructural mismatches.
We explore this process with 43 AI practitioners and real end-users.
arXiv Detail & Related papers (2022-11-12T21:54:05Z)
- Investigating Explainability of Generative AI for Code through Scenario-based Design [44.44517254181818]
Generative AI (GenAI) technologies are maturing and being applied to application domains such as software engineering.
We conduct 9 workshops with 43 software engineers in which real examples from state-of-the-art generative AI models were used to elicit users' explainability needs.
Our work explores explainability needs for GenAI for code and demonstrates how human-centered approaches can drive the technical development of XAI in novel domains.
arXiv Detail & Related papers (2022-02-10T08:52:39Z)
- Automating Privilege Escalation with Deep Reinforcement Learning [71.87228372303453]
In this work, we exemplify the potential threat of malicious actors using deep reinforcement learning to train automated agents.
We present an agent that uses a state-of-the-art reinforcement learning algorithm to perform local privilege escalation.
Our agent is usable for generating realistic attack sensor data for training and evaluating intrusion detection systems.
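The framing of privilege escalation as an RL problem can be illustrated with a deliberately simplified tabular Q-learning loop. The paper uses a state-of-the-art deep RL algorithm; this toy sketch, with invented states, actions, and an assumed `env_step` host simulator, only shows the shape of the problem.

```python
import random
from collections import defaultdict

# Toy action space standing in for real host interactions (invented for illustration).
ACTIONS = ["enumerate", "exploit_suid", "abuse_cron", "read_credentials"]

def q_learning(env_step, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2):
    """Tabular Q-learning sketch; env_step(state, action) -> (next_state,
    reward, done) is an assumed simulator of the target host, rewarding
    transitions that reach a privileged state."""
    Q = defaultdict(float)
    for _ in range(episodes):
        state, done = "unprivileged", False
        while not done:
            if random.random() < epsilon:                       # explore
                action = random.choice(ACTIONS)
            else:                                               # exploit
                action = max(ACTIONS, key=lambda a: Q[(state, a)])
            nxt, reward, done = env_step(state, action)
            best_next = 0.0 if done else max(Q[(nxt, a)] for a in ACTIONS)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = nxt
    return Q
```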
arXiv Detail & Related papers (2021-10-04T12:20:46Z)
- The Feasibility and Inevitability of Stealth Attacks [63.14766152741211]
We study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence systems.
In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself.
arXiv Detail & Related papers (2021-06-26T10:50:07Z)