Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard
Security Attacks
- URL: http://arxiv.org/abs/2302.05733v1
- Date: Sat, 11 Feb 2023 15:57:44 GMT
- Title: Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard
Security Attacks
- Authors: Daniel Kang, Xuechen Li, Ion Stoica, Carlos Guestrin, Matei Zaharia,
Tatsunori Hashimoto
- Abstract summary: Recent advances in instruction-following large language models amplify the dual-use risks for malicious purposes.
Dual-use is difficult to prevent as instruction-following capabilities now enable standard attacks from computer security.
We show that instruction-following LLMs can produce targeted malicious content, including hate speech and scams.
- Score: 67.86285142381644
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in instruction-following large language models (LLMs) have
led to dramatic improvements in a range of NLP tasks. Unfortunately, we find
that the same improved capabilities amplify the dual-use risks for malicious
purposes of these models. Dual-use is difficult to prevent as
instruction-following capabilities now enable standard attacks from computer
security. The capabilities of these instruction-following LLMs provide strong
economic incentives for dual-use by malicious actors. In particular, we show
that instruction-following LLMs can produce targeted malicious content,
including hate speech and scams, bypassing in-the-wild defenses implemented by
LLM API vendors. Our analysis shows that this content can be generated
economically and at cost likely lower than with human effort alone. Together,
our findings suggest that LLMs will increasingly attract more sophisticated
adversaries and attacks, and addressing these attacks may require new
approaches to mitigations.
Related papers
- Mitigating Backdoor Threats to Large Language Models: Advancement and Challenges [46.032173498399885]
Large Language Models (LLMs) have significantly impacted various domains, including Web search, healthcare, and software development.
As these models scale, they become more vulnerable to cybersecurity risks, particularly backdoor attacks.
arXiv Detail & Related papers (2024-09-30T06:31:36Z) - MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants [14.947665219536708]
We introduce the Malicious Programming Prompt (MaPP) attack, in which an attacker adds a small amount of text to a prompt for a programming task.
We show that our prompt strategy can cause an LLM to add vulnerabilities while continuing to write otherwise correct code.
arXiv Detail & Related papers (2024-07-12T22:30:35Z) - ASETF: A Novel Method for Jailbreak Attack on LLMs through Translate Suffix Embeddings [58.82536530615557]
We propose an Adversarial Suffix Embedding Translation Framework (ASETF) to transform continuous adversarial suffix embeddings into coherent and understandable text.
Our method significantly reduces the computation time of adversarial suffixes and achieves a much better attack success rate to existing techniques.
arXiv Detail & Related papers (2024-02-25T06:46:27Z) - Learning to Poison Large Language Models During Instruction Tuning [12.521338629194503]
This work identifies additional security risks in Large Language Models (LLMs) by designing a new data poisoning attack tailored to exploit the instruction tuning process.
We propose a novel gradient-guided backdoor trigger learning (GBTL) algorithm to identify adversarial triggers efficiently.
We propose two defense strategies against data poisoning attacks, including in-context learning (ICL) and continuous learning (CL)
arXiv Detail & Related papers (2024-02-21T01:30:03Z) - The Philosopher's Stone: Trojaning Plugins of Large Language Models [22.67696768099352]
Open-source Large Language Models (LLMs) have recently gained popularity because of their comparable performance to proprietary LLMs.
To efficiently fulfill domain-specialized tasks, open-source LLMs can be refined, without expensive accelerators, using low-rank adapters.
It is still unknown whether low-rank adapters can be exploited to control LLMs.
arXiv Detail & Related papers (2023-12-01T06:36:17Z) - Attack Prompt Generation for Red Teaming and Defending Large Language
Models [70.157691818224]
Large language models (LLMs) are susceptible to red teaming attacks, which can induce LLMs to generate harmful content.
We propose an integrated approach that combines manual and automatic methods to economically generate high-quality attack prompts.
arXiv Detail & Related papers (2023-10-19T06:15:05Z) - Privacy in Large Language Models: Attacks, Defenses and Future Directions [84.73301039987128]
We analyze the current privacy attacks targeting large language models (LLMs) and categorize them according to the adversary's assumed capabilities.
We present a detailed overview of prominent defense strategies that have been developed to counter these privacy attacks.
arXiv Detail & Related papers (2023-10-16T13:23:54Z) - Evaluating the Instruction-Following Robustness of Large Language Models
to Prompt Injection [70.28425745910711]
Large Language Models (LLMs) have demonstrated exceptional proficiency in instruction-following.
This capability brings with it the risk of prompt injection attacks.
We evaluate the robustness of instruction-following LLMs against such attacks.
arXiv Detail & Related papers (2023-08-17T06:21:50Z) - Red Teaming Language Model Detectors with Language Models [114.36392560711022]
Large language models (LLMs) present significant safety and ethical risks if exploited by malicious users.
Recent works have proposed algorithms to detect LLM-generated text and protect LLMs.
We study two types of attack strategies: 1) replacing certain words in an LLM's output with their synonyms given the context; 2) automatically searching for an instructional prompt to alter the writing style of the generation.
arXiv Detail & Related papers (2023-05-31T10:08:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.