Related papers: Denial-of-Service Poisoning Attacks against Large Language Models

Denial-of-Service Poisoning Attacks against Large Language Models

URL: http://arxiv.org/abs/2410.10760v1
Date: Mon, 14 Oct 2024 17:39:31 GMT
Title: Denial-of-Service Poisoning Attacks against Large Language Models
Authors: Kuofeng Gao, Tianyu Pang, Chao Du, Yong Yang, Shu-Tao Xia, Min Lin,
Abstract summary: LLMs are vulnerable to denial-of-service (DoS) attacks, where spelling errors or non-semantic prompts trigger endless outputs without generating an [EOS] token. We propose poisoning-based DoS attacks for LLMs, demonstrating that injecting a single poisoned sample designed for DoS purposes can break the output length limit.
Score: 64.77355353440691
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent studies have shown that LLMs are vulnerable to denial-of-service (DoS) attacks, where adversarial inputs like spelling errors or non-semantic prompts trigger endless outputs without generating an [EOS] token. These attacks can potentially cause high latency and make LLM services inaccessible to other users or tasks. However, when there are speech-to-text interfaces (e.g., voice commands to a robot), executing such DoS attacks becomes challenging, as it is difficult to introduce spelling errors or non-semantic prompts through speech. A simple DoS attack in these scenarios would be to instruct the model to "Keep repeating Hello", but we observe that relying solely on natural instructions limits output length, which is bounded by the maximum length of the LLM's supervised finetuning (SFT) data. To overcome this limitation, we propose poisoning-based DoS (P-DoS) attacks for LLMs, demonstrating that injecting a single poisoned sample designed for DoS purposes can break the output length limit. For example, a poisoned sample can successfully attack GPT-4o and GPT-4o mini (via OpenAI's finetuning API) using less than $1, causing repeated outputs up to the maximum inference length (16K tokens, compared to 0.5K before poisoning). Additionally, we perform comprehensive ablation studies on open-source LLMs and extend our method to LLM agents, where attackers can control both the finetuning dataset and algorithm. Our findings underscore the urgent need for defenses against P-DoS attacks to secure LLMs. Our code is available at https://github.com/sail-sg/P-DoS.

Related papers

Friend or Foe: How LLMs' Safety Mind Gets Fooled by Intent Shift Attack [53.34204977366491]
Large language models (LLMs) remain vulnerable to jailbreaking attacks despite their impressive capabilities.<n>In this paper, we introduce ISA (Intent Shift Attack), which obfuscates LLMs about the intent of the attacks.<n>Our approach only needs minimal edits to the original request, and yields natural, human-readable, and seemingly harmless prompts.
arXiv Detail & Related papers (2025-11-01T13:44:42Z)
Is Your Prompt Safe? Investigating Prompt Injection Attacks Against Open-Source LLMs [28.75283403986172]
Large Language Models (LLMs) are vulnerable to prompt-based attacks, generating harmful content or sensitive information.<n>This paper studies effective prompt injection attacks against the $mathbf14$ most popular open-source LLMs on five attack benchmarks.
arXiv Detail & Related papers (2025-05-20T13:50:43Z)
Enhancing Prompt Injection Attacks to LLMs via Poisoning Alignment [35.344406718760574]
A prompt injection attack aims to make an Large Language Model follow an injected prompt to perform an attacker-chosen task.<n>Existing attacks primarily focus on crafting these injections at inference time, treating the LLM itself as a static target.<n>In this work, we introduce a more foundational attack vector: poisoning the LLM's alignment process to amplify the success of future prompt injection attacks.
arXiv Detail & Related papers (2024-10-18T18:52:16Z)
Aligning LLMs to Be Robust Against Prompt Injection [55.07562650579068]
We show that alignment can be a powerful tool to make LLMs more robust against prompt injection attacks. Our method -- SecAlign -- first builds an alignment dataset by simulating prompt injection attacks. Our experiments show that SecAlign robustifies the LLM substantially with a negligible hurt on model utility.
arXiv Detail & Related papers (2024-10-07T19:34:35Z)
Human-Interpretable Adversarial Prompt Attack on Large Language Models with Situational Context [49.13497493053742]
This research explores converting a nonsensical suffix attack into a sensible prompt via a situation-driven contextual re-writing. We combine an independent, meaningful adversarial insertion and situations derived from movies to check if this can trick an LLM. Our approach demonstrates that a successful situation-driven attack can be executed on both open-source and proprietary LLMs.
arXiv Detail & Related papers (2024-07-19T19:47:26Z)
Chain-of-Scrutiny: Detecting Backdoor Attacks for Large Language Models [35.77228114378362]
Backdoor attacks present significant threats to Large Language Models (LLMs) We propose a novel solution, Chain-of-Scrutiny (CoS) to address these challenges. CoS guides the LLMs to generate detailed reasoning steps for the input, then scrutinizes the reasoning process to ensure consistency with the final answer.
arXiv Detail & Related papers (2024-06-10T00:53:25Z)
ASETF: A Novel Method for Jailbreak Attack on LLMs through Translate Suffix Embeddings [58.82536530615557]
We propose an Adversarial Suffix Embedding Translation Framework (ASETF) to transform continuous adversarial suffix embeddings into coherent and understandable text. Our method significantly reduces the computation time of adversarial suffixes and achieves a much better attack success rate to existing techniques.
arXiv Detail & Related papers (2024-02-25T06:46:27Z)
Coercing LLMs to do and reveal (almost) anything [80.8601180293558]
It has been shown that adversarial attacks on large language models (LLMs) can "jailbreak" the model into making harmful statements. We argue that the spectrum of adversarial attacks on LLMs is much larger than merely jailbreaking.
arXiv Detail & Related papers (2024-02-21T18:59:13Z)
Round Trip Translation Defence against Large Language Model Jailbreaking Attacks [11.593052831056841]
We propose the first algorithm specifically designed to defend against social-engineered attacks on large language models. Our defense successfully mitigated over 70% of Prompt Automatic Iterative Refinement (PAIR) attacks. We are also the first to attempt mitigating the MathsAttack and reduced its attack success rate by almost 40%.
arXiv Detail & Related papers (2024-02-21T03:59:52Z)
Instruction Backdoor Attacks Against Customized LLMs [37.92008159382539]
We propose the first instruction backdoor attacks against applications integrated with untrusted customized LLMs. Our attack includes 3 levels of attacks: word-level, syntax-level, and semantic-level, which adopt different types of triggers with progressive stealthiness. We propose two defense strategies and demonstrate their effectiveness in reducing such attacks.
arXiv Detail & Related papers (2024-02-14T13:47:35Z)
Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game [86.66627242073724]
This paper presents a dataset of over 126,000 prompt injection attacks and 46,000 prompt-based "defenses" against prompt injection. To the best of our knowledge, this is currently the largest dataset of human-generated adversarial examples for instruction-following LLMs. We also use the dataset to create a benchmark for resistance to two types of prompt injection, which we refer to as prompt extraction and prompt hijacking.
arXiv Detail & Related papers (2023-11-02T06:13:36Z)
SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks [99.23352758320945]
We propose SmoothLLM, the first algorithm designed to mitigate jailbreaking attacks on large language models (LLMs) Based on our finding that adversarially-generated prompts are brittle to character-level changes, our defense first randomly perturbs multiple copies of a given input prompt, and then aggregates the corresponding predictions to detect adversarial inputs.
arXiv Detail & Related papers (2023-10-05T17:01:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.