BitHydra: Towards Bit-flip Inference Cost Attack against Large Language Models
- URL: http://arxiv.org/abs/2505.16670v2
- Date: Tue, 27 May 2025 10:55:21 GMT
- Title: BitHydra: Towards Bit-flip Inference Cost Attack against Large Language Models
- Authors: Xiaobei Yan, Yiming Li, Zhaoxin Fan, Han Qiu, Tianwei Zhang,
- Abstract summary: This paper introduces a new type of inference cost attacks (dubbed 'bit-flip inference cost attack') that target the victim model itself rather than its inputs.<n>Specifically, we design a simple yet effective method (dubbed 'BitHydra') to effectively flip critical bits of model parameters.<n>With just 4 search samples and as few as 3 bit flips, BitHydra can force 100% of test prompts to reach the maximum generation length.
- Score: 19.856128742435814
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have shown impressive capabilities across a wide range of applications, but their ever-increasing size and resource demands make them vulnerable to inference cost attacks, where attackers induce victim LLMs to generate the longest possible output content. In this paper, we revisit existing inference cost attacks and reveal that these methods can hardly produce large-scale malicious effects since they are self-targeting, where attackers are also the users and therefore have to execute attacks solely through the inputs, whose generated content will be charged by LLMs and can only directly influence themselves. Motivated by these findings, this paper introduces a new type of inference cost attacks (dubbed 'bit-flip inference cost attack') that target the victim model itself rather than its inputs. Specifically, we design a simple yet effective method (dubbed 'BitHydra') to effectively flip critical bits of model parameters. This process is guided by a loss function designed to suppress <EOS> token's probability with an efficient critical bit search algorithm, thus explicitly defining the attack objective and enabling effective optimization. We evaluate our method on 11 LLMs ranging from 1.5B to 14B parameters under both int8 and float16 settings. Experimental results demonstrate that with just 4 search samples and as few as 3 bit flips, BitHydra can force 100% of test prompts to reach the maximum generation length (e.g., 2048 tokens) on representative LLMs such as LLaMA3, highlighting its efficiency, scalability, and strong transferability across unseen inputs.
Related papers
- Is Your Prompt Safe? Investigating Prompt Injection Attacks Against Open-Source LLMs [28.75283403986172]
Large Language Models (LLMs) are vulnerable to prompt-based attacks, generating harmful content or sensitive information.<n>This paper studies effective prompt injection attacks against the $mathbf14$ most popular open-source LLMs on five attack benchmarks.
arXiv Detail & Related papers (2025-05-20T13:50:43Z) - No Query, No Access [50.18709429731724]
We introduce the textbfVictim Data-based Adrial Attack (VDBA), which operates using only victim texts.<n>To prevent access to the victim model, we create a shadow dataset with publicly available pre-trained models and clustering methods.<n>Experiments on the Emotion and SST5 datasets show that VDBA outperforms state-of-the-art methods, achieving an ASR improvement of 52.08%.
arXiv Detail & Related papers (2025-05-12T06:19:59Z) - ELBA-Bench: An Efficient Learning Backdoor Attacks Benchmark for Large Language Models [55.93380086403591]
Generative large language models are vulnerable to backdoor attacks.<n>$textitELBA-Bench$ allows attackers to inject backdoor through parameter efficient fine-tuning.<n>$textitELBA-Bench$ provides over 1300 experiments.
arXiv Detail & Related papers (2025-02-22T12:55:28Z) - GenBFA: An Evolutionary Optimization Approach to Bit-Flip Attacks on LLMs [3.967858172081495]
Large Language Models (LLMs) have revolutionized natural language processing (NLP)<n>Increasing adoption in mission-critical applications raises concerns about hardware-based threats, particularly bit-flip attacks (BFAs)
arXiv Detail & Related papers (2024-11-21T00:01:51Z) - Target-driven Attack for Large Language Models [14.784132523066567]
We propose our target-driven black-box attack method to maximize the KL divergence between the conditional probabilities of clean text and the attack text.
Experimental results on multiple Large Language Models and datasets demonstrate the effectiveness of our attack method.
arXiv Detail & Related papers (2024-11-09T15:59:59Z) - Denial-of-Service Poisoning Attacks against Large Language Models [64.77355353440691]
LLMs are vulnerable to denial-of-service (DoS) attacks, where spelling errors or non-semantic prompts trigger endless outputs without generating an [EOS] token.
We propose poisoning-based DoS attacks for LLMs, demonstrating that injecting a single poisoned sample designed for DoS purposes can break the output length limit.
arXiv Detail & Related papers (2024-10-14T17:39:31Z) - Goal-guided Generative Prompt Injection Attack on Large Language Models [6.175969971471705]
Large language models (LLMs) provide a strong foundation for large-scale user-oriented natural language tasks.
A large number of users can easily inject adversarial text or instructions through the user interface.
It is unclear how these strategies relate to the success rate of attacks and thus effectively improve model security.
arXiv Detail & Related papers (2024-04-06T06:17:10Z) - DALA: A Distribution-Aware LoRA-Based Adversarial Attack against
Language Models [64.79319733514266]
Adversarial attacks can introduce subtle perturbations to input data.
Recent attack methods can achieve a relatively high attack success rate (ASR)
We propose a Distribution-Aware LoRA-based Adversarial Attack (DALA) method.
arXiv Detail & Related papers (2023-11-14T23:43:47Z) - Versatile Weight Attack via Flipping Limited Bits [68.45224286690932]
We study a novel attack paradigm, which modifies model parameters in the deployment stage.
Considering the effectiveness and stealthiness goals, we provide a general formulation to perform the bit-flip based weight attack.
We present two cases of the general formulation with different malicious purposes, i.e., single sample attack (SSA) and triggered samples attack (TSA)
arXiv Detail & Related papers (2022-07-25T03:24:58Z) - Targeted Attack against Deep Neural Networks via Flipping Limited Weight
Bits [55.740716446995805]
We study a novel attack paradigm, which modifies model parameters in the deployment stage for malicious purposes.
Our goal is to misclassify a specific sample into a target class without any sample modification.
By utilizing the latest technique in integer programming, we equivalently reformulate this BIP problem as a continuous optimization problem.
arXiv Detail & Related papers (2021-02-21T03:13:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.