Related papers: BitHydra: Towards Bit-flip Inference Cost Attack against Large Language Models

URL: http://arxiv.org/abs/2505.16670v3
Date: Mon, 29 Sep 2025 04:08:08 GMT
Title: BitHydra: Towards Bit-flip Inference Cost Attack against Large Language Models
Authors: Xiaobei Yan, Yiming Li, Hao Wang, Han Qiu, Tianwei Zhang
Abstract summary: We introduce the first bit-flip inference cost attack that directly modifies model weights to induce persistent overhead for all users of a compromised LLM. We instantiate this attack paradigm with BitHydra, which (1) minimizes a loss that suppresses the end-of-sequence token (i.e., EOS) and (2) employs an efficient yet effective critical-bit search focused on the EOS embedding vector.
Score: 22.695878922889715
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) are widely deployed, but their growing compute demands expose them to inference cost attacks that maximize output length. We reveal that prior attacks are fundamentally self-targeting because they rely on crafted inputs, so the added cost accrues to the attacker's own queries and scales poorly in practice. In this work, we introduce the first bit-flip inference cost attack that directly modifies model weights to induce persistent overhead for all users of a compromised LLM. Such attacks are stealthy yet realistic in practice: for instance, in shared MLaaS environments, co-located tenants can exploit hardware-level faults (e.g., Rowhammer) to flip memory bits storing model parameters. We instantiate this attack paradigm with BitHydra, which (1) minimizes a loss that suppresses the end-of-sequence token (i.e., EOS) and (2) employs an efficient yet effective critical-bit search focused on the EOS embedding vector, sharply reducing the search space while preserving benign-looking outputs. We evaluate across 11 LLMs (1.5B-14B) under int8 and float16, demonstrating that our method efficiently achieves scalable cost inflation with only a few bit flips, while remaining effective even against potential defenses.
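Illustration: the abstract's claim that "only a few bit flips" suffice is easier to picture with a concrete number format. The sketch below is not BitHydra's critical-bit search or its loss; it only shows, under the assumption that weights are stored as plain int8 or IEEE 754 float16 values, how much a single flipped bit can change one stored weight. The helper names `flip_bit_int8` and `flip_bit_float16` are hypothetical and chosen for this example.

```python
import numpy as np


def flip_bit_int8(value: int, bit: int) -> int:
    """Flip one bit of an int8 value (bit 0 = least significant, bit 7 = sign)."""
    # Reinterpret the int8 storage as uint8 so XOR acts on the raw bit pattern.
    raw = np.array([value], dtype=np.int8).view(np.uint8)
    raw ^= np.uint8(1 << bit)
    return int(raw.view(np.int8)[0])


def flip_bit_float16(value: float, bit: int) -> float:
    """Flip one bit of a float16 value (bit 15 = sign, bits 14..10 = exponent)."""
    # Reinterpret the float16 storage as uint16 so XOR acts on the raw bit pattern.
    raw = np.array([value], dtype=np.float16).view(np.uint16)
    raw ^= np.uint16(1 << bit)
    return float(raw.view(np.float16)[0])


if __name__ == "__main__":
    # Hypothetical weight values, chosen only for illustration.
    print(flip_bit_int8(3, 7))        # sign bit of a small int8 weight
    print(flip_bit_float16(0.5, 14))  # top exponent bit of a float16 weight
    print(flip_bit_float16(0.5, 0))   # lowest mantissa bit of a float16 weight
```

Flipping a high-order bit (the int8 sign bit or a float16 exponent bit) changes the value by orders of magnitude, while a low mantissa bit barely moves it. This is one plausible reason a search over which bits to flip matters.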

Related papers

Good-Enough LLM Obfuscation (GELO) [0.0]
Large Language Models (LLMs) are increasingly served on shared accelerators where an adversary with read access to device memory can observe KV caches and hidden states. We present GELO, a protocol for privacy-preserving inference that limits information leakage from untrusted accelerator observations.
arXiv Detail & Related papers (2026-03-05T10:33:48Z)
SilentStriker: Toward Stealthy Bit-Flip Attacks on Large Language Models [13.200372347541142]
Bit-Flip Attacks (BFAs) exploit hardware vulnerabilities to corrupt model parameters and cause severe performance degradation. Existing BFA methods fail to balance performance degradation and output naturalness, making them prone to discovery. SilentStriker is the first stealthy bit-flip attack against LLMs that effectively degrades task performance while maintaining output naturalness.
arXiv Detail & Related papers (2025-09-22T05:36:18Z)
Is Your Prompt Safe? Investigating Prompt Injection Attacks Against Open-Source LLMs [28.75283403986172]
Large Language Models (LLMs) are vulnerable to prompt-based attacks, generating harmful content or sensitive information. This paper studies effective prompt injection attacks against the 14 most popular open-source LLMs on five attack benchmarks.
arXiv Detail & Related papers (2025-05-20T13:50:43Z)
No Query, No Access [50.18709429731724]
We introduce the Victim Data-based Adversarial Attack (VDBA), which operates using only victim texts. To prevent access to the victim model, we create a shadow dataset with publicly available pre-trained models and clustering methods. Experiments on the Emotion and SST5 datasets show that VDBA outperforms state-of-the-art methods, achieving an ASR improvement of 52.08%.
arXiv Detail & Related papers (2025-05-12T06:19:59Z)
ELBA-Bench: An Efficient Learning Backdoor Attacks Benchmark for Large Language Models [55.93380086403591]
Generative large language models are vulnerable to backdoor attacks. ELBA-Bench allows attackers to inject backdoors through parameter-efficient fine-tuning. ELBA-Bench provides over 1300 experiments.
arXiv Detail & Related papers (2025-02-22T12:55:28Z)
Fast Proxies for LLM Robustness Evaluation [48.53873823665833]
We compare the ability of fast proxy metrics to predict the real-world robustness of an LLM against a simulated attacker ensemble. This allows us to estimate a model's robustness to computationally expensive attacks without requiring runs of the attacks themselves.
arXiv Detail & Related papers (2025-02-14T11:15:27Z)
GenBFA: An Evolutionary Optimization Approach to Bit-Flip Attacks on LLMs [3.967858172081495]
Large Language Models (LLMs) have revolutionized natural language processing (NLP). Increasing adoption in mission-critical applications raises concerns about hardware-based threats, particularly bit-flip attacks (BFAs).
arXiv Detail & Related papers (2024-11-21T00:01:51Z)
Target-driven Attack for Large Language Models [14.784132523066567]
We propose our target-driven black-box attack method to maximize the KL divergence between the conditional probabilities of clean text and the attack text. Experimental results on multiple Large Language Models and datasets demonstrate the effectiveness of our attack method.
arXiv Detail & Related papers (2024-11-09T15:59:59Z)
Denial-of-Service Poisoning Attacks against Large Language Models [64.77355353440691]
LLMs are vulnerable to denial-of-service (DoS) attacks, where spelling errors or non-semantic prompts trigger endless outputs without generating an [EOS] token. We propose poisoning-based DoS attacks for LLMs, demonstrating that injecting a single poisoned sample designed for DoS purposes can break the output length limit.
arXiv Detail & Related papers (2024-10-14T17:39:31Z)
Goal-guided Generative Prompt Injection Attack on Large Language Models [6.175969971471705]
Large language models (LLMs) provide a strong foundation for large-scale user-oriented natural language tasks. A large number of users can easily inject adversarial text or instructions through the user interface. It is unclear how these strategies relate to the success rate of attacks and thus effectively improve model security.
arXiv Detail & Related papers (2024-04-06T06:17:10Z)
DALA: A Distribution-Aware LoRA-Based Adversarial Attack against Language Models [64.79319733514266]
Adversarial attacks can introduce subtle perturbations to input data. Recent attack methods can achieve a relatively high attack success rate (ASR). We propose a Distribution-Aware LoRA-based Adversarial Attack (DALA) method.
arXiv Detail & Related papers (2023-11-14T23:43:47Z)
Transferable Attack for Semantic Segmentation [59.17710830038692]
adversarial attacks, and observe that the adversarial examples generated from a source model fail to attack the target models. We propose an ensemble attack for semantic segmentation to achieve more effective attacks with higher transferability.
arXiv Detail & Related papers (2023-07-31T11:05:55Z)
Hindering Adversarial Attacks with Implicit Neural Representations [25.422201099331637]
Lossy Implicit Network Activation Coding (LINAC) defence successfully hinders several common adversarial attacks. We devise a Parametric Bypass Approximation (PBA) attack strategy for key-based defences, which successfully invalidates an existing method in this category.
arXiv Detail & Related papers (2022-10-22T13:10:24Z)
Versatile Weight Attack via Flipping Limited Bits [68.45224286690932]
We study a novel attack paradigm, which modifies model parameters in the deployment stage. Considering the effectiveness and stealthiness goals, we provide a general formulation to perform the bit-flip based weight attack. We present two cases of the general formulation with different malicious purposes, i.e., single sample attack (SSA) and triggered samples attack (TSA).
arXiv Detail & Related papers (2022-07-25T03:24:58Z)
PDPGD: Primal-Dual Proximal Gradient Descent Adversarial Attack [92.94132883915876]
State-of-the-art deep neural networks are sensitive to small input perturbations. Many defence methods have been proposed that attempt to improve robustness to adversarial noise. Evaluating adversarial robustness has proven to be extremely challenging.
arXiv Detail & Related papers (2021-06-03T01:45:48Z)
Targeted Attack against Deep Neural Networks via Flipping Limited Weight Bits [55.740716446995805]
We study a novel attack paradigm, which modifies model parameters in the deployment stage for malicious purposes. Our goal is to misclassify a specific sample into a target class without any sample modification. By utilizing the latest technique in integer programming, we equivalently reformulate the resulting binary integer programming (BIP) problem as a continuous optimization problem.
arXiv Detail & Related papers (2021-02-21T03:13:27Z)
