SBFA: Single Sneaky Bit Flip Attack to Break Large Language Models
- URL: http://arxiv.org/abs/2509.21843v1
- Date: Fri, 26 Sep 2025 04:03:53 GMT
- Title: SBFA: Single Sneaky Bit Flip Attack to Break Large Language Models
- Authors: Jingkai Guo, Chaitali Chakrabarti, Deliang Fan
- Abstract summary: Bit-Flip Attacks can severely compromise Deep Neural Networks (DNNs). We propose SBFA (Sneaky Bit-Flip Attack), which collapses LLM performance with only a single bit flip. It is achieved through iterative searching and ranking using our defined parameter sensitivity metric, ImpactScore.
- Score: 16.379863498328955
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model integrity of large language models (LLMs) has become a pressing security concern with their massive online deployment. Prior Bit-Flip Attacks (BFAs) -- a popular class of AI weight memory fault-injection techniques -- can severely compromise Deep Neural Networks (DNNs): as few as tens of bit flips can degrade accuracy toward random guessing. Recent studies extend BFAs to LLMs and reveal that, despite the intuition of better robustness from modularity and redundancy, only a handful of adversarial bit flips can likewise cause catastrophic accuracy degradation in LLMs. However, existing BFA methods typically target either integer or floating-point models separately, limiting attack flexibility. Moreover, in floating-point models, random bit flips often drive perturbed parameters to extreme values (e.g., when flipping an exponent bit), making the attack conspicuous and causing numerical runtime errors (e.g., invalid tensor values such as NaN/Inf). In this work, for the first time, we propose SBFA (Sneaky Bit-Flip Attack), which collapses LLM performance with only a single bit flip while keeping the perturbed value within the benign layer-wise weight distribution. This is achieved through iterative searching and ranking with our parameter sensitivity metric, ImpactScore, which combines gradient sensitivity with a perturbation range constrained by the benign layer-wise weight distribution. We also propose a novel lightweight SKIP search algorithm that greatly reduces search complexity, so that a successful SBFA search takes only tens of minutes on SOTA LLMs. Across Qwen, LLaMA, and Gemma models, with only a single bit flip, SBFA successfully degrades accuracy to below random levels on MMLU and SST-2 in both BF16 and INT8 data formats. Remarkably, that a single bit flip out of billions of parameters suffices reveals a severe security concern for SOTA LLMs.
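The abstract's contrast between conspicuous exponent-bit flips and stealthy mantissa-bit flips in BF16 weights can be illustrated with a small sketch. This is not code from the paper; the helper names and the example weight value are our own, and it only demonstrates the numerical effect of flipping different bit positions in a bfloat16 encoding (1 sign bit, 8 exponent bits, 7 mantissa bits).

```python
import struct

def bf16_bits(x: float) -> int:
    """Top 16 bits of the IEEE-754 float32 encoding of x (= bfloat16, truncated)."""
    return struct.unpack(">I", struct.pack(">f", x))[0] >> 16

def bf16_value(bits: int) -> float:
    """Decode a 16-bit bfloat16 pattern back to a Python float."""
    return struct.unpack(">f", struct.pack(">I", bits << 16))[0]

def flip_bit(x: float, pos: int) -> float:
    """Flip bit `pos` of x's bfloat16 form (0-6: mantissa, 7-14: exponent, 15: sign)."""
    return bf16_value(bf16_bits(x) ^ (1 << pos))

w = 0.0312  # a typical small LLM weight

# Flipping the exponent MSB (bit 14) blows the value far outside any
# benign layer-wise weight distribution -- easy to detect:
print(flip_bit(w, 14))

# Flipping the mantissa LSB (bit 0) barely perturbs the value, so the
# flipped weight stays within the benign distribution -- "sneaky":
print(flip_bit(w, 0))
```

A stealthy attack in the SBFA sense must therefore restrict its search to flips like the second one, which is what the layer-wise distribution constraint in ImpactScore enforces.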
Related papers
- TFL: Targeted Bit-Flip Attack on Large Language Model [16.379863498328955]
Large language models (LLMs) are increasingly deployed in safety- and security-critical applications. We present TFL, a novel targeted bit-flip attack framework. Within our TFL framework, we propose a novel keyword-focused attack loss to promote attacker-specified target tokens in generative outputs.
arXiv Detail & Related papers (2026-02-19T20:59:47Z)
- FlipLLM: Efficient Bit-Flip Attacks on Multimodal LLMs using Reinforcement Learning [0.0]
FlipLLM is a framework that formulates BFA discovery as a sequential decision-making problem. We show that FlipLLM can identify critical bits that are vulnerable to BFAs up to 2.5x faster than SOTA methods. Applying standard hardware protection mechanisms, such as ECC SECDED, to FlipLLM-identified bit locations completely mitigates the BFA impact.
arXiv Detail & Related papers (2025-12-10T17:58:18Z)
- Has the Two-Decade-Old Prophecy Come True? Artificial Bad Intelligence Triggered by Merely a Single-Bit Flip in Large Language Models [16.552905034341343]
Bit-Flip Attack (BFA) has garnered widespread attention for its ability to compromise software system integrity remotely through hardware fault injection. This paper is the first to systematically discover and validate the existence of single-bit vulnerabilities in large language models (LLMs) using .gguf quantized formats. At an attack frequency of 464.3 times per second, a single bit can be flipped with 100% success in as little as 31.7 seconds.
arXiv Detail & Related papers (2025-10-01T04:20:03Z)
- Sequential Diffusion Language Models [110.06562906987052]
Diffusion language models (DLMs) have strong theoretical efficiency but are limited by fixed-length decoding and incompatibility with key-value caches. We introduce Next Sequence Prediction (NSP), which unifies next-token and next-block prediction. We propose the Sequential Diffusion Language Model (SDLM), which can retrofit pre-trained autoregressive language models (ALMs) at minimal cost.
arXiv Detail & Related papers (2025-09-28T17:59:15Z)
- MaskPro: Linear-Space Probabilistic Learning for Strict (N:M)-Sparsity on Large Language Models [53.36415620647177]
Semi-structured sparsity offers a promising solution by strategically retaining $N$ elements out of every $M$ weights. Existing (N:M)-compatible approaches typically fall into two categories: rule-based layerwise greedy search, which suffers from considerable errors, and gradient-driven learning, which incurs prohibitive training costs. We propose a novel linear-space probabilistic framework named MaskPro, which learns a prior categorical distribution for every $M$ consecutive weights and subsequently leverages this distribution to generate the (N:M)-sparsity through $N$-way sampling.
arXiv Detail & Related papers (2025-06-15T15:02:59Z)
- ObfusBFA: A Holistic Approach to Safeguarding DNNs from Different Types of Bit-Flip Attacks [12.96840649714218]
Bit-flip attacks (BFAs) represent a serious threat to Deep Neural Networks (DNNs). We propose ObfusBFA, an efficient and holistic methodology to mitigate BFAs. We design novel algorithms to identify critical bits and insert obfuscation operations.
arXiv Detail & Related papers (2025-06-12T14:31:27Z)
- GenBFA: An Evolutionary Optimization Approach to Bit-Flip Attacks on LLMs [3.967858172081495]
Large Language Models (LLMs) have revolutionized natural language processing (NLP). Their increasing adoption in mission-critical applications raises concerns about hardware-based threats, particularly bit-flip attacks (BFAs).
arXiv Detail & Related papers (2024-11-21T00:01:51Z)
- Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models [79.76293901420146]
Large Language Models (LLMs) are employed across various high-stakes domains, where the reliability of their outputs is crucial.
Our research investigates the fragility of uncertainty estimation and explores potential attacks.
We demonstrate that an attacker can embed a backdoor in LLMs, which, when activated by a specific trigger in the input, manipulates the model's uncertainty without affecting the final output.
arXiv Detail & Related papers (2024-07-15T23:41:11Z)
- Advancing the Robustness of Large Language Models through Self-Denoised Smoothing [50.54276872204319]
Large language models (LLMs) have achieved significant success, but their vulnerability to adversarial perturbations has raised considerable concerns.
We propose to leverage the multitasking nature of LLMs to first denoise the noisy inputs and then to make predictions based on these denoised versions.
Unlike previous denoised smoothing techniques in computer vision, which require training a separate model, our method offers significantly better efficiency and flexibility.
arXiv Detail & Related papers (2024-04-18T15:47:00Z)
- FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping [49.66872823080736]
Autoregressive Large Language Models (e.g., LLaMa, GPTs) are omnipresent, achieving remarkable success in language understanding and generation.
To mitigate the overhead incurred during generation, several early-exit and layer-dropping strategies have been proposed.
We propose FFN-SkipLLM, an input-adaptive feed-forward skipping strategy.
arXiv Detail & Related papers (2024-04-05T02:35:43Z)
- One-bit Flip is All You Need: When Bit-flip Attack Meets Model Training [54.622474306336635]
A new weight-modification attack called the bit-flip attack (BFA) was proposed, which exploits memory fault-injection techniques.
We propose a training-assisted bit flip attack, in which the adversary is involved in the training stage to build a high-risk model to release.
arXiv Detail & Related papers (2023-08-12T09:34:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.