GPUHammer: Rowhammer Attacks on GPU Memories are Practical
- URL: http://arxiv.org/abs/2507.08166v1
- Date: Thu, 10 Jul 2025 20:57:47 GMT
- Title: GPUHammer: Rowhammer Attacks on GPU Memories are Practical
- Authors: Chris S. Lin, Joyce Qu, Gururaj Saileshwar
- Abstract summary: We demonstrate the first successful Rowhammer attack on a discrete GPU. We show how an attacker can use these bit-flips to tamper with ML models, causing significant accuracy drops (up to 80%).
- Score: 3.3625059118072107
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Rowhammer is a read disturbance vulnerability in modern DRAM that causes bit-flips, compromising security and reliability. While extensively studied on Intel and AMD CPUs with DDR and LPDDR memories, its impact on GPUs using GDDR memories, critical for emerging machine learning applications, remains unexplored. Rowhammer attacks on GPUs face unique challenges: (1) proprietary mapping of physical memory to GDDR banks and rows, (2) high memory latency and faster refresh rates that hinder effective hammering, and (3) proprietary mitigations in GDDR memories, difficult to reverse-engineer without FPGA-based test platforms. We introduce GPUHammer, the first Rowhammer attack on NVIDIA GPUs with GDDR6 DRAM. GPUHammer proposes novel techniques to reverse-engineer GDDR DRAM row mappings, and employs GPU-specific memory access optimizations to amplify hammering intensity and bypass mitigations. Thus, we demonstrate the first successful Rowhammer attack on a discrete GPU, injecting up to 8 bit-flips across 4 DRAM banks on an NVIDIA A6000 with GDDR6 memory. We also show how an attacker can use these to tamper with ML models, causing significant accuracy drops (up to 80%).
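The accuracy-drop claim is easy to motivate at the bit level: a single Rowhammer flip in the exponent field of an IEEE-754 float32 weight changes its magnitude by dozens of orders of magnitude. A minimal Python sketch of this failure mode (an illustration only, not the paper's attack code):

```python
import struct

def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its float32 encoding flipped."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", bits ^ (1 << bit)))
    return flipped

weight = 0.125  # a typical small model weight, 2**-3
# Flipping the most significant exponent bit (bit 30) turns 2**-3 into 2**125,
# which then propagates through every matrix multiply it touches.
corrupted = flip_bit(weight, 30)
print(weight, "->", corrupted)  # 0.125 -> 4.253529586511731e+37
```

This illustrates why even the handful of flips reported in the abstract (up to 8 across 4 banks) can suffice: one exponent flip in a high-impact weight corrupts every activation downstream of it.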
Related papers
- Kraken: Higher-order EM Side-Channel Attacks on DNNs in Near and Far Field [26.561261723476687]
Multi-million dollar investments have made large ML models a prime target for theft. Attacks based on physical side-channel information have shown that model extraction is feasible, even from cores in a GPU. For the first time, our work demonstrates parameter extraction on the GPU's specialized core units.
arXiv Detail & Related papers (2026-03-03T11:40:13Z)
- OpenGL GPU-Based Rowhammer Attack (Work in Progress) [0.0]
This paper presents an adaptive, many-sided Rowhammer attack utilizing GPU compute shaders. Our approach employs statistical distributions to optimize row targeting and avoid current mitigations. Experimental results on a Raspberry Pi 4 demonstrate that the GPU-based approach attains a high rate of bit flips compared to traditional CPU-based hammering.
arXiv Detail & Related papers (2025-09-24T10:11:05Z)
- GPU in the Blind Spot: Overlooked Security Risks in Transportation [3.3296812191509786]
This paper highlights GPU security as a critical blind spot in transportation cybersecurity. To support this concern, it also presents a case study showing the impact of stealthy unauthorized crypto miners on critical AI workloads.
arXiv Detail & Related papers (2025-08-04T02:25:43Z)
- Minute-Long Videos with Dual Parallelisms [57.22737565366549]
Diffusion Transformer (DiT)-based video diffusion models generate high-quality videos at scale but incur prohibitive processing latency and memory costs for long videos. We propose a novel distributed inference strategy, termed DualParal. Instead of generating an entire video on a single GPU, we parallelize both temporal frames and model layers across GPUs.
arXiv Detail & Related papers (2025-05-27T11:55:22Z)
- GPUMC: A Stateless Model Checker for GPU Weak Memory Concurrency [3.1882747895372217]
GPUMC is a stateless model checker that verifies the correctness of GPU shared-memory programs under the scoped-RC11 weak memory model. We evaluate GPUMC with benchmarks and real-life GPU programs.
arXiv Detail & Related papers (2025-05-26T16:47:44Z)
- Characterizing GPU Resilience and Impact on AI/HPC Systems [5.4879032865205986]
This study characterizes GPU resilience in Delta HPC, a large-scale AI system. We used 2.5 years of operational data (11.7 million GPU hours) on GPU errors.
arXiv Detail & Related papers (2025-03-14T22:14:18Z)
- Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference [4.497936996651617]
Large language models have been widely adopted across different tasks, but their auto-regressive nature often leads to inefficient resource utilization during inference. In this paper, through an in-depth GPU-level analysis, we reveal that large-batch inference remains memory-bound, with most GPU compute capabilities underutilized.
arXiv Detail & Related papers (2025-03-11T11:21:35Z)
- HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading [79.38548165722229]
HEADINFER offloads the KV cache to CPU RAM while avoiding the need to fully store the KV cache for any transformer layer on the GPU. We demonstrate that HEADINFER maintains computational efficiency while significantly reducing memory footprint.
arXiv Detail & Related papers (2025-02-18T06:26:05Z)
- NeRF-XL: Scaling NeRFs with Multiple GPUs [72.75214892939411]
We present NeRF-XL, a principled method for distributing Neural Radiance Fields (NeRFs) across multiple GPUs.
We show improvements in reconstruction quality with larger parameter counts and speed improvements with more GPUs.
We demonstrate the effectiveness of NeRF-XL on a wide variety of datasets, including the largest open-source dataset to date, MatrixCity, containing 258K images covering a 25 km² city area.
arXiv Detail & Related papers (2024-04-24T21:43:15Z)
- LLMem: Estimating GPU Memory Usage for Fine-Tuning Pre-Trained LLMs [4.536118764799076]
Fine-tuning pre-trained large language models with limited hardware presents challenges due to GPU memory constraints.
We introduce LLMem, a solution that estimates the GPU memory consumption when applying distributed fine-tuning methods.
We show that LLMem accurately estimates peak GPU memory usage on a single GPU, with error rates of up to 1.6%.
arXiv Detail & Related papers (2024-04-16T22:11:35Z)
- AI and Memory Wall [81.06494558184049]
We show how memory bandwidth can become the dominant bottleneck for decoder models.
We argue for a redesign in model architecture, training, and deployment strategies to overcome this memory limitation.
arXiv Detail & Related papers (2024-03-21T04:31:59Z)
- FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs [57.12856172329322]
We envision a decentralized system unlocking the vast untapped potential of consumer-level GPUs.
This system faces critical challenges, including limited CPU and GPU memory, low network bandwidth, and the heterogeneity of peers and devices.
arXiv Detail & Related papers (2023-09-03T13:27:56Z)
- EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens [57.354304637367555]
We present EVEREST, a surprisingly efficient MVA approach for video representation learning.
It finds tokens containing rich motion features and discards uninformative ones during both pre-training and fine-tuning.
Our method significantly reduces the computation and memory requirements of MVA.
arXiv Detail & Related papers (2022-11-19T09:57:01Z)
- An Analysis of Collocation on GPUs for Deep Learning Training [0.0]
Multi-Instance GPU (MIG) is a new technology introduced by NVIDIA that can partition a GPU to better-fit workloads.
In this paper, we examine the performance of a MIG-enabled A100 GPU under deep learning workloads containing various sizes and combinations of models.
arXiv Detail & Related papers (2022-09-13T14:13:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.