Kraken: Higher-order EM Side-Channel Attacks on DNNs in Near and Far Field
- URL: http://arxiv.org/abs/2603.02891v1
- Date: Tue, 03 Mar 2026 11:40:13 GMT
- Title: Kraken: Higher-order EM Side-Channel Attacks on DNNs in Near and Far Field
- Authors: Peter Horvath, Ilia Shumailov, Lukasz Chmielewski, Lejla Batina, Yuval Yarom
- Abstract summary: Multi-million dollar investment has made large ML models a prime target for theft.
Attacks based on physical side-channel information have shown that DNN model extraction is feasible, even on CUDA Cores in a GPU.
For the first time, our work demonstrates parameter extraction on the GPU's specialized Tensor Core units.
- Score: 26.561261723476687
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The multi-million dollar investment required for modern machine learning (ML) has made large ML models a prime target for theft. In response, the field of model stealing has emerged. Attacks based on physical side-channel information have shown that DNN model extraction is feasible, even on CUDA Cores in a GPU. For the first time, our work demonstrates parameter extraction on the GPU's specialized Tensor Core units, now the most commonly used GPU compute units owing to their superior performance, via near-field physical side-channel attacks. Previous work targeted only the general-purpose CUDA Cores, the functional units that have been part of the GPU since its inception. Our method is tailored to the GPU architecture to accurately estimate energy consumption and derive efficient attacks via Correlation Power Analysis (CPA). Furthermore, we provide an exploratory analysis of hyperparameter and weight leakage from LLMs in the far field and demonstrate that the GPU's electromagnetic radiation leaks even 100 cm away, through a glass obstacle.
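CPA, the workhorse of the attack, is simple to state: guess a secret parameter, predict how much energy a leaky intermediate value would consume under that guess, and check which guess correlates best with the measured traces. Below is a minimal sketch of that correlation step, assuming a Hamming-weight leakage model and already-recorded traces; all function and variable names are illustrative, not from the paper.

```python
import numpy as np

def hamming_weight(x: np.ndarray) -> np.ndarray:
    """Bit count of each uint16 element; a standard CPA leakage model."""
    bytes_ = x.view(np.uint8).reshape(len(x), -1)
    return np.unpackbits(bytes_, axis=1).sum(axis=1)

def cpa_scores(traces: np.ndarray, inputs: np.ndarray,
               candidates: np.ndarray) -> np.ndarray:
    """Max |Pearson r| over time between leakage model and traces, per guess.

    traces:     (n_traces, n_samples) recorded EM/power samples
    inputs:     (n_traces,) known operand of the targeted multiply
    candidates: guesses for the secret weight
    """
    # Standardize every time sample once, up front.
    t = (traces - traces.mean(axis=0)) / (traces.std(axis=0) + 1e-12)
    scores = np.empty(len(candidates))
    for i, w in enumerate(candidates):
        # Hypothetical intermediate under guess w, reduced to 16 bits.
        hyp = hamming_weight((inputs.astype(np.int64) * int(w)).astype(np.uint16))
        h = (hyp - hyp.mean()) / (hyp.std() + 1e-12)
        scores[i] = np.abs(h @ t / len(h)).max()  # best correlation over time
    return scores
```

The candidate with the highest score is taken as the recovered parameter; in principle, repeating this per weight reconstructs the model layer by layer.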
Related papers
- Scalable GPU-Based Integrity Verification for Large Machine Learning Models [4.301162531343759]
We present a security framework that strengthens distributed machine learning by standardizing integrity protections across CPU and GPU platforms.
Our approach co-locates integrity verification directly with large ML model execution on GPU accelerators.
We provide a hardware-agnostic foundation that enterprise teams can deploy regardless of their underlying CPU and GPU infrastructures.
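As a rough illustration of the integrity-checking idea only, not the paper's actual design: digest every parameter tensor and refuse to run on a manifest mismatch. Names and the manifest format are our own assumptions.

```python
import hashlib, json
import numpy as np

def digest_params(params: dict[str, np.ndarray]) -> dict[str, str]:
    """SHA-256 digest of every parameter tensor (name -> hex digest)."""
    return {name: hashlib.sha256(t.tobytes()).hexdigest()
            for name, t in params.items()}

def verify(params: dict[str, np.ndarray], manifest_path: str) -> None:
    """Abort before inference if any tensor deviates from the trusted manifest."""
    with open(manifest_path) as f:
        expected = json.load(f)
    actual = digest_params(params)
    if actual != expected:
        bad = [k for k in expected if actual.get(k) != expected[k]]
        raise RuntimeError(f"integrity check failed for: {bad}")
```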
arXiv Detail & Related papers (2025-10-27T23:45:21Z)
- GPU in the Blind Spot: Overlooked Security Risks in Transportation [3.3296812191509786]
This paper highlights GPU security as a critical blind spot in transportation cybersecurity.
To support this concern, it also presents a case study showing the impact of stealthy unauthorized crypto miners on critical AI workloads.
arXiv Detail & Related papers (2025-08-04T02:25:43Z)
- Distributed Equivariant Graph Neural Networks for Large-Scale Electronic Structure Prediction [76.62155593340763]
Equivariant Graph Neural Networks (eGNNs) trained on density-functional theory (DFT) data can potentially perform electronic structure prediction at unprecedented scales.
However, the graph representations required for this task tend to be densely connected.
We present a distributed eGNN implementation that leverages direct GPU communication, and we introduce a partitioning strategy for the input graph.
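As a toy illustration of input-graph partitioning, not the paper's actual strategy: assign contiguous node blocks to GPUs and flag the cross-partition "halo" edges that force remote feature fetches over direct GPU links.

```python
import numpy as np

def partition_by_destination(edge_index: np.ndarray, n_nodes: int, n_gpus: int):
    """Toy partitioning: node i lives on GPU i // block (contiguous blocks),
    and each edge is assigned to the GPU owning its destination node.

    edge_index: (2, n_edges) array of [source; destination] node ids
    Returns per-GPU edge lists plus the cross-partition 'halo' edges whose
    source features must be fetched from another GPU.
    """
    block = -(-n_nodes // n_gpus)                 # ceil(n_nodes / n_gpus)
    dst_owner = edge_index[1] // block
    parts = [edge_index[:, dst_owner == g] for g in range(n_gpus)]
    halos = [p[:, p[0] // block != g] for g, p in enumerate(parts)]
    return parts, halos
```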
arXiv Detail & Related papers (2025-07-04T23:53:47Z)
- Accurate GPU Memory Prediction for Deep Learning Jobs through Dynamic Analysis [0.3867363075280544]
Out-of-Memory errors present a primary impediment to model training and efficient resource utilization.
VeritasEst is an entirely CPU-based analysis tool capable of accurately predicting the peak GPU memory required for Deep Learning training tasks.
Its performance was validated through thousands of experimental runs across convolutional neural network (CNN) models.
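VeritasEst works by dynamic analysis; purely for intuition, here is a crude analytic counterpart that tallies the usual contributors to peak training memory. All parameter names and defaults are our own assumptions.

```python
def estimate_training_mem_gib(param_count: int, activation_elems: int,
                              bytes_per_elem: int = 4,
                              optimizer_copies: int = 2) -> float:
    """Crude peak-memory estimate for one training step, in GiB.

    Counts weights + gradients + optimizer state (Adam keeps ~2 extra
    copies per parameter) + activations stashed for the backward pass.
    Real peaks also include workspace, fragmentation, and allocator
    caches, which is why dynamic analysis beats formulas like this one.
    """
    per_param = (1 + 1 + optimizer_copies) * bytes_per_elem  # w, grad, opt
    total = param_count * per_param + activation_elems * bytes_per_elem
    return total / 2**30

# e.g. a 25M-parameter CNN stashing ~600M activation values in fp32:
# estimate_training_mem_gib(25_000_000, 600_000_000)  ->  ~2.6 GiB
```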
arXiv Detail & Related papers (2025-04-04T19:20:03Z)
- Characterization of GPU TEE Overheads in Distributed Data Parallel ML Training [7.236249885667945]
Confidential computing (CC) based on trusted execution environments (TEEs) is now the most common approach to enable secure computing in the cloud.
The recent introduction of GPU TEEs by NVIDIA enables machine learning (ML) models to be trained without leaking model weights or data to the cloud provider.
We present an in-depth characterization study of the performance overhead associated with running distributed data parallel (DDP) ML training with GPU TEEs.
arXiv Detail & Related papers (2025-01-20T22:23:50Z)
- MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs [55.95879347182669]
The MoE architecture is renowned for its ability to increase model capacity without a proportional increase in inference cost.
MoE-Lightning introduces a novel CPU-GPU-I/O pipelining schedule, CGOPipe, with paged weights to achieve high resource utilization.
MoE-Lightning can achieve up to 10.3x higher throughput than state-of-the-art offloading-enabled LLM inference systems for Mixtral 8x7B on a single T4 GPU (16GB).
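CGOPipe itself schedules CPU compute, GPU compute, and I/O jointly; the sketch below shows only the underlying overlap trick, double-buffering expert weights across CUDA streams so the host-to-device copy of expert k+1 hides behind expert k's compute. It assumes pinned CPU tensors, and all names are illustrative.

```python
import torch

def run_experts(experts_cpu: list[torch.Tensor], x: torch.Tensor,
                device: str = "cuda") -> torch.Tensor:
    """Double-buffered expert execution: prefetch expert k+1's weights on a
    side stream while expert k computes, overlapping I/O with compute."""
    copy_stream = torch.cuda.Stream(device=device)
    bufs, ready = [None, None], [torch.cuda.Event(), torch.cuda.Event()]

    def prefetch(k: int, slot: int) -> None:
        with torch.cuda.stream(copy_stream):
            # Async H2D copy (requires pinned host memory to be truly async).
            bufs[slot] = experts_cpu[k].to(device, non_blocking=True)
            ready[slot].record(copy_stream)

    prefetch(0, 0)
    outs = []
    for k in range(len(experts_cpu)):
        slot = k % 2
        if k + 1 < len(experts_cpu):
            prefetch(k + 1, (k + 1) % 2)          # overlaps with compute below
        torch.cuda.current_stream().wait_event(ready[slot])  # weights landed?
        out = x @ bufs[slot]                      # stand-in for the expert FFN
        bufs[slot].record_stream(torch.cuda.current_stream())  # cross-stream safety
        outs.append(out)
    return torch.stack(outs)
```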
arXiv Detail & Related papers (2024-11-18T01:06:12Z)
- GPU-accelerated Effective Hamiltonian Calculator [70.12254823574538]
We present numerical techniques inspired by Nonperturbative Analytical Diagonalization (NPAD) and the Magnus expansion for the efficient calculation of effective Hamiltonians.
Our numerical techniques are available as an open-source Python package, $\rm qCH_{eff}$.
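For context, the expansion underlying such effective-Hamiltonian calculations: writing the propagator of $\dot U(t) = -iH(t)\,U(t)$ as $U(t) = e^{\Omega(t)}$ with $\Omega = \sum_k \Omega_k$, the first two Magnus terms are

```latex
\Omega_1(t) = -i\int_0^t H(t_1)\,\mathrm{d}t_1, \qquad
\Omega_2(t) = -\frac{1}{2}\int_0^t\!\mathrm{d}t_1\int_0^{t_1}\!\mathrm{d}t_2\,
              \bigl[H(t_1),\,H(t_2)\bigr].
```

Truncating after a few terms yields a time-independent effective Hamiltonian approximating the driven dynamics.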
arXiv Detail & Related papers (2024-11-15T06:33:40Z)
- NeRF-XL: Scaling NeRFs with Multiple GPUs [72.75214892939411]
We present NeRF-XL, a principled method for distributing Neural Radiance Fields (NeRFs) across multiple GPUs.
We show improvements in reconstruction quality with larger parameter counts and speed improvements with more GPUs.
We demonstrate the effectiveness of NeRF-XL on a wide variety of datasets, including the largest open-source dataset to date, MatrixCity, containing 258K images covering a 25 km² city area.
arXiv Detail & Related papers (2024-04-24T21:43:15Z)
- FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs [57.12856172329322]
We envision a decentralized system unlocking the potential of vast untapped consumer-level GPUs.
This system faces critical challenges, including limited CPU and GPU memory, low network bandwidth, and high variability across heterogeneous peers and devices.
arXiv Detail & Related papers (2023-09-03T13:27:56Z)
- EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens [57.354304637367555]
We present EVEREST, a surprisingly efficient masked video autoencoder (MVA) approach for video representation learning.
It finds tokens containing rich motion features and discards uninformative ones during both pre-training and fine-tuning.
Our method significantly reduces the computation and memory requirements of MVA.
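EVEREST's actual selection rule is more refined; as a rough stand-in for the idea (the scoring rule and names below are ours, not the paper's), score each patch token by how much it changes between consecutive frames and keep only the top fraction.

```python
import torch

def select_motion_tokens(patches: torch.Tensor, keep_ratio: float = 0.25):
    """Keep the patch tokens with the largest change between consecutive
    frames, as a crude 'rich motion' score.

    patches: (T, N, D) patch embeddings for T frames, N tokens, dim D
    Returns flat indices over T*N of the tokens to keep.
    """
    diff = patches[1:] - patches[:-1]              # temporal difference
    score = diff.norm(dim=-1)                      # (T-1, N) motion magnitude
    score = torch.cat([score[:1], score], dim=0)   # reuse frame 1's score for frame 0
    k = int(keep_ratio * score.numel())
    return score.flatten().topk(k).indices         # token ids to keep
```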
arXiv Detail & Related papers (2022-11-19T09:57:01Z)
- An Analysis of Collocation on GPUs for Deep Learning Training [0.0]
Multi-Instance GPU (MIG) is a new technology introduced by NVIDIA that can partition a GPU to better fit workloads.
In this paper, we examine the performance of a MIG-enabled A100 GPU under deep learning workloads containing various sizes and combinations of models.
arXiv Detail & Related papers (2022-09-13T14:13:06Z)
- CryptGPU: Fast Privacy-Preserving Machine Learning on the GPU [8.633428365391666]
CryptGPU is a system for privacy-preserving machine learning that implements all operations on the GPU.
We introduce a new interface to embed cryptographic operations over secret-shared values into floating-point operations.
We show that our protocols achieve a 2x to 8x improvement in private inference and a 6x to 36x improvement for private training.
arXiv Detail & Related papers (2021-04-22T09:21:40Z)
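The key trick in CryptGPU is making integer secret-sharing arithmetic exact inside the GPU's floating-point units. Below is a two-party toy version, assuming a modulus small enough that float64 never rounds; CryptGPU's actual protocol works over larger rings split into float64 limbs, and multiplication additionally needs Beaver triples.

```python
import numpy as np

P = 2**26   # modulus small enough that sums and products of shares stay
            # below 2**52, so float64 (the GPU-friendly type) never rounds

def share(x: np.ndarray, rng: np.random.Generator):
    """Additively secret-share integers x mod P between two parties."""
    r = rng.integers(0, P, size=x.shape).astype(np.float64)
    return r, (x - r) % P            # party 0 holds r, party 1 holds x - r

def reconstruct(s0: np.ndarray, s1: np.ndarray) -> np.ndarray:
    return (s0 + s1) % P

def add_shared(a, b):
    """Addition of shared values is local: each party adds its own shares.
    (Secure multiplication needs Beaver triples and communication.)"""
    return (a[0] + b[0]) % P, (a[1] + b[1]) % P

# Sanity check: sharing, local addition, and reconstruction round-trip.
rng = np.random.default_rng(0)
x, y = np.array([123456.0]), np.array([7890.0])
assert reconstruct(*add_shared(share(x, rng), share(y, rng))) == x + y
```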