ShadowScope: GPU Monitoring and Validation via Composable Side Channel Signals
- URL: http://arxiv.org/abs/2509.00300v2
- Date: Wed, 03 Sep 2025 20:21:21 GMT
- Title: ShadowScope: GPU Monitoring and Validation via Composable Side Channel Signals
- Authors: Ghadeer Almusaddar, Yicheng Zhang, Saber Ganjisaffar, Barry Williams, Yu David Liu, Dmitry Ponomarev, Nael Abu-Ghazaleh,
- Abstract summary: GPU kernels are vulnerable to both traditional memory safety issues and emerging microarchitectural threats.<n>We propose ShadowScope, a monitoring and validation framework that leverages a composable golden model.<n>We also introduce ShadowScope+, a hardware-assisted validation mechanism that integrates lightweight on-chip checks into the GPU pipeline.
- Score: 6.389108369952326
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As modern systems increasingly rely on GPUs for computationally intensive tasks such as machine learning acceleration, ensuring the integrity of GPU computation has become critically important. Recent studies have shown that GPU kernels are vulnerable to both traditional memory safety issues (e.g., buffer overflow attacks) and emerging microarchitectural threats (e.g., Rowhammer attacks), many of which manifest as anomalous execution behaviors observable through side-channel signals. However, existing golden model based validation approaches that rely on such signals are fragile, highly sensitive to interference, and do not scale well across GPU workloads with diverse scheduling behaviors. To address these challenges, we propose ShadowScope, a monitoring and validation framework that leverages a composable golden model. Instead of building a single monolithic reference, ShadowScope decomposes trusted kernel execution into modular, repeatable functions that encode key behavioral features. This composable design captures execution patterns at finer granularity, enabling robust validation that is resilient to noise, workload variation, and interference across GPU workloads. To further reduce reliance on noisy software-only monitoring, we introduce ShadowScope+, a hardware-assisted validation mechanism that integrates lightweight on-chip checks into the GPU pipeline. ShadowScope+ achieves high validation accuracy with an average runtime overhead of just 4.6%, while incurring minimal hardware and design complexity. Together, these contributions demonstrate that side-channel observability can be systematically repurposed into a practical defense for GPU kernel integrity.
Related papers
- Detecting Object Tracking Failure via Sequential Hypothesis Testing [80.7891291021747]
Real-time online object tracking in videos constitutes a core task in computer vision.<n>We propose interpreting object tracking as a sequential hypothesis test, wherein evidence for or against tracking failures is gradually accumulated over time.<n>We propose both supervised and unsupervised variants by leveraging either ground-truth or solely internal tracking information.
arXiv Detail & Related papers (2026-02-13T14:57:15Z) - Timing and Memory Telemetry on GPUs for AI Governance [3.3108773921973316]
We introduce a measurement framework that generates timing and memory-based observables that correlate with compute activity.<n>These primitives provide statistical and behavioral indicators of GPU engagement that remain observable even without trusted firmware, enclaves, or vendor-controlled counters.
arXiv Detail & Related papers (2026-02-10T03:20:06Z) - Building a Robust Risk-Based Access Control System to Combat Ransomware's Capability to Encrypt: A Machine Learning Approach [0.510691253204425]
Ransomware core capability, unauthorized encryption, demands controls that identify and block malicious cryptographic activity without disrupting legitimate use.<n>We present a probabilistic, risk-based access control architecture that couples machine learning inference with mandatory access control to regulate encryption on Linux in real time.
arXiv Detail & Related papers (2026-01-23T14:48:35Z) - Scalable GPU-Based Integrity Verification for Large Machine Learning Models [4.301162531343759]
We present a security framework that strengthens distributed machine learning by standardizing integrity protections across CPU and GPU platforms.<n>Our approach co-locates integrity verification directly with large ML model execution on GPU accelerators.<n>We provide a hardware-agnostic foundation that enterprise teams can deploy regardless of their underlying CPU and GPU infrastructures.
arXiv Detail & Related papers (2025-10-27T23:45:21Z) - LibIHT: A Hardware-Based Approach to Efficient and Evasion-Resistant Dynamic Binary Analysis [4.42052953892569]
LibIHT is a hardware-assisted tracing framework that captures program control-flow with minimal performance impact.<n>We implement LibIHT as an OS kernel module and user-space library, and evaluate it on both benign benchmark programs and adversarial anti-instrumentation samples.
arXiv Detail & Related papers (2025-10-17T22:42:33Z) - OpenGL GPU-Based Rowhammer Attack (Work in Progress) [0.0]
This paper presents an adaptive, many-sided Rowhammer attack utilizing GPU compute shaders.<n>Our approach employs statistical distributions to optimize row targeting and avoid current mitigations.<n> Experimental results on a Raspberry Pi 4 demonstrate that the GPU-based approach attains a high rate of bit flips compared to traditional CPU-based hammering.
arXiv Detail & Related papers (2025-09-24T10:11:05Z) - Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models [57.49136894315871]
New paradigm of test-time scaling has yielded remarkable breakthroughs in reasoning models and generative vision models.<n>We propose one solution to the problem of integrating test-time scaling knowledge into a model during post-training.<n>We replace reward guided test-time noise optimization in diffusion models with a Noise Hypernetwork that modulates initial input noise.
arXiv Detail & Related papers (2025-08-13T17:33:37Z) - HGCA: Hybrid GPU-CPU Attention for Long Context LLM Inference [8.826966369389893]
We present HGCA, a hybrid CPU- GPU attention mechanism for large language models.<n>We show that HGCA achieves superior scalability, supports longer sequences and larger batch sizes, and outperforms existing sparse attention baselines in both performance and accuracy.<n> Experiments across diverse models and workloads show that HGCA achieves superior scalability, supports longer sequences and larger batch sizes, and outperforms existing sparse attention baselines in both performance and accuracy.
arXiv Detail & Related papers (2025-07-03T20:20:33Z) - CUDA-LLM: LLMs Can Write Efficient CUDA Kernels [9.287036563375617]
Large Language Models (LLMs) have demonstrated strong capabilities in general-purpose code generation.<n>We propose a novel framework called textbfFeature SearchReinforcement (FSR) FSR jointly optimize compilation and functional correctness.
arXiv Detail & Related papers (2025-06-10T10:51:03Z) - NGPU-LM: GPU-Accelerated N-Gram Language Model for Context-Biasing in Greedy ASR Decoding [54.88765757043535]
This work rethinks data structures for statistical n-gram language models to enable fast and parallel operations for GPU-optimized inference.<n>Our approach, named NGPU-LM, introduces customizable greedy decoding for all major ASR model types with less than 7% computational overhead.<n>The proposed approach can eliminate more than 50% of the accuracy gap between greedy and beam search for out-of-domain scenarios while avoiding significant slowdown caused by beam search.
arXiv Detail & Related papers (2025-05-28T20:43:10Z) - Robustness of deep learning classification to adversarial input on GPUs: asynchronous parallel accumulation is a source of vulnerability [4.054484966653432]
A key measure of machine learning (ML) classification models' safety and reliability is their ability to resist small, targeted input perturbations.<n>We show that floating-point non-associativity coupled with asynchronous parallel programming on GPU is sufficient to result in misclassification.<n>We also show that standard adversarial robustness results may be overestimated up to 4.6 when not considering machine-level details.
arXiv Detail & Related papers (2025-03-21T14:19:45Z) - Holographic Global Convolutional Networks for Long-Range Prediction Tasks in Malware Detection [50.7263393517558]
We introduce Holographic Global Convolutional Networks (HGConv) that utilize the properties of Holographic Reduced Representations (HRR)
Unlike other global convolutional methods, our method does not require any intricate kernel computation or crafted kernel design.
The proposed method has achieved new SOTA results on Microsoft Malware Classification Challenge, Drebin, and EMBER malware benchmarks.
arXiv Detail & Related papers (2024-03-23T15:49:13Z) - Whispering Pixels: Exploiting Uninitialized Register Accesses in Modern GPUs [6.1255640691846285]
We showcase the existence of a vulnerability on products of 3 major vendors - Apple, NVIDIA and Qualcomm.
This vulnerability poses unique challenges to an adversary due to opaque scheduling and register remapping algorithms.
We implement information leakage attacks on intermediate data of Convolutional Neural Networks (CNNs) and present the attack's capability to leak and reconstruct the output of Large Language Models (LLMs)
arXiv Detail & Related papers (2024-01-16T23:36:48Z) - FusionAI: Decentralized Training and Deploying LLMs with Massive
Consumer-Level GPUs [57.12856172329322]
We envision a decentralized system unlocking the potential vast untapped consumer-level GPU.
This system faces critical challenges, including limited CPU and GPU memory, low network bandwidth, the variability of peer and device heterogeneity.
arXiv Detail & Related papers (2023-09-03T13:27:56Z) - PolyMPCNet: Towards ReLU-free Neural Architecture Search in Two-party
Computation Based Private Inference [23.795457990555878]
Secure multi-party computation (MPC) has been discussed, to enable the privacy-preserving deep learning (DL) computation.
MPCs often come at very high computation overhead, and potentially prohibit their popularity in large scale systems.
In this work, we develop a systematic framework, PolyMPCNet, of joint overhead reduction of MPC comparison protocol and hardware acceleration.
arXiv Detail & Related papers (2022-09-20T02:47:37Z) - Discriminator-Free Generative Adversarial Attack [87.71852388383242]
Agenerative-based adversarial attacks can get rid of this limitation.
ASymmetric Saliency-based Auto-Encoder (SSAE) generates the perturbations.
The adversarial examples generated by SSAE not only make thewidely-used models collapse, but also achieves good visual quality.
arXiv Detail & Related papers (2021-07-20T01:55:21Z) - FCOS: A simple and strong anchor-free object detector [111.87691210818194]
We propose a fully convolutional one-stage object detector (FCOS) to solve object detection in a per-pixel prediction fashion.
Almost all state-of-the-art object detectors such as RetinaNet, SSD, YOLOv3, and Faster R-CNN rely on pre-defined anchor boxes.
In contrast, our proposed detector FCOS is anchor box free, as well as proposal free.
arXiv Detail & Related papers (2020-06-14T01:03:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.