Timing and Memory Telemetry on GPUs for AI Governance
- URL: http://arxiv.org/abs/2602.09369v2
- Date: Thu, 12 Feb 2026 01:42:39 GMT
- Title: Timing and Memory Telemetry on GPUs for AI Governance
- Authors: Saleh K. Monfared, Fatemeh Ganji, Dan Holcomb, Shahin Tajik,
- Abstract summary: We introduce a measurement framework that generates timing and memory-based observables that correlate with compute activity.<n>These primitives provide statistical and behavioral indicators of GPU engagement that remain observable even without trusted firmware, enclaves, or vendor-controlled counters.
- Score: 3.3108773921973316
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid expansion of GPU-accelerated computing has enabled major advances in large-scale artificial intelligence (AI), while heightening concerns about how accelerators are observed or governed once deployed. Governance is essential to ensure that large-scale compute infrastructure is not silently repurposed for training models, circumventing usage policies, or operating outside legal oversight. Because current GPUs expose limited trusted telemetry and can be modified or virtualized by adversaries, we explore whether compute-based measurements can provide actionable signals of utilization when host and device are untrusted. We introduce a measurement framework that leverages architectural characteristics of modern GPUs to generate timing- and memory-based observables that correlate with compute activity. Our design draws on four complementary primitives: (1) a probabilistic, workload-driven mechanism inspired by Proof-of-Work (PoW) to expose parallel effort, (2) sequential, latency-sensitive workloads derived via Verifiable Delay Functions (VDFs) to characterize scalar execution pressure, (3) General Matrix Multiplication (GEMM)-based tensor-core measurements that reflect dense linear-algebra throughput, and (4) a VRAM-residency test that distinguishes on-device memory locality from off-chip access through bandwidth-dependent hashing. These primitives provide statistical and behavioral indicators of GPU engagement that remain observable even without trusted firmware, enclaves, or vendor-controlled counters. We evaluate their responses to contention, architectural alignment, memory pressure, and power overhead, showing that timing shifts and residency latencies reveal meaningful utilization patterns. Our results illustrate why compute-based telemetry can complement future accountability mechanisms by exposing architectural signals relevant to post-deployment GPU governance.
Related papers
- Enabling Physical AI at the Edge: Hardware-Accelerated Recovery of System Dynamics [4.058950730052848]
textbfMERINDA (Model Recovery in Reconfigurable Dynamic Architecture) is an FPGA-accelerated MR framework designed to make physical AI practical on resource-constrained devices.<n>We show that MERINDA can bring accurate, explainable MR to the edge for real-time monitoring of autonomous systems.
arXiv Detail & Related papers (2025-12-29T04:51:51Z) - An LLVM-Based Optimization Pipeline for SPDZ [0.0]
We implement a proof-of-concept LLVM-based optimization pipeline for the SPDZ protocol.<n>Our front end accepts a subset of C with lightweight privacy annotations and lowers it to LLVM IR.<n>Our back end performs data-flow and control-flow analysis on the optimized IR to drive a non-blocking runtime scheduler.
arXiv Detail & Related papers (2025-12-11T20:53:35Z) - Beyond the GPU: The Strategic Role of FPGAs in the Next Wave of AI [0.0]
Field-Programmable Gate Arrays (FPGAs) are a reconfigurable platform that allows mapping AI algorithms directly into device logic.<n>Unlike CPU and GPU architecture, an FPGA can be reconfigured in the field to adapt its physical structure to a specific model.<n> Partial reconfiguration and compilation flows from AI frameworks are shortening the path from prototype to deployment.
arXiv Detail & Related papers (2025-11-04T03:41:42Z) - ShadowScope: GPU Monitoring and Validation via Composable Side Channel Signals [6.389108369952326]
GPU kernels are vulnerable to both traditional memory safety issues and emerging microarchitectural threats.<n>We propose ShadowScope, a monitoring and validation framework that leverages a composable golden model.<n>We also introduce ShadowScope+, a hardware-assisted validation mechanism that integrates lightweight on-chip checks into the GPU pipeline.
arXiv Detail & Related papers (2025-08-30T01:38:05Z) - MAS-Attention: Memory-Aware Stream Processing for Attention Acceleration on Resource-Constrained Edge Devices [24.1144641404561]
We propose a scheme for exact attention inference acceleration on memory-constrained edge accelerators.<n>We show up to 2.75x speedup and 54% reduction in energy consumption as compared to the state-of-the-art attention fusion method (FLAT) in the edge computing scenario.
arXiv Detail & Related papers (2024-11-20T19:44:26Z) - Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges.
We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs.
This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z) - FLEdge: Benchmarking Federated Machine Learning Applications in Edge Computing Systems [61.335229621081346]
Federated Learning (FL) has become a viable technique for realizing privacy-enhancing distributed deep learning on the network edge.
In this paper, we propose FLEdge, which complements existing FL benchmarks by enabling a systematic evaluation of client capabilities.
arXiv Detail & Related papers (2023-06-08T13:11:20Z) - ETLP: Event-based Three-factor Local Plasticity for online learning with
neuromorphic hardware [105.54048699217668]
We show a competitive performance in accuracy with a clear advantage in the computational complexity for Event-Based Three-factor Local Plasticity (ETLP)
We also show that when using local plasticity, threshold adaptation in spiking neurons and a recurrent topology are necessary to learntemporal patterns with a rich temporal structure.
arXiv Detail & Related papers (2023-01-19T19:45:42Z) - MAPLE-X: Latency Prediction with Explicit Microprocessor Prior Knowledge [87.41163540910854]
Deep neural network (DNN) latency characterization is a time-consuming process.
We propose MAPLE-X which extends MAPLE by incorporating explicit prior knowledge of hardware devices and DNN architecture latency.
arXiv Detail & Related papers (2022-05-25T11:08:20Z) - MAPLE-Edge: A Runtime Latency Predictor for Edge Devices [80.01591186546793]
We propose MAPLE-Edge, an edge device-oriented extension of MAPLE, the state-of-the-art latency predictor for general purpose hardware.
Compared to MAPLE, MAPLE-Edge can describe the runtime and target device platform using a much smaller set of CPU performance counters.
We also demonstrate that unlike MAPLE which performs best when trained on a pool of devices sharing a common runtime, MAPLE-Edge can effectively generalize across runtimes.
arXiv Detail & Related papers (2022-04-27T14:00:48Z) - One-step regression and classification with crosspoint resistive memory
arrays [62.997667081978825]
High speed, low energy computing machines are in demand to enable real-time artificial intelligence at the edge.
One-step learning is supported by simulations of the prediction of the cost of a house in Boston and the training of a 2-layer neural network for MNIST digit recognition.
Results are all obtained in one computational step, thanks to the physical, parallel, and analog computing within the crosspoint array.
arXiv Detail & Related papers (2020-05-05T08:00:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.