NPS: A Framework for Accurate Program Sampling Using Graph Neural
Network
- URL: http://arxiv.org/abs/2304.08880v1
- Date: Tue, 18 Apr 2023 10:13:28 GMT
- Title: NPS: A Framework for Accurate Program Sampling Using Graph Neural
Network
- Authors: Yuanwei Fang, Zihao Liu, Yanheng Lu, Jiawei Liu, Jiajie Li, Yi Jin,
Jian Chen, Yenkuang Chen, Hongzhong Zheng, Yuan Xie
- Abstract summary: This paper introduces Neural Program Sampling (NPS), a novel framework that learns execution embeddings using dynamic snapshots of a Graph Neural Network.
AssemblyNet serves as NPS's graph model and neural architecture, capturing a program's behavior in aspects such as data computation, code path, and data flow.
NPS shows higher accuracy and generality than the state-of-the-art GNN approach in code behavior learning, enabling the generation of high-quality execution embeddings.
- Score: 23.021249354193305
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the end of Moore's Law, there is a growing demand for rapid
architectural innovations in modern processors, such as RISC-V custom
extensions, to continue performance scaling. Program sampling is a crucial step
in microprocessor design, as it selects representative simulation points for
workload simulation. While SimPoint has been the de-facto approach for decades,
its limited expressiveness with Basic Block Vector (BBV) requires
time-consuming human tuning, often taking months, which impedes fast innovation
and agile hardware development. This paper introduces Neural Program Sampling
(NPS), a novel framework that learns execution embeddings using dynamic
snapshots of a Graph Neural Network. NPS deploys AssemblyNet for embedding
generation, leveraging an application's code structures and runtime states.
AssemblyNet serves as NPS's graph model and neural architecture, capturing a
program's behavior in aspects such as data computation, code path, and data
flow. AssemblyNet is trained with a data prefetch task that predicts
consecutive memory addresses.
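The abstract does not spell out how this prefetch task is wired up; the following is a minimal sketch of one plausible formulation, assuming bucketized address deltas as targets and a plain MLP head standing in for AssemblyNet's GNN (embedding width, prediction horizon, and bucket count are all illustrative).

```python
# Hypothetical sketch of the data-prefetch pretraining objective: map a
# per-snapshot execution embedding to the next few (bucketized) address
# deltas. All sizes are assumptions, not values from the paper.
import torch
import torch.nn as nn

EMBED_DIM = 128      # assumed embedding width
NUM_FUTURE = 4       # assumed number of consecutive addresses to predict
ADDR_BUCKETS = 4096  # assumed quantization of address deltas

class PrefetchHead(nn.Module):
    """Predicts NUM_FUTURE bucketized address deltas from an embedding."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(EMBED_DIM, NUM_FUTURE * ADDR_BUCKETS)

    def forward(self, embedding):                 # (batch, EMBED_DIM)
        return self.proj(embedding).view(-1, NUM_FUTURE, ADDR_BUCKETS)

# One training step on random stand-in data (real inputs would be
# AssemblyNet embeddings of dynamic program snapshots).
head = PrefetchHead()
emb = torch.randn(32, EMBED_DIM)
target = torch.randint(0, ADDR_BUCKETS, (32, NUM_FUTURE))
loss = nn.CrossEntropyLoss()(head(emb).flatten(0, 1), target.flatten())
loss.backward()
```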
In the experiments, NPS outperforms SimPoint by up to 63%, reducing the
average error by 38%. Additionally, NPS demonstrates strong robustness with
increased accuracy, reducing the expensive accuracy tuning overhead.
Furthermore, NPS shows higher accuracy and generality than the state-of-the-art
GNN approach in code behavior learning, enabling the generation of high-quality
execution embeddings.
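For context on the sampling step itself: both SimPoint's BBVs and NPS's execution embeddings ultimately feed a clustering stage that keeps one representative interval per cluster as a simulation point, weighted by cluster size. A minimal sketch of that stage, with interval count, feature width, and cluster count as arbitrary stand-ins:

```python
# SimPoint-style selection of simulation points: k-means over per-interval
# feature vectors (BBVs or execution embeddings), then keep the interval
# closest to each centroid. Sizes below are illustrative.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
vectors = rng.standard_normal((1000, 128))   # stand-in per-interval vectors

k = 8                                        # assumed number of simulation points
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(vectors)

sim_points = []
for c in range(k):
    members = np.flatnonzero(km.labels_ == c)
    dists = np.linalg.norm(vectors[members] - km.cluster_centers_[c], axis=1)
    rep = members[np.argmin(dists)]          # medoid-like representative interval
    weight = len(members) / len(vectors)     # fraction of execution it stands for
    sim_points.append((rep, weight))

print(sim_points)  # (interval index, weight) pairs for detailed simulation
```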
Related papers
- Spatiotemporal Forecasting Meets Efficiency: Causal Graph Process Neural Networks [5.703629317205571]
Causal Graph Processes (CGPs) offer an alternative, using graph filters instead of relational field layers to reduce parameters and minimize memory consumption.
This paper introduces CGProNet, a non-linear model combining CGPs and GNNs for spatiotemporal forecasting. CGProNet employs higher-order graph filters, optimizing the model with fewer parameters, reducing memory usage, and improving runtime efficiency (a minimal sketch of such a filter appears after this list).
arXiv Detail & Related papers (2024-05-29T08:37:48Z)
- YFlows: Systematic Dataflow Exploration and Code Generation for
Efficient Neural Network Inference using SIMD Architectures on CPUs [3.1445034800095413]
We address the challenges associated with deploying neural networks on CPUs.
Our novel approach is to use the dataflow of a neural network to explore data reuse opportunities.
Our results show that the dataflow that keeps outputs in SIMD registers consistently yields the best performance.
arXiv Detail & Related papers (2023-10-01T05:11:54Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural
Networks on Edge NPUs [74.83613252825754]
"smart ecosystems" are being formed where sensing happens concurrently rather than standalone.
This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge.
We propose a novel early-exit scheduling that allows preemption at run time to account for the dynamicity introduced by the arrival and exiting processes.
arXiv Detail & Related papers (2022-09-27T15:04:01Z)
- Sparse Periodic Systolic Dataflow for Lowering Latency and Power
Dissipation of Convolutional Neural Network Accelerators [3.043665249713003]
This paper introduces the sparse periodic systolic (SPS) dataflow, which advances state-of-the-art hardware accelerators for lightweight neural networks.
By exploiting the regularity of the periodic sparsity pattern (PPS), our sparsity-aware compiler optimally reorders the weights and uses a simple indexing unit in hardware to match weights with activations.
arXiv Detail & Related papers (2022-06-30T19:16:46Z)
- AEGNN: Asynchronous Event-based Graph Neural Networks [54.528926463775946]
Event-based Graph Neural Networks generalize standard GNNs to process events as "evolving" spatio-temporal graphs.
AEGNNs are easily trained on synchronous inputs and can be converted to efficient, "asynchronous" networks at test time.
arXiv Detail & Related papers (2022-03-31T16:21:12Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% accuracy loss compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) approach, Soft Actor-Critic for discrete (SAC-d), which generates the exit point, partition point, and compressing bits by soft policy iterations.
Based on a latency- and accuracy-aware reward design, such a computation can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC requirements.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Learning to Execute Programs with Instruction Pointer Attention Graph
Neural Networks [55.98291376393561]
Graph neural networks (GNNs) have emerged as a powerful tool for learning software engineering tasks.
Recurrent neural networks (RNNs) are well-suited to long sequential chains of reasoning, but they do not naturally incorporate program structure.
We introduce a novel GNN architecture, the Instruction Pointer Attention Graph Neural Networks (IPA-GNN), which improves systematic generalization on the task of learning to execute programs.
arXiv Detail & Related papers (2020-10-23T19:12:30Z)
- CSM-NN: Current Source Model Based Logic Circuit Simulation -- A Neural
Network Approach [5.365198933008246]
CSM-NN is a scalable simulation framework with optimized neural network structures and processing algorithms.
Experiments show that CSM-NN reduces the simulation time by up to 6× compared to a state-of-the-art current source model based simulator running on a CPU.
CSM-NN also provides high accuracy levels, with less than 2% error, compared to HSPICE.
arXiv Detail & Related papers (2020-02-13T00:29:44Z)
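As noted in the CGProNet entry above, a higher-order graph filter is a polynomial in a graph shift operator with one scalar coefficient per order, which is where the parameter savings over dense layers come from. A hedged sketch (matrix, signal, and coefficients are all illustrative):

```python
# Higher-order graph filter: y = sum_k h[k] * S^k x, using a handful of
# scalar coefficients h instead of a dense weight matrix.
import numpy as np

def graph_filter(S, x, h):
    """Apply an order-(len(h)-1) polynomial graph filter to signal x."""
    y = np.zeros_like(x)
    power = x.copy()          # S^0 x
    for coeff in h:
        y += coeff * power
        power = S @ power     # advance to the next power of S
    return y

n = 5
S = np.random.default_rng(1).random((n, n))  # stand-in graph shift operator
S /= np.abs(np.linalg.eigvals(S)).max()      # normalize for stability
x = np.ones(n)
print(graph_filter(S, x, h=[0.5, 0.3, 0.2]))  # order-2 filter, 3 parameters
```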