NPS: A Framework for Accurate Program Sampling Using Graph Neural
Network
- URL: http://arxiv.org/abs/2304.08880v1
- Date: Tue, 18 Apr 2023 10:13:28 GMT
- Title: NPS: A Framework for Accurate Program Sampling Using Graph Neural
Network
- Authors: Yuanwei Fang, Zihao Liu, Yanheng Lu, Jiawei Liu, Jiajie Li, Yi Jin,
Jian Chen, Yenkuang Chen, Hongzhong Zheng, Yuan Xie
- Abstract summary: This paper introduces Neural Program Sampling (NPS), a novel framework that learns execution embeddings using dynamic snapshots of a Graph Neural Network.
AssemblyNet serves as NPS's graph model and neural architecture, capturing a program's behavior in aspects such as data computation, code path, and data flow.
NPS shows higher accuracy and generality than the state-of-the-art GNN approach in code behavior learning, enabling the generation of high-quality execution embeddings.
- Score: 23.021249354193305
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the end of Moore's Law, there is a growing demand for rapid
architectural innovations in modern processors, such as RISC-V custom
extensions, to continue performance scaling. Program sampling is a crucial step
in microprocessor design, as it selects representative simulation points for
workload simulation. While SimPoint has been the de-facto approach for decades,
its limited expressiveness with Basic Block Vector (BBV) requires
time-consuming human tuning, often taking months, which impedes fast innovation
and agile hardware development. This paper introduces Neural Program Sampling
(NPS), a novel framework that learns execution embeddings using dynamic
snapshots of a Graph Neural Network. NPS deploys AssemblyNet for embedding
generation, leveraging an application's code structures and runtime states.
AssemblyNet serves as NPS's graph model and neural architecture, capturing a
program's behavior in aspects such as data computation, code path, and data
flow. AssemblyNet is trained with a data prefetch task that predicts
consecutive memory addresses.
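The abstract does not spell out how this prefetch task is wired up; the following is a minimal sketch of one plausible formulation, assuming bucketized address deltas as targets and a plain MLP head standing in for AssemblyNet's GNN (embedding width, prediction horizon, and bucket count are all illustrative).

```python
# Hypothetical sketch of the data-prefetch pretraining objective: map a
# per-snapshot execution embedding to the next few (bucketized) address
# deltas. All sizes are assumptions, not values from the paper.
import torch
import torch.nn as nn

EMBED_DIM = 128      # assumed embedding width
NUM_FUTURE = 4       # assumed number of consecutive addresses to predict
ADDR_BUCKETS = 4096  # assumed quantization of address deltas

class PrefetchHead(nn.Module):
    """Predicts NUM_FUTURE bucketized address deltas from an embedding."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(EMBED_DIM, NUM_FUTURE * ADDR_BUCKETS)

    def forward(self, embedding):                 # (batch, EMBED_DIM)
        return self.proj(embedding).view(-1, NUM_FUTURE, ADDR_BUCKETS)

# One training step on random stand-in data (real inputs would be
# AssemblyNet embeddings of dynamic program snapshots).
head = PrefetchHead()
emb = torch.randn(32, EMBED_DIM)
target = torch.randint(0, ADDR_BUCKETS, (32, NUM_FUTURE))
loss = nn.CrossEntropyLoss()(head(emb).flatten(0, 1), target.flatten())
loss.backward()
```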
In the experiments, NPS outperforms SimPoint by up to 63%, reducing the
average error by 38%. Additionally, NPS demonstrates strong robustness with
increased accuracy, reducing the expensive accuracy tuning overhead.
Furthermore, NPS shows higher accuracy and generality than the state-of-the-art
GNN approach in code behavior learning, enabling the generation of high-quality
execution embeddings.
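For context on the sampling step itself: both SimPoint's BBVs and NPS's execution embeddings ultimately feed a clustering stage that keeps one representative interval per cluster as a simulation point, weighted by cluster size. A minimal sketch of that stage, with interval count, feature width, and cluster count as arbitrary stand-ins:

```python
# SimPoint-style selection of simulation points: k-means over per-interval
# feature vectors (BBVs or execution embeddings), then keep the interval
# closest to each centroid. Sizes below are illustrative.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
vectors = rng.standard_normal((1000, 128))   # stand-in per-interval vectors

k = 8                                        # assumed number of simulation points
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(vectors)

sim_points = []
for c in range(k):
    members = np.flatnonzero(km.labels_ == c)
    dists = np.linalg.norm(vectors[members] - km.cluster_centers_[c], axis=1)
    rep = members[np.argmin(dists)]          # medoid-like representative interval
    weight = len(members) / len(vectors)     # fraction of execution it stands for
    sim_points.append((rep, weight))

print(sim_points)  # (interval index, weight) pairs for detailed simulation
```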
Related papers
- Spatiotemporal Forecasting Meets Efficiency: Causal Graph Process Neural Networks [5.703629317205571]
Causal Graph Processes (CGPs) offer an alternative, using graph filters instead of relational field layers to reduce parameters and minimize memory consumption.
This paper introduces CGProNet, a non-linear model combining CGPs and GNNs for spatiotemporal forecasting. CGProNet employs higher-order graph filters, optimizing the model with fewer parameters, reducing memory usage, and improving runtime efficiency (a minimal sketch of such a filter appears after this list).
arXiv Detail & Related papers (2024-05-29T08:37:48Z)
- YFlows: Systematic Dataflow Exploration and Code Generation for
Efficient Neural Network Inference using SIMD Architectures on CPUs [3.1445034800095413]
We address the challenges associated with deploying neural networks on CPUs.
Our novel approach is to use the dataflow of a neural network to explore data reuse opportunities.
Our results show that the dataflow that keeps outputs in SIMD registers consistently yields the best performance.
arXiv Detail & Related papers (2023-10-01T05:11:54Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural
Networks on Edge NPUs [74.83613252825754]
"smart ecosystems" are being formed where sensing happens concurrently rather than standalone.
This is shifting the on-device inference paradigm towards deploying neural processing units (NPUs) at the edge.
We propose a novel early-exit scheduling that allows preemption at run time to account for the dynamicity introduced by the arrival and exiting processes.
arXiv Detail & Related papers (2022-09-27T15:04:01Z)
- Sparse Periodic Systolic Dataflow for Lowering Latency and Power
Dissipation of Convolutional Neural Network Accelerators [3.043665249713003]
This paper introduces the sparse periodic systolic (SPS) dataflow, which advances state-of-the-art hardware accelerators for lightweight neural networks.
By exploiting the regularity of the periodic sparsity pattern (PPS), our sparsity-aware compiler optimally reorders the weights and uses a simple indexing unit in hardware to match weights with activations.
arXiv Detail & Related papers (2022-06-30T19:16:46Z)
- AEGNN: Asynchronous Event-based Graph Neural Networks [54.528926463775946]
Event-based Graph Neural Networks generalize standard GNNs to process events as "evolving" spatio-temporal graphs.
AEGNNs are easily trained on synchronous inputs and can be converted to efficient, "asynchronous" networks at test time.
arXiv Detail & Related papers (2022-03-31T16:21:12Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% accuracy loss compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) approach, Soft Actor-Critic for discrete (SAC-d), which generates the exit point, partition point, and compressing bits by soft policy iterations.
Based on a latency- and accuracy-aware reward design, such a computation can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC requirements.
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Learning to Execute Programs with Instruction Pointer Attention Graph
Neural Networks [55.98291376393561]
Graph neural networks (GNNs) have emerged as a powerful tool for learning software engineering tasks.
Recurrent neural networks (RNNs) are well-suited to long sequential chains of reasoning, but they do not naturally incorporate program structure.
We introduce a novel GNN architecture, the Instruction Pointer Attention Graph Neural Networks (IPA-GNN), which improves systematic generalization on the task of learning to execute programs.
arXiv Detail & Related papers (2020-10-23T19:12:30Z)
- CSM-NN: Current Source Model Based Logic Circuit Simulation -- A Neural
Network Approach [5.365198933008246]
CSM-NN is a scalable simulation framework with optimized neural network structures and processing algorithms.
Experiments show that CSM-NN reduces the simulation time by up to 6× compared to a state-of-the-art current source model based simulator running on a CPU.
CSM-NN also provides high accuracy levels, with less than 2% error, compared to HSPICE.
arXiv Detail & Related papers (2020-02-13T00:29:44Z)
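As noted in the CGProNet entry above, a higher-order graph filter is a polynomial in a graph shift operator with one scalar coefficient per order, which is where the parameter savings over dense layers come from. A hedged sketch (matrix, signal, and coefficients are all illustrative):

```python
# Higher-order graph filter: y = sum_k h[k] * S^k x, using a handful of
# scalar coefficients h instead of a dense weight matrix.
import numpy as np

def graph_filter(S, x, h):
    """Apply an order-(len(h)-1) polynomial graph filter to signal x."""
    y = np.zeros_like(x)
    power = x.copy()          # S^0 x
    for coeff in h:
        y += coeff * power
        power = S @ power     # advance to the next power of S
    return y

n = 5
S = np.random.default_rng(1).random((n, n))  # stand-in graph shift operator
S /= np.abs(np.linalg.eigvals(S)).max()      # normalize for stability
x = np.ones(n)
print(graph_filter(S, x, h=[0.5, 0.3, 0.2]))  # order-2 filter, 3 parameters
```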