Hierarchical Source-to-Post-Route QoR Prediction in High-Level Synthesis with GNNs
- URL: http://arxiv.org/abs/2401.08696v1
- Date: Sun, 14 Jan 2024 07:24:08 GMT
- Title: Hierarchical Source-to-Post-Route QoR Prediction in High-Level Synthesis with GNNs
- Authors: Mingzhe Gao, Jieru Zhao, Zhe Lin, Minyi Guo
- Abstract summary: We propose a hierarchical post-route QoR prediction approach for FPGA HLS.
By adopting our proposed methodology, the runtime for design space exploration in HLS is shortened to tens of minutes.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High-level synthesis (HLS) notably speeds up the hardware design process by
avoiding RTL programming. However, the turnaround time of HLS increases
significantly when post-route quality of results (QoR) is considered during
optimization. To tackle this issue, we propose a hierarchical post-route QoR
prediction approach for FPGA HLS, which features: (1) a modeling flow that
directly estimates latency and post-route resource usage from C/C++ programs;
(2) a graph construction method that effectively represents the control and
data flow graph of source code and effects of HLS pragmas; and (3) a
hierarchical GNN training and prediction method capable of capturing the impact
of loop hierarchies. Experimental results show that our method achieves a
prediction error of less than 10% across different types of QoR metrics, a
substantial improvement over state-of-the-art GNN methods. By
adopting our proposed methodology, the runtime for design space exploration in
HLS is shortened to tens of minutes and the achieved ADRS is reduced to 6.91%
on average.
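The hierarchical flow described above (message passing over a control/data-flow graph, then pooling node embeddings along the loop hierarchy into a single QoR estimate) can be sketched with a toy example. This is an illustrative sketch only, not the authors' implementation: the mean-aggregation scheme, the linear readout, the graph shapes, and all feature and weight values below are made-up assumptions for demonstration.

```python
# Toy sketch of hierarchical graph-based QoR prediction (illustrative only).

def message_pass(features, edges, rounds=2):
    """One-hop mean aggregation: each node averages its own feature
    vector with those of its neighbors, repeated for `rounds` hops."""
    n = len(features)
    neighbors = {i: [] for i in range(n)}
    for src, dst in edges:
        neighbors[src].append(dst)
        neighbors[dst].append(src)  # treat edges as undirected in this sketch
    for _ in range(rounds):
        updated = []
        for i in range(n):
            group = [features[i]] + [features[j] for j in neighbors[i]]
            dim = len(features[i])
            updated.append([sum(v[d] for v in group) / len(group) for d in range(dim)])
        features = updated
    return features

def hierarchical_predict(features, edges, loop_of_node, loop_parent, weights):
    """Mean-pool node embeddings into their innermost loop, fold child-loop
    summaries into parents bottom-up, then apply a linear readout on the
    root loop's summary as a stand-in QoR estimate."""
    features = message_pass(features, edges)
    loops = {}
    for i, loop in enumerate(loop_of_node):
        loops.setdefault(loop, []).append(features[i])
    pooled = {loop: [sum(col) / len(vs) for col in zip(*vs)]
              for loop, vs in loops.items()}
    # Fold children into parents; assumes child loops have larger ids.
    for child in sorted(loop_parent, reverse=True):
        parent = loop_parent[child]
        pooled[parent] = [a + b for a, b in zip(pooled[parent], pooled[child])]
    root = pooled[0]
    return sum(w * x for w, x in zip(weights, root))

# Toy CDFG: 4 operation nodes, an inner loop (id 1) nested in the root (id 0).
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
edges = [(0, 1), (1, 2), (2, 3)]
estimate = hierarchical_predict(feats, edges,
                                loop_of_node=[0, 1, 1, 0],
                                loop_parent={1: 0},
                                weights=[1.0, 1.0])
```

In the paper, learned GNN layers and per-loop models replace the fixed mean aggregation and hand-set weights here, and node features would encode operation types and HLS pragma settings.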
Related papers
- Quantum Algorithm Exploration using Application-Oriented Performance Benchmarks
The QED-C suite of Application-Oriented Benchmarks provides the ability to gauge performance characteristics of quantum computers.
We investigate challenges in broadening the relevance of this benchmarking methodology to applications of greater complexity.
arXiv Detail & Related papers (2024-02-14T06:55:50Z)
- Efficient Heterogeneous Graph Learning via Random Projection
Heterogeneous Graph Neural Networks (HGNNs) are powerful tools for deep learning on heterogeneous graphs.
Recent pre-computation-based HGNNs use one-time message passing to transform a heterogeneous graph into regular-shaped tensors.
We propose a hybrid pre-computation-based HGNN, named Random Projection Heterogeneous Graph Neural Network (RpHGNN)
arXiv Detail & Related papers (2023-10-23T01:25:44Z)
- ST-MLP: A Cascaded Spatio-Temporal Linear Framework with Channel-Independence Strategy for Traffic Forecasting
Current research on Spatio-Temporal Graph Neural Networks (STGNNs) often prioritizes complex designs, leading to computational burdens with only minor enhancements in accuracy.
We propose ST-MLP, a concise cascaded spatio-temporal model based solely on Multi-Layer Perceptron (MLP) modules and linear layers.
Empirical results demonstrate that ST-MLP outperforms state-of-the-art STGNNs and other models in terms of accuracy and computational efficiency.
arXiv Detail & Related papers (2023-08-14T23:34:59Z)
- Algorithm and System Co-design for Efficient Subgraph-based Graph Representation Learning
Subgraph-based graph representation learning (SGRL) has recently been proposed to address fundamental challenges faced by canonical graph neural networks (GNNs).
We propose a novel framework SUREL for scalable SGRL by co-designing the learning algorithm and its system support.
arXiv Detail & Related papers (2022-02-28T04:29:22Z)
- High-Level Synthesis Performance Prediction using GNNs: Benchmarking, Modeling, and Advancing
Agile hardware development requires fast and accurate circuit quality evaluation from early design stages.
We propose a rapid and accurate performance modeling, exploiting the representation power of graph neural networks (GNNs) by representing C/C++ programs as graphs.
Our proposed predictor outperforms HLS by up to 40X and surpasses existing predictors by 2X to 5X in terms of resource usage and timing prediction.
arXiv Detail & Related papers (2022-01-18T09:53:48Z)
- A Graph Deep Learning Framework for High-Level Synthesis Design Space Exploration
High-Level Synthesis is a solution for fast prototyping of application-specific hardware.
We propose, for the first time in the literature, graph neural networks that jointly predict acceleration performance and hardware costs.
We show that our approach achieves prediction accuracy comparable with that of commonly used simulators.
arXiv Detail & Related papers (2021-11-29T18:17:45Z)
- Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration
We propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations.
We show that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework.
arXiv Detail & Related papers (2021-11-22T23:53:14Z)
- CATRO: Channel Pruning via Class-Aware Trace Ratio Optimization
We propose a novel channel pruning method via Class-Aware Trace Ratio Optimization (CATRO) to reduce the computational burden and accelerate the model inference.
We show that CATRO achieves higher accuracy with similar cost or lower cost with similar accuracy than other state-of-the-art channel pruning algorithms.
Because of its class-aware property, CATRO can adaptively prune efficient networks for various classification subtasks, facilitating the practical deployment and use of deep networks in real-world applications.
arXiv Detail & Related papers (2021-10-21T06:26:31Z)
- ZARTS: On Zero-order Optimization for Neural Architecture Search
Differentiable architecture search (DARTS) has been a popular one-shot paradigm for NAS due to its high efficiency.
This work turns to zero-order optimization and proposes a novel NAS scheme, called ZARTS, to search without enforcing the above approximation.
In particular, results on 12 benchmarks verify the outstanding robustness of ZARTS, whereas the performance of DARTS collapses due to its known instability issue.
arXiv Detail & Related papers (2021-10-10T09:35:15Z)
- Millimeter Wave Communications with an Intelligent Reflector: Performance Optimization and Distributional Reinforcement Learning
A novel framework is proposed to optimize the downlink multi-user communication of a millimeter wave base station.
A channel estimation approach is developed to measure the channel state information (CSI) in real-time.
A distributional reinforcement learning (DRL) approach is proposed to learn the optimal IR reflection and maximize the expectation of downlink capacity.
arXiv Detail & Related papers (2020-02-24T22:18:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.