Hierarchical Source-to-Post-Route QoR Prediction in High-Level Synthesis with GNNs
- URL: http://arxiv.org/abs/2401.08696v1
- Date: Sun, 14 Jan 2024 07:24:08 GMT
- Title: Hierarchical Source-to-Post-Route QoR Prediction in High-Level Synthesis with GNNs
- Authors: Mingzhe Gao, Jieru Zhao, Zhe Lin, Minyi Guo
- Abstract summary: We propose a hierarchical post-route QoR prediction approach for FPGA HLS.
By adopting our proposed methodology, the runtime for design space exploration in HLS is shortened to tens of minutes.
- Score: 25.920672727699984
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High-level synthesis (HLS) notably speeds up the hardware design process by
avoiding RTL programming. However, the turnaround time of HLS increases
significantly when post-route quality of results (QoR) is considered during
optimization. To tackle this issue, we propose a hierarchical post-route QoR
prediction approach for FPGA HLS, which features: (1) a modeling flow that
directly estimates latency and post-route resource usage from C/C++ programs;
(2) a graph construction method that effectively represents the control and
data flow graph of source code and effects of HLS pragmas; and (3) a
hierarchical GNN training and prediction method capable of capturing the impact
of loop hierarchies. Experimental results show that our method achieves a
prediction error of less than 10% across different types of QoR metrics, a
substantial improvement over state-of-the-art GNN methods. By adopting our
proposed methodology, the runtime for design space exploration in HLS is
shortened to tens of minutes and the achieved ADRS is reduced to 6.91%
on average.
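The three components above can be illustrated with a toy sketch: a control and data flow graph as an adjacency list, a few rounds of sum-based message passing, and bottom-up pooling over the loop hierarchy. All function names, the fixed aggregation, and the trivial readout are illustrative assumptions; the paper's actual model uses trained GNN layers and a learned regression head.

```python
# Toy sketch of hierarchical aggregation over a loop hierarchy.
# Illustrative only: fixed sum/mean operators stand in for trained GNN layers.

def message_pass(features, edges, rounds=2):
    """Sum-based message passing over a directed graph.
    features: {node: [f0, f1, ...]}; edges: list of (src, dst) pairs."""
    feats = {n: list(v) for n, v in features.items()}
    for _ in range(rounds):
        nxt = {n: list(v) for n, v in feats.items()}
        for src, dst in edges:
            nxt[dst] = [a + b for a, b in zip(nxt[dst], feats[src])]
        feats = nxt
    return feats

def pool(feats, nodes):
    """Mean-pool the features of a loop body's nodes into one vector."""
    vecs = [feats[n] for n in nodes]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def predict_hierarchical(features, edges, loop_tree):
    """Aggregate pooled loop embeddings bottom-up through the loop tree.
    loop_tree: {loop: {"nodes": [...], "children": [...]}}; roots are
    loops that appear in no child list."""
    feats = message_pass(features, edges)
    memo = {}
    def embed(loop):
        if loop not in memo:
            vec = pool(feats, loop_tree[loop]["nodes"])
            for child in loop_tree[loop]["children"]:
                vec = [a + b for a, b in zip(vec, embed(child))]
            memo[loop] = vec
        return memo[loop]
    children = {c for l in loop_tree.values() for c in l["children"]}
    roots = [l for l in loop_tree if l not in children]
    top = [sum(col) for col in zip(*(embed(r) for r in roots))]
    # Stand-in readout: summing the top-level embedding replaces the
    # trained regression head that would emit latency or resource usage.
    return sum(top)
```

For example, a three-node graph with an outer loop holding node `a` and an inner loop holding nodes `b` and `c` is embedded by pooling the inner loop first, then folding that vector into the outer loop's embedding, mirroring how loop nesting is traversed bottom-up.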
Related papers
- Geminet: Learning the Duality-based Iterative Process for Lightweight Traffic Engineering in Changing Topologies [53.38648279089736]
Geminet is a lightweight and scalable ML-based TE framework that can handle changing topologies.
Its neural network size is only 0.04% to 7% of existing schemes.
When trained on large-scale topologies, Geminet consumes under 10 GiB of memory, more than eight times less than the 80-plus GiB required by HARP.
arXiv Detail & Related papers (2025-06-30T09:09:50Z)
- Intelligent4DSE: Optimizing High-Level Synthesis Design Space Exploration with Graph Neural Networks and Large Language Models [3.8429489584622156]
We propose CoGNNs-LLMEA, a framework that integrates a graph neural network with task-adaptive message passing and a large language model-enhanced evolutionary algorithm.
As a predictive model, CoGNNs directly leverages intermediate representations generated from source code after compiler front-end processing, enabling prediction of quality of results (QoR) without invoking HLS tools.
CoGNNs achieves state-of-the-art prediction accuracy in post-HLS QoR prediction, reducing mean prediction errors by 2.8x for latency and 3.4x for resource utilization compared to baseline models.
arXiv Detail & Related papers (2025-04-28T10:08:56Z)
- High-Performance and Scalable Fault-Tolerant Quantum Computation with Lattice Surgery on a 2.5D Architecture [0.5779598097190628]
We propose a high-performance and low-overhead FTQC architecture based on lattice surgery (LS) using surface code (SC).
The proposed Bypass architecture is a 2.5-dimensional architecture consisting of dense and sparse qubit layers.
The results show that the Bypass architecture improves the fidelity of FTQC and achieves both a 1.73x speedup and a 17% reduction in classical/quantum hardware resources.
arXiv Detail & Related papers (2024-11-26T15:27:59Z)
- Quantum Algorithm Exploration using Application-Oriented Performance Benchmarks [0.0]
The QED-C suite of Application-Oriented Benchmarks provides the ability to gauge performance characteristics of quantum computers.
We investigate challenges in broadening the relevance of this benchmarking methodology to applications of greater complexity.
arXiv Detail & Related papers (2024-02-14T06:55:50Z)
- Efficient Heterogeneous Graph Learning via Random Projection [58.4138636866903]
Heterogeneous Graph Neural Networks (HGNNs) are powerful tools for deep learning on heterogeneous graphs.
Recent pre-computation-based HGNNs use one-time message passing to transform a heterogeneous graph into regular-shaped tensors.
We propose a hybrid pre-computation-based HGNN, named Random Projection Heterogeneous Graph Neural Network (RpHGNN).
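The pre-computation idea hinges on compressing aggregated neighbor features to a fixed size. The sketch below shows a generic Johnson-Lindenstrauss-style random projection; it is an assumption-laden illustration of the general technique, not RpHGNN's actual implementation, and all names are hypothetical.

```python
import random

def random_projection(vectors, out_dim, seed=0):
    """Project len(vectors[0])-dim vectors down to out_dim using a random
    Gaussian matrix (Johnson-Lindenstrauss style). Illustrative only."""
    rng = random.Random(seed)
    in_dim = len(vectors[0])
    # Entries drawn with variance 1/out_dim so squared norms are roughly
    # preserved in expectation after projection.
    proj = [[rng.gauss(0.0, out_dim ** -0.5) for _ in range(out_dim)]
            for _ in range(in_dim)]
    return [[sum(v[i] * proj[i][j] for i in range(in_dim))
             for j in range(out_dim)] for v in vectors]
```

Because the projection matrix is data-independent, it can be fixed once and applied during one-time message passing, keeping the pre-computed tensors regular-shaped regardless of how many neighbor types were aggregated.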
arXiv Detail & Related papers (2023-10-23T01:25:44Z)
- ST-MLP: A Cascaded Spatio-Temporal Linear Framework with Channel-Independence Strategy for Traffic Forecasting [47.74479442786052]
Current research on Spatio-Temporal Graph Neural Networks (STGNNs) often prioritizes complex designs, leading to computational burdens with only minor enhancements in accuracy.
We propose ST-MLP, a concise cascaded spatio-temporal model solely based on Multi-Layer Perceptron (MLP) modules and linear layers.
Empirical results demonstrate that ST-MLP outperforms state-of-the-art STGNNs and other models in terms of accuracy and computational efficiency.
arXiv Detail & Related papers (2023-08-14T23:34:59Z)
- Algorithm and System Co-design for Efficient Subgraph-based Graph Representation Learning [16.170895692951]
Subgraph-based graph representation learning (SGRL) has been recently proposed to deal with some fundamental challenges encountered by canonical graph neural networks (GNNs).
We propose a novel framework SUREL for scalable SGRL by co-designing the learning algorithm and its system support.
arXiv Detail & Related papers (2022-02-28T04:29:22Z)
- High-Level Synthesis Performance Prediction using GNNs: Benchmarking, Modeling, and Advancing [21.8349113634555]
Agile hardware development requires fast and accurate circuit quality evaluation from early design stages.
We propose a rapid and accurate performance modeling, exploiting the representation power of graph neural networks (GNNs) by representing C/C++ programs as graphs.
Our proposed predictor outperforms HLS estimates by up to 40X and existing predictors by 2X to 5X in terms of resource usage and timing prediction.
arXiv Detail & Related papers (2022-01-18T09:53:48Z)
- Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration [71.80326738527734]
We propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations.
We show that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework.
arXiv Detail & Related papers (2021-11-22T23:53:14Z)
- CATRO: Channel Pruning via Class-Aware Trace Ratio Optimization [61.71504948770445]
We propose a novel channel pruning method via Class-Aware Trace Ratio Optimization (CATRO) to reduce the computational burden and accelerate the model inference.
We show that CATRO achieves higher accuracy with similar cost or lower cost with similar accuracy than other state-of-the-art channel pruning algorithms.
Because of its class-aware property, CATRO is suitable to prune efficient networks adaptively for various classification subtasks, enhancing handy deployment and usage of deep networks in real-world applications.
arXiv Detail & Related papers (2021-10-21T06:26:31Z)
- ZARTS: On Zero-order Optimization for Neural Architecture Search [94.41017048659664]
Differentiable architecture search (DARTS) has been a popular one-shot paradigm for NAS due to its high efficiency.
This work turns to zero-order optimization and proposes a novel NAS scheme, called ZARTS, to search without enforcing the above approximation.
In particular, results on 12 benchmarks verify the outstanding robustness of ZARTS, where the performance of DARTS collapses due to its known instability issue.
arXiv Detail & Related papers (2021-10-10T09:35:15Z)
- Millimeter Wave Communications with an Intelligent Reflector: Performance Optimization and Distributional Reinforcement Learning [119.97450366894718]
A novel framework is proposed to optimize the downlink multi-user communication of a millimeter wave base station.
A channel estimation approach is developed to measure the channel state information (CSI) in real-time.
A distributional reinforcement learning (DRL) approach is proposed to learn the optimal IR reflection and maximize the expectation of downlink capacity.
arXiv Detail & Related papers (2020-02-24T22:18:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.