A Graph Deep Learning Framework for High-Level Synthesis Design Space
Exploration
- URL: http://arxiv.org/abs/2111.14767v1
- Date: Mon, 29 Nov 2021 18:17:45 GMT
- Title: A Graph Deep Learning Framework for High-Level Synthesis Design Space
Exploration
- Authors: Lorenzo Ferretti, Andrea Cini, Georgios Zacharopoulos, Cesare Alippi,
Laura Pozzi
- Abstract summary: High-Level Synthesis is a solution for fast prototyping application-specific hardware.
We propose, for the first time in the literature, graph neural networks that jointly predict acceleration performance and hardware costs of HLS designs.
We show that our approach achieves prediction accuracy comparable with that of commonly used simulators.
- Score: 11.154086943903696
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The design of efficient hardware accelerators for high-throughput
data-processing applications, e.g., deep neural networks, is a challenging task
in computer architecture design. In this regard, High-Level Synthesis (HLS)
emerges as a solution for fast prototyping application-specific hardware
starting from a behavioural description of the application computational flow.
Synthesizing an efficient accelerator, however, requires identifying the right
combination of optimization directives; this Design-Space Exploration (DSE)
aims at identifying Pareto-optimal synthesis configurations whose exhaustive
search is often infeasible due to the
design-space dimensionality and the prohibitive computational cost of the
synthesis process. Within this framework, we effectively and efficiently
address the design problem by proposing, for the first time in the literature,
graph neural networks that jointly predict acceleration performance and
hardware costs of a synthesized behavioral specification given optimization
directives. The learned model can be used to rapidly approach the Pareto curve
by guiding the DSE, taking into account performance and cost estimates. The
proposed method outperforms traditional HLS-driven DSE approaches by
accounting for computer programs of arbitrary length and for the invariant
properties of the input. We propose a novel hybrid control and data flow graph
representation that enables training the graph neural network on specifications
of different hardware accelerators; the methodology naturally transfers to
unseen data-processing applications too. Moreover, we show that our approach
achieves prediction accuracy comparable with that of commonly used simulators
without having access to analytical models of the HLS compiler and the target
FPGA, while being orders of magnitude faster. Finally, the learned
representation can be exploited for DSE in unexplored configuration spaces by
fine-tuning on a small number of samples from the new target domain.
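The DSE loop the abstract describes, where a learned surrogate screens candidate configurations so that only predicted Pareto-optimal ones reach the costly synthesis step, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the predictor is a toy stand-in for the trained graph neural network, and all names, directives, and cost numbers are invented for the example.

```python
# Sketch of surrogate-guided DSE: a stand-in predictor estimates
# (latency, area) for each candidate configuration, and only the
# predicted Pareto-optimal points would be forwarded to real synthesis.

def predict_cost_perf(config):
    # Placeholder for the trained GNN surrogate: maps an
    # optimization-directive configuration (unroll factor, pipelining)
    # to an estimated (latency_cycles, area_luts) pair. Purely illustrative.
    unroll, pipeline = config
    latency = 1000 // unroll + (0 if pipeline else 200)
    area = 50 * unroll + (150 if pipeline else 0)
    return latency, area

def pareto_front(points):
    # Keep points not dominated by any other point
    # (both objectives are minimised).
    front = []
    for i, (l1, a1) in enumerate(points):
        dominated = any(
            (l2 <= l1 and a2 <= a1) and (l2 < l1 or a2 < a1)
            for j, (l2, a2) in enumerate(points) if j != i
        )
        if not dominated:
            front.append((l1, a1))
    return front

# Enumerate a small directive space: unroll factor x pipelining on/off.
configs = [(u, p) for u in (1, 2, 4, 8) for p in (False, True)]
preds = [predict_cost_perf(c) for c in configs]
front = pareto_front(preds)
print(sorted(set(front)))
# → [(125, 550), (250, 350), (450, 200), (700, 100), (1200, 50)]
```

In a real flow the predictor would be the trained GNN and the surviving configurations would be synthesized to verify the estimates; the point of the sketch is only the screening structure, in which the cheap model replaces exhaustive synthesis.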
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the demands of real-time visual inference in IoVT systems by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
(arXiv, 2024-10-29)
- Mechanistic Design and Scaling of Hybrid Architectures [114.3129802943915]
We identify and test new hybrid architectures constructed from a variety of computational primitives.
We experimentally validate the resulting architectures via an extensive compute-optimal and a new state-optimal scaling law analysis.
We find performance on MAD (mechanistic architecture design) synthetic tasks to correlate with compute-optimal perplexity, enabling accurate evaluation of new architectures.
arXiv Detail & Related papers (2024-03-26T16:33:12Z) - End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z) - h-analysis and data-parallel physics-informed neural networks [0.7614628596146599]
We explore the data-parallel acceleration of machine learning schemes with a focus on physics-informed neural networks (PINNs).
We detail a novel protocol based on $h$-analysis and data-parallel acceleration through the Horovod training framework.
We show that the acceleration is straightforward to implement, does not compromise training, and proves to be highly efficient and controllable.
arXiv Detail & Related papers (2023-02-17T12:15:18Z) - Towards Theoretically Inspired Neural Initialization Optimization [66.04735385415427]
We propose a differentiable quantity, named GradCosine, with theoretical insights to evaluate the initial state of a neural network.
We show that both the training and test performance of a network can be improved by maximizing GradCosine under norm constraint.
Generalized from the sample-wise analysis to the real batch setting, the resulting Neural Initialization Optimization (NIO) algorithm automatically finds a better initialization at negligible cost.
arXiv Detail & Related papers (2022-10-12T06:49:16Z) - FreeREA: Training-Free Evolution-based Architecture Search [17.202375422110553]
FreeREA is a custom cell-based evolution NAS algorithm that exploits an optimised combination of training-free metrics to rank architectures.
Our experiments, carried out on the common benchmarks NAS-Bench-101 and NATS-Bench, demonstrate that FreeREA is a fast, efficient, and effective search method for automatic model design.
arXiv Detail & Related papers (2022-06-17T11:16:28Z) - Towards Optimal VPU Compiler Cost Modeling by using Neural Networks to
Infer Hardware Performances [58.720142291102135]
'VPUNN' is a neural network-based cost model trained on low-level task profiling.
It consistently outperforms the state-of-the-art cost modeling in Intel's line of VPU processors.
arXiv Detail & Related papers (2022-05-09T22:48:39Z) - Hybrid Graph Models for Logic Optimization via Spatio-Temporal
Information [15.850413267830522]
Two major concerns that may impede production-ready ML applications in EDA are accuracy requirements and generalization capability.
We propose hybrid graph neural network (GNN) based approaches towards highly accurate quality-of-result (QoR) estimations.
Evaluation on 3.3 million data points shows that the testing mean absolute percentage error (MAPE) on designs seen and unseen during training is no more than 1.2% and 3.1%, respectively.
arXiv Detail & Related papers (2022-01-20T21:12:22Z) - Adaptive Anomaly Detection for Internet of Things in Hierarchical Edge
Computing: A Contextual-Bandit Approach [81.5261621619557]
We propose an adaptive anomaly detection scheme with hierarchical edge computing (HEC).
We first construct multiple anomaly detection DNN models with increasing complexity, and associate each of them to a corresponding HEC layer.
Then, we design an adaptive model selection scheme that is formulated as a contextual-bandit problem and solved by using a reinforcement learning policy network.
arXiv Detail & Related papers (2021-08-09T08:45:47Z) - On the Difficulty of Designing Processor Arrays for Deep Neural Networks [0.0]
Camuy is a lightweight model of a weight-stationary systolic array for linear algebra operations.
We present an analysis of popular models to illustrate how it can estimate required cycles, data movement costs, as well as systolic array utilization.
arXiv Detail & Related papers (2020-06-24T19:24:08Z) - GraphACT: Accelerating GCN Training on CPU-FPGA Heterogeneous Platforms [1.2183405753834562]
Graph Convolutional Networks (GCNs) have emerged as the state-of-the-art deep learning model for representation learning on graphs.
It is challenging to accelerate training of GCNs due to substantial and irregular data communication.
We design a novel accelerator for training GCNs on CPU-FPGA heterogeneous systems.
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.