Chimera: A Hybrid Machine Learning Driven Multi-Objective Design Space
Exploration Tool for FPGA High-Level Synthesis
- URL: http://arxiv.org/abs/2207.07917v1
- Date: Sun, 3 Jul 2022 21:13:55 GMT
- Title: Chimera: A Hybrid Machine Learning Driven Multi-Objective Design Space
Exploration Tool for FPGA High-Level Synthesis
- Authors: Mang Yu, Sitao Huang and Deming Chen
- Abstract summary: High-level synthesis (HLS) tools were created to simplify hardware designs for FPGAs.
Applying HLS optimization directives to achieve high performance is time-consuming and usually requires expert knowledge.
We present an automated design space exploration tool for applying HLS optimization directives, called Chimera.
- Score: 11.128278223431805
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In recent years, hardware accelerators based on field-programmable gate
arrays (FPGAs) have been widely adopted, thanks to FPGAs' extraordinary
flexibility. However, with the high flexibility comes the difficulty in design
and optimization. Conventionally, these accelerators are designed with
low-level hardware description languages, which means creating large designs
with complex behavior is extremely difficult. Therefore, high-level synthesis
(HLS) tools were created to simplify hardware designs for FPGAs. They enable
the user to create hardware designs using high-level languages and provide
various optimization directives to help improve the performance of the
synthesized hardware. However, applying these optimizations to achieve high
performance is time-consuming and usually requires expert knowledge. To address
this difficulty, we present an automated design space exploration tool for
applying HLS optimization directives, called Chimera, which significantly
reduces the human effort and expertise needed for creating high-performance HLS
designs. It utilizes a novel multi-objective exploration method that seamlessly
integrates active learning, an evolutionary algorithm, and Thompson sampling,
making it capable of finding a set of optimized designs on a Pareto curve with
only a small number of design points evaluated during the exploration. In the
experiments, in less than 24 hours, this hybrid method explored design points
that have the same or superior performance compared to highly optimized
hand-tuned designs created by expert HLS users from the Rosetta benchmark
suite. In addition to discovering the extreme points, it also explores a Pareto
frontier, where the elbow point can potentially save up to 26% of Flip-Flop
resources at the cost of a negligible increase in latency.
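The abstract's idea of a Pareto frontier and its elbow point can be made concrete with a small sketch. It assumes each evaluated design is reduced to two objectives to minimize, latency and Flip-Flop count. The helper names (`pareto_front`, `elbow_point`) and the sample numbers are hypothetical, and the elbow is picked with a common max-distance-to-chord heuristic, which is not necessarily how Chimera defines it.

```python
from typing import List, Tuple

# Each evaluated design is reduced to (latency, flip_flops); both are minimized.
Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated designs, sorted by ascending latency."""
    front: List[Point] = []
    best_ff = float("inf")
    # Sort by latency (ties broken by flip-flop count), then sweep once:
    # a design survives only if it uses strictly fewer flip-flops than
    # every design with lower or equal latency seen so far.
    for latency, ff in sorted(set(points)):
        if ff < best_ff:
            front.append((latency, ff))
            best_ff = ff
    return front


def elbow_point(front: List[Point]) -> Point:
    """Pick the front point farthest below the chord joining its two extremes.

    Heuristic assumption: max distance to the chord after min-max
    normalization; not necessarily the criterion Chimera uses.
    """
    if len(front) < 3:
        return front[0]
    (x0, y0), (x1, y1) = front[0], front[-1]

    def score(p: Point) -> float:
        # Normalize both objectives to [0, 1] so cycles and FFs are comparable;
        # the extremes map to (0, 1) and (1, 0), so the chord is nx + ny = 1.
        nx = (p[0] - x0) / (x1 - x0)
        ny = (p[1] - y1) / (y0 - y1)
        return 1.0 - nx - ny  # proportional to the distance below the chord

    return max(front, key=score)


if __name__ == "__main__":
    # Hypothetical (latency in cycles, flip-flops) results, not from the paper.
    designs = [(100, 9000), (105, 9500), (110, 6000),
               (120, 5800), (130, 7000), (150, 5700)]
    front = pareto_front(designs)
    print("Pareto front:", front)
    print("Elbow point:", elbow_point(front))
```

With these made-up numbers the front is `[(100, 9000), (110, 6000), (120, 5800), (150, 5700)]` and the elbow lands at `(110, 6000)`, trading 10% more latency for about a third fewer flip-flops. The single sorted sweep keeps front extraction at O(n log n) for two objectives.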
Related papers
- AI-Driven Optimization of Hardware Overlay Configurations [0.0]
This paper presents an AI-driven approach to optimizing FPGA overlay configurations.
By leveraging machine learning techniques, we predict the feasibility and efficiency of different configurations before hardware compilation.
(2025-03-08T22:34:47Z)
- ULTHO: Ultra-Lightweight yet Efficient Hyperparameter Optimization in Deep Reinforcement Learning [50.53705050673944]
We propose ULTHO, an ultra-lightweight yet powerful framework for fast HPO in deep RL within single runs.
Specifically, we formulate the HPO process as a multi-armed bandit with clustered arms (MABC) and link it directly to long-term return optimization.
We test ULTHO on benchmarks including ALE, Procgen, MiniGrid, and PyBullet.
(2025-03-08T07:03:43Z)
- Quasar-ViT: Hardware-Oriented Quantization-Aware Architecture Search for Vision Transformers [56.37495946212932]
Vision transformers (ViTs) have demonstrated superior accuracy over convolutional neural networks (CNNs) on computer vision tasks.
This work proposes Quasar-ViT, a hardware-oriented quantization-aware architecture search framework for ViTs.
(2024-07-25T16:35:46Z)
- Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers [58.5711048151424]
We introduce SPARSEK Attention, a novel sparse attention mechanism designed to overcome computational and memory obstacles.
Our approach integrates a scoring network and a differentiable top-k mask operator, SPARSEK, to select a constant number of KV pairs for each query.
Experimental results reveal that SPARSEK Attention outperforms previous sparse attention methods.
(2024-06-24T15:55:59Z)
- Allo: A Programming Model for Composable Accelerator Design [7.884541004161727]
We introduce Allo, a composable programming model for efficient spatial accelerator design.
Allo decouples hardware customizations, including compute, memory, communication, and data type, from the algorithm specification.
Our evaluation shows that Allo can outperform state-of-the-art HLS tools and ADLs on all test cases in the PolyBench.
(2024-04-07T05:47:54Z)
- Mechanistic Design and Scaling of Hybrid Architectures [114.3129802943915]
We identify and test new hybrid architectures constructed from a variety of computational primitives.
We experimentally validate the resulting architectures via an extensive compute-optimal and a new state-optimal scaling law analysis.
We find that performance on MAD (mechanistic architecture design) synthetic tasks correlates with compute-optimal perplexity, enabling accurate evaluation of new architectures.
(2024-03-26T16:33:12Z)
- AutoHLS: Learning to Accelerate Design Space Exploration for HLS Designs [10.690389829735661]
This paper proposes a novel framework called AutoHLS, which integrates a deep neural network (DNN) with Bayesian optimization (BO) to accelerate HLS hardware design optimization.
Our experimental results demonstrate up to a 70-fold speedup in exploration time.
(2024-03-15T21:14:44Z)
- Learning Performance-Improving Code Edits [107.21538852090208]
We introduce a framework for adapting large language models (LLMs) to high-level program optimization.
First, we curate a dataset of over 77,000 pairs of competitive C++ programming submissions that capture performance-improving edits made by human programmers.
For prompting, we propose retrieval-based few-shot prompting and chain-of-thought; for finetuning, we use performance-conditioned generation and synthetic data augmentation based on self-play.
(2023-02-15T18:59:21Z)
- Open-source FPGA-ML codesign for the MLPerf Tiny Benchmark [11.575901540758574]
We present our development experience for the Tiny Inference Benchmark on field-programmable gate array (FPGA) platforms.
We use the open-source hls4ml and FINN workflows, which aim to democratize AI-hardware codesign of optimized neural networks on FPGAs.
The solutions are deployed on system-on-chip (Pynq-Z2) and pure FPGA (Arty A7-100T) platforms.
(2022-06-23T15:57:17Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA and uses about 40% of the available hardware resources in total.
Compared with its full-precision software counterpart, it reduces classification time by three orders of magnitude, at the cost of a small 4.5% impact on accuracy.
(2022-01-18T13:59:22Z)
- VAQF: Fully Automatic Software-hardware Co-design Framework for Low-bit Vision Transformer [121.85581713299918]
We propose VAQF, a framework that builds inference accelerators on FPGA platforms for quantized Vision Transformers (ViTs)
Given the model structure and the desired frame rate, VAQF will automatically output the required quantization precision for activations.
This is the first time quantization has been incorporated into ViT acceleration on FPGAs.
(2022-01-17T20:27:52Z)
- A Graph Deep Learning Framework for High-Level Synthesis Design Space Exploration [11.154086943903696]
High-Level Synthesis is a solution for fast prototyping of application-specific hardware.
We propose, for the first time in the literature, graph neural networks that jointly predict acceleration performance and hardware costs.
We show that our approach achieves prediction accuracy comparable with that of commonly used simulators.
(2021-11-29T18:17:45Z)
- HALF: Holistic Auto Machine Learning for FPGAs [1.9146960682777232]
Deep Neural Networks (DNNs) are capable of solving complex problems in domains related to embedded systems, such as image and natural language processing.
To efficiently implement DNNs on a specific FPGA platform for a given cost criterion, e.g. energy efficiency, an enormous number of design parameters must be considered.
An automatic, holistic design approach can improve the quality of DNN implementations on FPGA significantly.
(2021-06-28T14:45:47Z)