Allo: A Programming Model for Composable Accelerator Design
- URL: http://arxiv.org/abs/2404.04815v1
- Date: Sun, 7 Apr 2024 05:47:54 GMT
- Title: Allo: A Programming Model for Composable Accelerator Design
- Authors: Hongzheng Chen, Niansong Zhang, Shaojie Xiang, Zhichen Zeng, Mengjia Dai, Zhiru Zhang,
- Abstract summary: We introduce Allo, a composable programming model for efficient spatial accelerator design.
Allo decouples hardware customizations, including compute, memory, communication, and data type from algorithm specification.
Our evaluation shows that Allo can outperform state-of-the-art HLS tools and ADLs on all test cases in the PolyBench.
- Score: 7.884541004161727
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Special-purpose hardware accelerators are increasingly pivotal for sustaining performance improvements in emerging applications, especially as the benefits of technology scaling continue to diminish. However, designers currently lack effective tools and methodologies to construct complex, high-performance accelerator architectures in a productive manner. Existing high-level synthesis (HLS) tools often require intrusive source-level changes to attain satisfactory quality of results. Despite the introduction of several new accelerator design languages (ADLs) aiming to enhance or replace HLS, their advantages are more evident in relatively simple applications with a single kernel. Existing ADLs prove less effective for realistic hierarchical designs with multiple kernels, even if the design hierarchy is flattened. In this paper, we introduce Allo, a composable programming model for efficient spatial accelerator design. Allo decouples hardware customizations, including compute, memory, communication, and data type from algorithm specification, and encapsulates them as a set of customization primitives. Allo preserves the hierarchical structure of an input program by combining customizations from different functions in a bottom-up, type-safe manner. This approach facilitates holistic optimizations that span across function boundaries. We conduct comprehensive experiments on commonly-used HLS benchmarks and several realistic deep learning models. Our evaluation shows that Allo can outperform state-of-the-art HLS tools and ADLs on all test cases in the PolyBench. For the GPT2 model, the inference latency of the Allo generated accelerator is 1.7x faster than the NVIDIA A100 GPU with 5.4x higher energy efficiency, demonstrating the capability of Allo to handle large-scale designs.
Related papers
- Inference Optimization of Foundation Models on AI Accelerators [68.24450520773688]
Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI.
As the number of model parameters reaches to hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios.
This tutorial offers a comprehensive discussion on complementary inference optimization techniques using AI accelerators.
arXiv Detail & Related papers (2024-07-12T09:24:34Z) - Mechanistic Design and Scaling of Hybrid Architectures [114.3129802943915]
We identify and test new hybrid architectures constructed from a variety of computational primitives.
We experimentally validate the resulting architectures via an extensive compute-optimal and a new state-optimal scaling law analysis.
We find MAD synthetics to correlate with compute-optimal perplexity, enabling accurate evaluation of new architectures.
arXiv Detail & Related papers (2024-03-26T16:33:12Z) - AutoHLS: Learning to Accelerate Design Space Exploration for HLS Designs [10.690389829735661]
This paper proposes a novel framework called AutoHLS, which integrates a deep neural network (DNN) with Bayesian optimization (BO) to accelerate HLS hardware design optimization.
Our experimental results demonstrate up to a 70-fold speedup in exploration time.
arXiv Detail & Related papers (2024-03-15T21:14:44Z) - Using the Abstract Computer Architecture Description Language to Model
AI Hardware Accelerators [77.89070422157178]
Manufacturers of AI-integrated products face a critical challenge: selecting an accelerator that aligns with their product's performance requirements.
The Abstract Computer Architecture Description Language (ACADL) is a concise formalization of computer architecture block diagrams.
In this paper, we demonstrate how to use the ACADL to model AI hardware accelerators, use their ACADL description to map DNNs onto them, and explain the timing simulation semantics to gather performance results.
arXiv Detail & Related papers (2024-01-30T19:27:16Z) - Energy-efficient Task Adaptation for NLP Edge Inference Leveraging
Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z) - Chimera: A Hybrid Machine Learning Driven Multi-Objective Design Space
Exploration Tool for FPGA High-Level Synthesis [11.128278223431805]
High-level synthesis (HLS) tools were created to simplify hardware designs for FPGAs.
Applying these optimizations to achieve high performance is time-consuming and usually requires expert knowledge.
We present an automated design space exploration tool for applying HLS optimization directives, called Chimera.
arXiv Detail & Related papers (2022-07-03T21:13:55Z) - FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around the 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on the accuracy, if compared to its software, full precision counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z) - AutoBERT-Zero: Evolving BERT Backbone from Scratch [94.89102524181986]
We propose an Operation-Priority Neural Architecture Search (OP-NAS) algorithm to automatically search for promising hybrid backbone architectures.
We optimize both the search algorithm and evaluation of candidate models to boost the efficiency of our proposed OP-NAS.
Experiments show that the searched architecture (named AutoBERT-Zero) significantly outperforms BERT and its variants of different model capacities in various downstream tasks.
arXiv Detail & Related papers (2021-07-15T16:46:01Z) - Scalable Deep-Learning-Accelerated Topology Optimization for Additively
Manufactured Materials [4.221095652322005]
Topology optimization (TO) is a popular and powerful computational approach for designing novel structures, materials, and devices.
To address these issues, we propose a general scalable deep-learning (DL) based TO framework, referred to as SDL-TO.
Our framework accelerates TO by learning the iterative history data and simultaneously training on the mapping between the given design and its gradient.
arXiv Detail & Related papers (2020-11-28T17:38:31Z) - Agile Autotuning of a Transprecision Tensor Accelerator Overlay for TVM
Compiler Stack [1.8337659614890698]
Specialized accelerators for tensor-operations, such as blocked-matrix operations and multi-dimensional convolutions, have been emerged as powerful architecture choices for Deep-Learning computing.
The rapid development of frameworks, models, and precision options challenges the adaptability of such tensor-accelerators.
programmable tensor accelerators offer a promising alternative by allowing reconfiguration of a virtual architecture that overlays on top of the physical FPGA fabric.
arXiv Detail & Related papers (2020-04-20T10:12:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.