Self-Adaptive Reconfigurable Arrays (SARA): Using ML to Assist Scaling GEMM Acceleration
- URL: http://arxiv.org/abs/2101.04799v1
- Date: Tue, 12 Jan 2021 23:20:23 GMT
- Title: Self-Adaptive Reconfigurable Arrays (SARA): Using ML to Assist Scaling GEMM Acceleration
- Authors: Ananda Samajdar, Michael Pellauer, Tushar Krishna
- Abstract summary: This work introduces a new class of accelerators that we call Self Adaptive Reconfigurable Array (SARA).
SAGAR, an instance of SARA, is capable of providing the same mapping flexibility as a collection of 1024 4x4 arrays working as a distributed system while achieving 3.5x more power efficiency and 3.2x higher compute density.
We develop a novel recommendation neural network called ADAPTNET which recommends an array configuration and dataflow for the current layer parameters.
- Score: 3.2218154783263833
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With increasing diversity in Deep Neural Network (DNN) models in terms of
layer shapes and sizes, the research community has been investigating
flexible/reconfigurable accelerator substrates. This line of research has
opened up two challenges. The first is to determine the appropriate amount of flexibility within an accelerator array that can trade off the performance benefits of reconfigurability against its area overheads. The second is
being able to determine the right configuration of the array for the current
DNN model and/or layer and reconfigure the accelerator at runtime. This work
introduces a new class of accelerators that we call Self Adaptive
Reconfigurable Array (SARA). SARA architectures comprise both a
reconfigurable array and a hardware unit capable of determining an optimized
configuration for the array at runtime. We demonstrate an instance of SARA with
an accelerator we call SAGAR, which introduces a novel reconfigurable systolic
array that can be configured to work as a distributed collection of smaller
arrays of various sizes or as a single array with flexible aspect ratios. We
also develop a novel recommendation neural network called ADAPTNET which
recommends an array configuration and dataflow for the current layer
parameters. ADAPTNET runs on ADAPTNETX, an integrated custom hardware unit that executes the network at runtime and reconfigures the array, making the entire accelerator
self-sufficient. SAGAR is capable of providing the same mapping flexibility as
a collection of 1024 4x4 arrays working as a distributed system while achieving 3.5x more power efficiency and 3.2x higher compute density. Furthermore, the
runtime achieved on the recommended parameters from ADAPTNET is 99.93% of the
best achievable runtime.
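To make the runtime decision concrete, below is a minimal sketch (plain Python, with an illustrative candidate list and utilization model, not the paper's design) of the choice SARA automates: scoring a few equal-budget array configurations against a layer's GEMM shape. In SAGAR, ADAPTNET replaces this exhaustive scoring with a learned recommender evaluated by ADAPTNETX at runtime.

```python
# Illustrative sketch, not the paper's implementation: pick the array
# configuration with the best PE utilization for a GEMM producing an
# M x N output. Candidate configs and the cost model are assumptions.
import math

# (num_pods, pod_rows, pod_cols); every config uses the same 4096-PE budget.
CANDIDATE_CONFIGS = [(1, 64, 64), (4, 32, 32), (16, 16, 16),
                     (64, 8, 8), (256, 4, 4)]

def utilization(M, N, num_pods, rows, cols):
    """Fraction of provisioned PE-slots doing useful work under output tiling."""
    tiles = math.ceil(M / rows) * math.ceil(N / cols)  # output tiles
    rounds = math.ceil(tiles / num_pods)               # pods run tiles in waves
    return (M * N) / (rounds * num_pods * rows * cols)

def recommend(M, N):
    """Stand-in for ADAPTNET: exhaustive scoring instead of a learned model."""
    return max(CANDIDATE_CONFIGS, key=lambda cfg: utilization(M, N, *cfg))

for M, N in [(1024, 1024), (7, 512), (64, 3)]:
    cfg = recommend(M, N)
    print(f"GEMM output {M}x{N}: config {cfg}, "
          f"utilization {utilization(M, N, *cfg):.1%}")
```

The point the numbers make: a monolithic array is fully utilized on large square GEMMs but wastes most of its PEs on skewed layer shapes, which is exactly where splitting into smaller pods, or choosing a different aspect ratio, wins.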
Related papers
- A-SDM: Accelerating Stable Diffusion through Model Assembly and Feature Inheritance Strategies [51.7643024367548]
The Stable Diffusion Model (SDM) is a prevalent and effective model for text-to-image (T2I) and image-to-image (I2I) generation.
This study focuses on reducing redundant computation in SDM and optimizing the model through both tuning and tuning-free methods.
arXiv Detail & Related papers (2024-05-31T21:47:05Z)
- LoRA-Switch: Boosting the Efficiency of Dynamic LLM Adapters via System-Algorithm Co-design [23.874726096958135]
We analyze the fine-grained costs of the dynamic adapters and find that fragmented kernel calls are the root cause of their latency overhead.
Unlike most existing dynamic structures that adopt layer-wise or block-wise dynamic routing, LoRA-Switch introduces a token-wise routing mechanism.
For efficiency, this switching is implemented with an optimized kernel, which fuses the operations for all LoRA adapters at once.
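A minimal sketch of token-wise routing over LoRA adapters, assuming a simple argmax router; the batched einsum below only loosely imitates what the paper's fused kernel achieves, and all shapes and names are illustrative.

```python
# Illustrative token-wise LoRA routing sketch (NumPy, not the paper's kernel).
# Each token picks one adapter via a router; the adapters' low-rank updates
# are applied in a single batched pass rather than per-adapter kernel calls.
import numpy as np

rng = np.random.default_rng(0)
T, D, R, A = 8, 16, 4, 3          # tokens, hidden dim, LoRA rank, num adapters

x = rng.standard_normal((T, D))
W = rng.standard_normal((D, D))   # frozen base weight
lora_A = rng.standard_normal((A, D, R)) * 0.1
lora_B = rng.standard_normal((A, R, D)) * 0.1
router_W = rng.standard_normal((D, A))

route = (x @ router_W).argmax(axis=1)              # token-wise routing decision

# Gather each token's adapter and apply base + low-rank update in one pass.
down = np.einsum("td,tdr->tr", x, lora_A[route])   # per-token A projection
up = np.einsum("tr,trd->td", down, lora_B[route])  # per-token B projection
y = x @ W + up

print("routes:", route)
print("output shape:", y.shape)
```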
arXiv Detail & Related papers (2024-05-28T01:53:26Z)
- RAMAN: A Re-configurable and Sparse tinyML Accelerator for Inference on Edge [1.8293684411977293]
Deep Neural Network (DNN) based inference at the edge is challenging as these compute and data-intensive algorithms need to be implemented at low cost and low power.
We present RAMAN, a Re-configurable and spArse tinyML Accelerator for infereNce on edge, architected to exploit sparsity to reduce area (storage), power, and latency.
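As a rough illustration of the mechanism (not RAMAN's actual datapath): storing only nonzero weights, CSR-style, shrinks storage, and skipping zero operands removes the corresponding multiply-accumulates.

```python
# Illustrative sparsity sketch: compress weights to CSR and skip zero multiplies.
import numpy as np

def to_csr(W):
    """Compress a dense matrix into (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in W:
        nz = np.nonzero(row)[0]
        values.extend(row[nz])
        col_idx.extend(nz)
        row_ptr.append(len(values))
    return np.array(values), np.array(col_idx), np.array(row_ptr)

def csr_matvec(values, col_idx, row_ptr, x):
    """y = W @ x touching only the stored nonzeros."""
    y = np.zeros(len(row_ptr) - 1)
    for r in range(len(y)):
        s, e = row_ptr[r], row_ptr[r + 1]
        y[r] = values[s:e] @ x[col_idx[s:e]]
    return y

rng = np.random.default_rng(1)
W = rng.standard_normal((8, 8)) * (rng.random((8, 8)) < 0.2)  # ~80% zeros
x = rng.standard_normal(8)
vals, cols, ptr = to_csr(W)
assert np.allclose(csr_matvec(vals, cols, ptr, x), W @ x)
print(f"nonzeros stored: {len(vals)} / {W.size}")
```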
arXiv Detail & Related papers (2023-06-10T17:25:58Z)
- Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators [59.11160990637615]
We propose a distributed system based on low-power embedded FPGAs designed for edge computing applications.
The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
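A toy sketch of the underlying allocation problem, assuming a linear layer chain and made-up per-layer costs: choose stage boundaries so the heaviest stage, which bounds pipeline throughput, is as light as possible.

```python
# Illustrative pipeline-partitioning sketch (not the paper's allocator):
# split a chain of layers into contiguous stages, minimizing the cost of
# the heaviest stage (the pipeline bottleneck).
from itertools import combinations

layer_cost = [4, 1, 9, 3, 3, 2, 6]   # per-layer compute cost (made-up units)
num_fpgas = 3

def bottleneck(cuts):
    """Max stage cost for a given set of cut points."""
    bounds = [0, *cuts, len(layer_cost)]
    return max(sum(layer_cost[a:b]) for a, b in zip(bounds, bounds[1:]))

# Exhaustive search over cut points (fine for short chains).
best = min(combinations(range(1, len(layer_cost)), num_fpgas - 1),
           key=bottleneck)
print("cut after layers:", [c - 1 for c in best],
      "-> bottleneck cost:", bottleneck(best))
```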
arXiv Detail & Related papers (2023-05-24T16:08:55Z)
- HKNAS: Classification of Hyperspectral Imagery Based on Hyper Kernel Neural Architecture Search [104.45426861115972]
We propose to directly generate structural parameters by utilizing the specifically designed hyper kernels.
We obtain three kinds of networks to separately conduct pixel-level or image-level classifications with 1-D or 3-D convolutions.
A series of experiments on six public datasets demonstrate that the proposed methods achieve state-of-the-art results.
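A minimal hypernetwork-style sketch of the "hyper kernel" idea as this summary describes it (the actual HKNAS design differs): a small kernel maps per-layer embeddings directly to structural parameters, here softmax weights over candidate operations.

```python
# Illustrative hypernetwork-style sketch, not HKNAS itself: a "hyper kernel"
# generates architecture parameters instead of optimizing them directly.
import numpy as np

rng = np.random.default_rng(2)
num_layers, embed_dim, num_ops = 5, 8, 4

z = rng.standard_normal((num_layers, embed_dim))      # per-layer embeddings
hyper_kernel = rng.standard_normal((embed_dim, num_ops))

logits = z @ hyper_kernel                             # generated parameters
alphas = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print("per-layer op weights:\n", alphas.round(3))
```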
arXiv Detail & Related papers (2023-04-23T17:27:40Z)
- Bifrost: End-to-End Evaluation and Optimization of Reconfigurable DNN Accelerators [0.0]
Bifrost is an end-to-end framework for the evaluation and optimization of reconfigurable inference accelerators.
We discuss Bifrost's advantages over STONNE and other tools, and evaluate the MAERI and SIGMA architectures using Bifrost.
arXiv Detail & Related papers (2022-04-26T16:22:24Z)
- Scale-out Systolic Arrays [37.398797072460034]
We study three key pillars in multi-pod systolic array designs, namely array granularity, interconnect, and tiling.
We identify optimal array granularity across workloads and show that state-of-the-art commercial accelerators use suboptimal array sizes for single-tenancy workloads.
We propose Scale-out Systolic Arrays, a multi-pod inference accelerator for both single- and multi-tenancy.
arXiv Detail & Related papers (2022-03-22T08:46:11Z)
- Trilevel Neural Architecture Search for Efficient Single Image Super-Resolution [127.92235484598811]
This paper proposes a trilevel neural architecture search (NAS) method for efficient single image super-resolution (SR).
To model the discrete search space, we apply a new continuous relaxation that builds a hierarchical mixture over network paths, cell operations, and kernel widths.
An efficient search algorithm is proposed to perform optimization in a hierarchical supernet manner.
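A minimal sketch of the relaxation step, assuming a DARTS-style softmax mixture over candidate kernel widths; the paper's trilevel scheme additionally spans network paths and cell operations.

```python
# Illustrative continuous relaxation (an assumption; the paper's trilevel
# scheme is richer): discrete kernel-width choices become a softmax-weighted
# mixture, so the architecture choice is differentiable.
import numpy as np

rng = np.random.default_rng(3)
widths = [3, 5, 7]                        # candidate kernel widths
alpha = rng.standard_normal(len(widths))  # architecture logits (learnable)
weights = np.exp(alpha) / np.exp(alpha).sum()

x = rng.standard_normal(32)

def conv1d(x, k):
    kernel = np.ones(k) / k               # stand-in fixed kernel of width k
    return np.convolve(x, kernel, mode="same")

# Mixture of all candidate ops, weighted by the relaxed architecture params.
y = sum(w * conv1d(x, k) for w, k in zip(weights, widths))
print("relaxed op weights:", weights.round(3), "output shape:", y.shape)
```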
arXiv Detail & Related papers (2021-01-17T12:19:49Z)
- VEGA: Towards an End-to-End Configurable AutoML Pipeline [101.07003005736719]
VEGA is an efficient and comprehensive AutoML framework that is compatible and optimized for multiple hardware platforms.
VEGA can improve existing AutoML algorithms and discover new high-performance models that are competitive with SOTA methods.
arXiv Detail & Related papers (2020-11-03T06:53:53Z)
- FlexSA: Flexible Systolic Array Architecture for Efficient Pruned DNN Model Training [1.718730454558804]
We find that pruning a model using a common training accelerator with large systolic arrays is extremely performance-inefficient.
To make a systolic array efficient for pruning and training, we propose FlexSA, a flexible systolic array architecture.
We also present a compilation heuristic for tiling matrix-multiplication-and-accumulation operations in a training workload to best utilize the resources of FlexSA.
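A toy sketch of the tiling idea, with illustrative tile sizes and loop order: the GEMM is decomposed into sub-array-sized tile products, the unit of work a compiler would dispatch to a flexible array's sub-arrays.

```python
# Illustrative GEMM tiling sketch (not FlexSA's compiler): decompose an
# (M x K) @ (K x N) product into tiles sized for a smaller sub-array.
import numpy as np

def tiled_matmul(A, B, tm, tn, tk):
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N))
    for i in range(0, M, tm):             # one tile per sub-array dispatch
        for j in range(0, N, tn):
            for k in range(0, K, tk):
                C[i:i+tm, j:j+tn] += A[i:i+tm, k:k+tk] @ B[k:k+tk, j:j+tn]
    return C

rng = np.random.default_rng(4)
A, B = rng.standard_normal((48, 40)), rng.standard_normal((40, 56))
assert np.allclose(tiled_matmul(A, B, tm=16, tn=16, tk=8), A @ B)
print("tiled result matches dense GEMM")
```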
arXiv Detail & Related papers (2020-04-27T15:51:20Z)
- DDPNAS: Efficient Neural Architecture Search via Dynamic Distribution Pruning [135.27931587381596]
We propose an efficient and unified NAS framework termed DDPNAS via dynamic distribution pruning.
In particular, we first sample architectures from a joint categorical distribution. Then the search space is dynamically pruned and its distribution is updated every few epochs.
With the proposed efficient network generation method, we directly obtain the optimal neural architectures on given constraints.
arXiv Detail & Related papers (2019-05-28T06:35:52Z)
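A hedged reconstruction of the loop this summary describes (not the authors' code): sample architectures from per-decision categorical distributions, reinforce sampled choices by reward, and prune the weakest surviving choice every few epochs.

```python
# Illustrative dynamic distribution pruning sketch, reconstructed from the
# summary above; the update rule and toy reward are assumptions.
import numpy as np

rng = np.random.default_rng(5)
num_decisions, num_choices = 4, 5
probs = np.full((num_decisions, num_choices), 1 / num_choices)
alive = np.ones((num_decisions, num_choices), dtype=bool)
secret_best = rng.integers(num_choices, size=num_decisions)  # toy objective

def reward(arch):
    return (arch == secret_best).mean() + 0.1 * rng.standard_normal()

for epoch in range(60):
    arch = np.array([rng.choice(num_choices, p=p) for p in probs])
    r = reward(arch)
    for d in range(num_decisions):        # reinforce sampled choices by reward
        probs[d, arch[d]] += 0.1 * max(r, 0.0)
        probs[d] *= alive[d]
        probs[d] /= probs[d].sum()
    if epoch % 10 == 9:                   # prune the weakest surviving choice
        for d in range(num_decisions):
            if alive[d].sum() > 1:
                live = np.flatnonzero(alive[d])
                alive[d, live[np.argmin(probs[d, live])]] = False
                probs[d] *= alive[d]
                probs[d] /= probs[d].sum()

print("recovered:", probs.argmax(axis=1), "target:", secret_best)
```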