Learned Hardware/Software Co-Design of Neural Accelerators
- URL: http://arxiv.org/abs/2010.02075v1
- Date: Mon, 5 Oct 2020 15:12:52 GMT
- Title: Learned Hardware/Software Co-Design of Neural Accelerators
- Authors: Zhan Shi, Chirag Sakhuja, Milad Hashemi, Kevin Swersky, Calvin Lin
- Abstract summary: Deep learning software stacks and hardware accelerators are diverse and vast.
Prior work considers software optimizations separately from hardware architectures, effectively reducing the search space.
This paper casts the problem as hardware/software co-design, with the goal of automatically identifying desirable points in the joint design space.
- Score: 20.929918108940093
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The use of deep learning has grown at an exponential rate, giving rise to
numerous specialized hardware and software systems for deep learning. Because
the design space of deep learning software stacks and hardware accelerators is
diverse and vast, prior work considers software optimizations separately from
hardware architectures, effectively reducing the search space. Unfortunately,
this bifurcated approach means that many profitable design points are never
explored. This paper instead casts the problem as hardware/software co-design,
with the goal of automatically identifying desirable points in the joint design
space. The key to our solution is a new constrained Bayesian optimization
framework that avoids invalid solutions by exploiting the highly constrained
features of this design space, which are semi-continuous/semi-discrete. We
evaluate our optimization framework by applying it to a variety of neural
models, improving the energy-delay product by 18% (ResNet) and 40% (DQN) over
hand-tuned state-of-the-art systems, as well as demonstrating strong results on
other neural network architectures, such as MLPs and Transformers.
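To make the idea concrete, here is a minimal sketch of a constraint-aware search over a joint hardware/software space with an energy-delay-product objective. It is not the paper's implementation: the actual framework replaces the random sampler below with a constrained Bayesian surrogate, and the knobs, validity rule, and cost model here are all invented for illustration.

```python
import random

# Hypothetical joint design space. Hardware knobs (PE grid, buffer size)
# and software knobs (tiling, unrolling) are searched together rather
# than in two decoupled passes.
HW_SPACE = {
    "pe_array": [(8, 8), (16, 16), (32, 32)],  # processing-element grid
    "sram_kb":  [64, 128, 256, 512],           # on-chip buffer size
}
SW_SPACE = {
    "tile_size": [16, 32, 64, 128],            # loop-tiling factor
    "unroll":    [1, 2, 4, 8],                 # loop-unrolling factor
}

def is_valid(cfg):
    """Validity constraint coupling software to hardware: a tile must
    fit in the on-chip buffer. Spaces like this are dominated by
    invalid points, which is why the paper's optimizer is built to
    avoid them rather than waste evaluations on them."""
    tile_bytes = cfg["tile_size"] ** 2 * 4     # fp32 tile footprint
    return tile_bytes <= cfg["sram_kb"] * 1024

def energy_delay_product(cfg):
    """Toy analytical cost model standing in for a real simulator."""
    pes = cfg["pe_array"][0] * cfg["pe_array"][1]
    delay = 1e6 / (pes * cfg["unroll"])        # more parallelism, fewer cycles
    energy = pes * 0.5 + cfg["sram_kb"] * 0.1  # more hardware, more energy
    return energy * delay

def sample():
    space = {**HW_SPACE, **SW_SPACE}
    return {k: random.choice(v) for k, v in space.items()}

best = None
for _ in range(1000):
    cfg = sample()
    if not is_valid(cfg):      # constraint-aware: skip invalid points
        continue
    edp = energy_delay_product(cfg)
    if best is None or edp < best[0]:
        best = (edp, cfg)

print("best EDP:", best[0], "at", best[1])
```

The energy-delay product rewards designs that are simultaneously fast and frugal, which is why it serves as the headline metric for the 18% (ResNet) and 40% (DQN) improvements quoted above.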
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the demands of real-time visual inference by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Random resistive memory-based deep extreme point learning machine for unified visual processing [67.51600474104171]
We propose a novel hardware-software co-design: a random resistive memory-based deep extreme point learning machine (DEPLM).
Our co-design achieves large energy-efficiency improvements and training-cost reductions compared to conventional systems.
arXiv Detail & Related papers (2023-12-14T09:46:16Z)
- Biologically Plausible Learning on Neuromorphic Hardware Architectures [27.138481022472]
Neuromorphic computing is an emerging paradigm that confronts this imbalance by performing computations directly in analog memories.
This work is the first to compare the impact of different learning algorithms on Compute-In-Memory-based hardware and vice versa.
arXiv Detail & Related papers (2022-12-29T15:10:59Z)
- FreeREA: Training-Free Evolution-based Architecture Search [17.202375422110553]
FreeREA is a custom cell-based evolution NAS algorithm that exploits an optimised combination of training-free metrics to rank architectures.
Our experiments, carried out on the common benchmarks NAS-Bench-101 and NATS-Bench, demonstrate that FreeREA is a fast, efficient, and effective search method for automatic model design (a toy ranking sketch follows this entry).
arXiv Detail & Related papers (2022-06-17T11:16:28Z)
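As a hedged illustration of the training-free ranking idea in the FreeREA entry above: candidates are scored by cheap proxy metrics computed without any training, normalized, and combined. The architectures, metric names, and values below are all invented.

```python
# Invented candidates with invented training-free metric scores.
candidates = {
    "arch_a": {"metric_1": 0.62, "metric_2": 0.71},
    "arch_b": {"metric_1": 0.80, "metric_2": 0.55},
    "arch_c": {"metric_1": 0.74, "metric_2": 0.78},
}

def normalized(metric):
    """Min-max normalize one metric across all candidates."""
    vals = {k: v[metric] for k, v in candidates.items()}
    lo, hi = min(vals.values()), max(vals.values())
    return {k: (v - lo) / ((hi - lo) or 1.0) for k, v in vals.items()}

# Combine the normalized metrics and rank; no candidate is ever trained.
m1, m2 = normalized("metric_1"), normalized("metric_2")
ranking = sorted(candidates, key=lambda k: m1[k] + m2[k], reverse=True)
print(ranking)  # ['arch_c', 'arch_b', 'arch_a'] for these made-up scores
```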
- Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications [46.97774949613859]
Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI).
However, their superior performance comes at the considerable cost of computational complexity.
This paper provides an overview of efficient deep learning methods, systems and applications.
arXiv Detail & Related papers (2022-04-25T16:52:48Z)
- A Semi-Decoupled Approach to Fast and Optimal Hardware-Software Co-Design of Neural Accelerators [22.69558355718029]
Hardware-software co-design has been emerging to fully reap the benefits of flexible design spaces and optimize neural network performance.
Such co-design enlarges the total search space to practically infinity and presents substantial challenges.
We propose a semi-decoupled approach that reduces the size of the total design space by orders of magnitude without losing optimality (a back-of-the-envelope illustration follows this entry).
arXiv Detail & Related papers (2022-03-25T21:49:42Z)
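A back-of-the-envelope illustration of why full co-design explodes and what a semi-decoupled scheme can buy. The knob counts are invented, and "keep a shortlist of hardware designs, then search software only for those" is just one plausible reading of semi-decoupling, not the paper's exact algorithm.

```python
from math import prod

# Invented option counts per design parameter.
hw_choices = [4, 8, 16, 8]      # four hardware knobs
sw_choices = [8, 16, 32, 8, 4]  # five software knobs

hw_space = prod(hw_choices)     # 4,096 hardware designs
sw_space = prod(sw_choices)     # 131,072 software schedules

print("decoupled:      ", hw_space + sw_space)     # ~1.4e5 points
print("fully joint:    ", hw_space * sw_space)     # ~5.4e8 points
k = 16                          # shortlist of hardware designs
print("semi-decoupled: ", hw_space + k * sw_space) # ~2.1e6 points
```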
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total.
It reduces classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional models and large-scale computation restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iteration.
Based on a latency- and accuracy-aware reward design, the framework adapts well to complex environments such as dynamic wireless channels and arbitrary processing loads, and can support 5G URLLC (a toy reward sketch follows this entry).
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
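The "latency- and accuracy-aware reward design" mentioned above might look something like the toy below; the budget, penalty weight, and numbers are invented, not taken from the paper.

```python
def reward(accuracy, latency_ms, budget_ms=50.0, penalty=2.0):
    """Toy latency/accuracy-aware reward: accuracy earns credit, and a
    growing penalty applies once inference overruns the latency budget."""
    overrun = max(0.0, latency_ms - budget_ms) / budget_ms
    return accuracy - penalty * overrun

# Under a tight budget, a fast early exit can beat a slow accurate one:
print(reward(accuracy=0.88, latency_ms=30.0))  # 0.88
print(reward(accuracy=0.95, latency_ms=80.0))  # 0.95 - 2.0 * 0.6 = -0.25
```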
- MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach to reduce the search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z)
- Hardware-Centric AutoML for Mixed-Precision Quantization [34.39845532939529]
Conventional quantization algorithms ignore differences between hardware architectures and quantize all layers uniformly.
In this paper, we introduce the Hardware-Aware Automated Quantization (HAQ) framework, which leverages reinforcement learning to automatically determine the quantization policy.
Our framework reduces latency by 1.4-1.95x and energy consumption by 1.9x with negligible accuracy loss compared with fixed-bitwidth (8-bit) quantization (a toy cost comparison follows this entry).
arXiv Detail & Related papers (2020-08-11T17:30:22Z)
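To see why per-layer bitwidths help, here is a toy cost comparison in the spirit of the HAQ entry above. The layer sizes, the "mixed" policy, and the bits-moved cost proxy are all invented; real HAQ gets latency and energy feedback from hardware simulators and lets an RL agent pick the policy.

```python
# Invented weight counts for a small network.
layers = {"conv1": 1.2e6, "conv2": 2.4e6, "conv3": 4.8e6, "fc": 0.8e6}

def cost(policy):
    """Toy proxy: cost scales with total bits moved (weights x bitwidth)."""
    return sum(layers[name] * bits for name, bits in policy.items())

uniform8 = {name: 8 for name in layers}
# Hypothetical learned policy: sensitive early layers keep more bits,
# heavy later layers are squeezed hardest.
mixed = {"conv1": 8, "conv2": 6, "conv3": 4, "fc": 6}

u, m = cost(uniform8), cost(mixed)
print(f"uniform 8-bit: {u:.3g}")
print(f"mixed policy : {m:.3g} ({u / m:.2f}x cheaper under this proxy)")
```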
- Automated Design Space Exploration for optimised Deployment of DNN on Arm Cortex-A CPUs [13.628734116014819]
Deep learning on embedded devices has prompted the development of numerous methods to optimise the deployment of deep neural networks (DNNs).
There is a lack of research on cross-level optimisation as the space of approaches becomes too large to test and obtain a globally optimised solution.
We present a set of results for state-of-the-art DNNs on a range of Arm Cortex-A CPU platforms achieving up to 4x improvement in performance and over 2x reduction in memory.
arXiv Detail & Related papers (2020-06-09T11:00:06Z)