Learned Hardware/Software Co-Design of Neural Accelerators
- URL: http://arxiv.org/abs/2010.02075v1
- Date: Mon, 5 Oct 2020 15:12:52 GMT
- Title: Learned Hardware/Software Co-Design of Neural Accelerators
- Authors: Zhan Shi, Chirag Sakhuja, Milad Hashemi, Kevin Swersky, Calvin Lin
- Abstract summary: Deep learning software stacks and hardware accelerators are diverse and vast.
Prior work considers software optimizations separately from hardware architectures, effectively reducing the search space.
This paper casts the problem as hardware/software co-design, with the goal of automatically identifying desirable points in the joint design space.
- Score: 20.929918108940093
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The use of deep learning has grown at an exponential rate, giving rise to
numerous specialized hardware and software systems for deep learning. Because
the design space of deep learning software stacks and hardware accelerators is
diverse and vast, prior work considers software optimizations separately from
hardware architectures, effectively reducing the search space. Unfortunately,
this bifurcated approach means that many profitable design points are never
explored. This paper instead casts the problem as hardware/software co-design,
with the goal of automatically identifying desirable points in the joint design
space. The key to our solution is a new constrained Bayesian optimization
framework that avoids invalid solutions by exploiting the highly constrained
features of this design space, which are semi-continuous/semi-discrete. We
evaluate our optimization framework by applying it to a variety of neural
models, improving the energy-delay product by 18% (ResNet) and 40% (DQN) over
hand-tuned state-of-the-art systems, as well as demonstrating strong results on
other neural network architectures, such as MLPs and Transformers.
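To make the idea concrete, here is a minimal sketch of a constraint-aware search over a joint hardware/software space with an energy-delay-product objective. It is not the paper's implementation: the actual framework replaces the random sampler below with a constrained Bayesian surrogate, and the knobs, validity rule, and cost model here are all invented for illustration.

```python
import random

# Hypothetical joint design space. Hardware knobs (PE grid, buffer size)
# and software knobs (tiling, unrolling) are searched together rather
# than in two decoupled passes.
HW_SPACE = {
    "pe_array": [(8, 8), (16, 16), (32, 32)],  # processing-element grid
    "sram_kb":  [64, 128, 256, 512],           # on-chip buffer size
}
SW_SPACE = {
    "tile_size": [16, 32, 64, 128],            # loop-tiling factor
    "unroll":    [1, 2, 4, 8],                 # loop-unrolling factor
}

def is_valid(cfg):
    """Validity constraint coupling software to hardware: a tile must
    fit in the on-chip buffer. Spaces like this are dominated by
    invalid points, which is why the paper's optimizer is built to
    avoid them rather than waste evaluations on them."""
    tile_bytes = cfg["tile_size"] ** 2 * 4     # fp32 tile footprint
    return tile_bytes <= cfg["sram_kb"] * 1024

def energy_delay_product(cfg):
    """Toy analytical cost model standing in for a real simulator."""
    pes = cfg["pe_array"][0] * cfg["pe_array"][1]
    delay = 1e6 / (pes * cfg["unroll"])        # more parallelism, fewer cycles
    energy = pes * 0.5 + cfg["sram_kb"] * 0.1  # more hardware, more energy
    return energy * delay

def sample():
    space = {**HW_SPACE, **SW_SPACE}
    return {k: random.choice(v) for k, v in space.items()}

best = None
for _ in range(1000):
    cfg = sample()
    if not is_valid(cfg):      # constraint-aware: skip invalid points
        continue
    edp = energy_delay_product(cfg)
    if best is None or edp < best[0]:
        best = (edp, cfg)

print("best EDP:", best[0], "at", best[1])
```

The energy-delay product rewards designs that are simultaneously fast and frugal, which is why it serves as the headline metric for the 18% (ResNet) and 40% (DQN) improvements quoted above.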
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the demands of real-time visual inference by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Random resistive memory-based deep extreme point learning machine for unified visual processing [67.51600474104171]
We propose a novel hardware-software co-design: a random resistive memory-based deep extreme point learning machine (DEPLM).
Our co-design achieves large energy-efficiency improvements and training-cost reductions compared to conventional systems.
arXiv Detail & Related papers (2023-12-14T09:46:16Z)
- Biologically Plausible Learning on Neuromorphic Hardware Architectures [27.138481022472]
Neuromorphic computing is an emerging paradigm that confronts this imbalance by performing computations directly in analog memories.
This work is the first to compare the impact of different learning algorithms on Compute-In-Memory-based hardware and vice versa.
arXiv Detail & Related papers (2022-12-29T15:10:59Z)
- FreeREA: Training-Free Evolution-based Architecture Search [17.202375422110553]
FreeREA is a custom cell-based evolution NAS algorithm that exploits an optimised combination of training-free metrics to rank architectures.
Our experiments, carried out on the common benchmarks NAS-Bench-101 and NATS-Bench, demonstrate that FreeREA is a fast, efficient, and effective search method for automatic model design (a toy ranking sketch follows this entry).
arXiv Detail & Related papers (2022-06-17T11:16:28Z)
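As a hedged illustration of the training-free ranking idea in the FreeREA entry above: candidates are scored by cheap proxy metrics computed without any training, normalized, and combined. The architectures, metric names, and values below are all invented.

```python
# Invented candidates with invented training-free metric scores.
candidates = {
    "arch_a": {"metric_1": 0.62, "metric_2": 0.71},
    "arch_b": {"metric_1": 0.80, "metric_2": 0.55},
    "arch_c": {"metric_1": 0.74, "metric_2": 0.78},
}

def normalized(metric):
    """Min-max normalize one metric across all candidates."""
    vals = {k: v[metric] for k, v in candidates.items()}
    lo, hi = min(vals.values()), max(vals.values())
    return {k: (v - lo) / ((hi - lo) or 1.0) for k, v in vals.items()}

# Combine the normalized metrics and rank; no candidate is ever trained.
m1, m2 = normalized("metric_1"), normalized("metric_2")
ranking = sorted(candidates, key=lambda k: m1[k] + m2[k], reverse=True)
print(ranking)  # ['arch_c', 'arch_b', 'arch_a'] for these made-up scores
```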
- Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications [46.97774949613859]
Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI).
However, their superior performance comes at the considerable cost of computational complexity.
This paper provides an overview of efficient deep learning methods, systems and applications.
arXiv Detail & Related papers (2022-04-25T16:52:48Z)
- A Semi-Decoupled Approach to Fast and Optimal Hardware-Software Co-Design of Neural Accelerators [22.69558355718029]
Hardware-software co-design has been emerging to fully reap the benefits of flexible design spaces and optimize neural network performance.
Such co-design enlarges the total search space to practically infinity and presents substantial challenges.
We propose a semi-decoupled approach that reduces the size of the total design space by orders of magnitude without losing optimality (a back-of-the-envelope illustration follows this entry).
arXiv Detail & Related papers (2022-03-25T21:49:42Z)
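A back-of-the-envelope illustration of why full co-design explodes and what a semi-decoupled scheme can buy. The knob counts are invented, and "keep a shortlist of hardware designs, then search software only for those" is just one plausible reading of semi-decoupling, not the paper's exact algorithm.

```python
from math import prod

# Invented option counts per design parameter.
hw_choices = [4, 8, 16, 8]      # four hardware knobs
sw_choices = [8, 16, 32, 8, 4]  # five software knobs

hw_space = prod(hw_choices)     # 4,096 hardware designs
sw_space = prod(sw_choices)     # 131,072 software schedules

print("decoupled:      ", hw_space + sw_space)     # ~1.4e5 points
print("fully joint:    ", hw_space * sw_space)     # ~5.4e8 points
k = 16                          # shortlist of hardware designs
print("semi-decoupled: ", hw_space + k * sw_space) # ~2.1e6 points
```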
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total.
It reduces classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its full-precision software counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional models and large-scale computation restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iteration.
Based on a latency- and accuracy-aware reward design, the framework adapts well to complex environments such as dynamic wireless channels and arbitrary processing loads, and can support 5G URLLC (a toy reward sketch follows this entry).
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
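The "latency- and accuracy-aware reward design" mentioned above might look something like the toy below; the budget, penalty weight, and numbers are invented, not taken from the paper.

```python
def reward(accuracy, latency_ms, budget_ms=50.0, penalty=2.0):
    """Toy latency/accuracy-aware reward: accuracy earns credit, and a
    growing penalty applies once inference overruns the latency budget."""
    overrun = max(0.0, latency_ms - budget_ms) / budget_ms
    return accuracy - penalty * overrun

# Under a tight budget, a fast early exit can beat a slow accurate one:
print(reward(accuracy=0.88, latency_ms=30.0))  # 0.88
print(reward(accuracy=0.95, latency_ms=80.0))  # 0.95 - 2.0 * 0.6 = -0.25
```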
- MS-RANAS: Multi-Scale Resource-Aware Neural Architecture Search [94.80212602202518]
We propose Multi-Scale Resource-Aware Neural Architecture Search (MS-RANAS).
We employ a one-shot architecture search approach to reduce the search cost.
We achieve state-of-the-art results in terms of accuracy-speed trade-off.
arXiv Detail & Related papers (2020-09-29T11:56:01Z)
- Hardware-Centric AutoML for Mixed-Precision Quantization [34.39845532939529]
Conventional quantization algorithms ignore differences between hardware architectures and quantize all layers uniformly.
In this paper, we introduce the Hardware-Aware Automated Quantization (HAQ) framework, which leverages reinforcement learning to automatically determine the quantization policy.
Our framework reduces latency by 1.4-1.95x and energy consumption by 1.9x with negligible accuracy loss compared with fixed-bitwidth (8-bit) quantization (a toy cost comparison follows this entry).
arXiv Detail & Related papers (2020-08-11T17:30:22Z)
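To see why per-layer bitwidths help, here is a toy cost comparison in the spirit of the HAQ entry above. The layer sizes, the "mixed" policy, and the bits-moved cost proxy are all invented; real HAQ gets latency and energy feedback from hardware simulators and lets an RL agent pick the policy.

```python
# Invented weight counts for a small network.
layers = {"conv1": 1.2e6, "conv2": 2.4e6, "conv3": 4.8e6, "fc": 0.8e6}

def cost(policy):
    """Toy proxy: cost scales with total bits moved (weights x bitwidth)."""
    return sum(layers[name] * bits for name, bits in policy.items())

uniform8 = {name: 8 for name in layers}
# Hypothetical learned policy: sensitive early layers keep more bits,
# heavy later layers are squeezed hardest.
mixed = {"conv1": 8, "conv2": 6, "conv3": 4, "fc": 6}

u, m = cost(uniform8), cost(mixed)
print(f"uniform 8-bit: {u:.3g}")
print(f"mixed policy : {m:.3g} ({u / m:.2f}x cheaper under this proxy)")
```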
- Automated Design Space Exploration for optimised Deployment of DNN on Arm Cortex-A CPUs [13.628734116014819]
Deep learning on embedded devices has prompted the development of numerous methods to optimise the deployment of deep neural networks (DNNs).
There is a lack of research on cross-level optimisation as the space of approaches becomes too large to test and obtain a globally optimised solution.
We present a set of results for state-of-the-art DNNs on a range of Arm Cortex-A CPU platforms achieving up to 4x improvement in performance and over 2x reduction in memory.
arXiv Detail & Related papers (2020-06-09T11:00:06Z)