A Semi-Decoupled Approach to Fast and Optimal Hardware-Software
Co-Design of Neural Accelerators
- URL: http://arxiv.org/abs/2203.13921v1
- Date: Fri, 25 Mar 2022 21:49:42 GMT
- Title: A Semi-Decoupled Approach to Fast and Optimal Hardware-Software
Co-Design of Neural Accelerators
- Authors: Bingqian Lu, Zheyu Yan, Yiyu Shi, Shaolei Ren
- Abstract summary: Hardware-software co-design has been emerging to fully reap the benefits of flexible design spaces and optimize neural network performance.
Such co-design enlarges the total search space to practically infinity and presents substantial challenges.
We propose a semi-decoupled approach to reduce the size of the total design space by orders of magnitude, yet without losing optimality.
- Score: 22.69558355718029
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In view of the performance limitations of fully-decoupled designs for neural
architectures and accelerators, hardware-software co-design has been emerging
to fully reap the benefits of flexible design spaces and optimize neural
network performance. Nonetheless, such co-design also enlarges the total search
space to practically infinity and presents substantial challenges. While prior
studies have focused on improving search efficiency (e.g., via reinforcement
learning), they commonly rely on co-searches over the entire
architecture-accelerator design space. In this paper, we propose a
\emph{semi}-decoupled approach to reduce the size of the total design space by
orders of magnitude, yet without losing optimality. We first perform neural
architecture search to obtain a small set of optimal architectures for one
accelerator candidate. Importantly, this is also the set of (close-to-)optimal
architectures for other accelerator designs based on the property that neural
architectures' ranking orders in terms of inference latency and energy
consumption on different accelerator designs are highly similar. Then, instead
of considering all the possible architectures, we optimize the accelerator
design only in combination with this small set of architectures, thus
significantly reducing the total search cost. We validate our approach by
conducting experiments on various architecture spaces for accelerator designs
with different dataflows. Our results highlight that we can obtain the optimal
design by only navigating over the reduced search space. The source code of
this work is at \url{https://github.com/Ren-Research/CoDesign}.
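As a rough illustration of the two-stage idea described in the abstract, the Python sketch below first checks the ranking-similarity property with a plain Kendall-tau correlation and then performs the semi-decoupled search: a decoupled NAS-style filtering pass on a single proxy accelerator, followed by a co-search restricted to the retained top-k architectures. The `cost` and `accuracy` callables, the accuracy threshold, and `k` are hypothetical placeholders; this is not the authors' released implementation (see the repository linked above).
```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


def kendall_tau(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Kendall-tau correlation (no tie handling) between two cost lists."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def ranking_similarity(archs, accel_a, accel_b, cost) -> float:
    """Check the key property: cost rankings of the same architectures on two
    different accelerator designs should be highly correlated."""
    return kendall_tau([cost(a, accel_a) for a in archs],
                       [cost(a, accel_b) for a in archs])


def semi_decoupled_search(
    archs: List[object],                      # candidate neural architectures
    accelerators: List[object],               # candidate accelerator designs
    cost: Callable[[object, object], float],  # e.g. latency or energy of (arch, accel)
    accuracy: Callable[[object], float],      # accelerator-independent accuracy
    acc_min: float = 0.75,                    # hypothetical accuracy target
    k: int = 10,                              # size of the retained architecture set
) -> Tuple[object, object]:
    # Stage 1 (decoupled): on a single proxy accelerator, keep the k cheapest
    # architectures that meet the accuracy target. This stands in for the
    # paper's NAS step, which yields a small set of (close-to-)optimal
    # architectures for that one accelerator candidate.
    proxy = accelerators[0]
    qualified = [a for a in archs if accuracy(a) >= acc_min]
    top_k = sorted(qualified, key=lambda a: cost(a, proxy))[:k]

    # Stage 2 (co-search over the reduced space): because cost rankings are
    # highly similar across accelerator designs, a (close-to-)optimal pair is
    # expected to survive among these k architectures, so only
    # k * len(accelerators) combinations are evaluated instead of the full
    # cross product.
    return min(((a, accel) for a in top_k for accel in accelerators),
               key=lambda pair: cost(*pair))
```
In the paper's terms, Stage 2 replaces a co-search over all architecture-accelerator pairs with one over only the k retained architectures, which is where the orders-of-magnitude reduction in total search-space size comes from.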
Related papers
- Neural Architecture Codesign for Fast Bragg Peak Analysis [1.7081438846690533]
We develop an automated pipeline to streamline neural architecture codesign for fast, real-time Bragg peak analysis in microscopy.
Our method employs neural architecture search and AutoML to enhance these models, including hardware costs, leading to the discovery of more hardware-efficient neural architectures.
arXiv Detail & Related papers (2023-12-10T19:42:18Z)
- Pruning-as-Search: Efficient Neural Architecture Search via Channel Pruning and Structural Reparameterization [50.50023451369742]
Pruning-as-Search (PaS) is an end-to-end channel pruning method that automatically and efficiently searches out the desired sub-network.
Our proposed architecture outperforms prior art by around 1.0% top-1 accuracy on the ImageNet-1000 classification task.
arXiv Detail & Related papers (2022-06-02T17:58:54Z)
- Does Form Follow Function? An Empirical Exploration of the Impact of Deep Neural Network Architecture Design on Hardware-Specific Acceleration [76.35307867016336]
This study investigates the impact of deep neural network architecture design on the degree of inference speedup.
We show that while leveraging hardware-specific acceleration achieved an average inference speed-up of 380%, the degree of inference speed-up varied drastically depending on the macro-architecture design pattern.
arXiv Detail & Related papers (2021-07-08T23:05:39Z)
- A Construction Kit for Efficient Low Power Neural Network Accelerator Designs [11.807678100385164]
This work provides a survey of neural network accelerator optimization approaches that have been used in recent works.
It presents the list of optimizations and their quantitative effects as a construction kit, allowing the design choices for each building block to be assessed separately.
arXiv Detail & Related papers (2021-06-24T07:53:56Z)
- iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients [75.41173109807735]
Differentiable ARchiTecture Search (DARTS) has recently become the mainstream of neural architecture search (NAS).
We tackle the hypergradient computation in DARTS based on the implicit function theorem.
We show that the architecture optimisation with the proposed method, named iDARTS, is expected to converge to a stationary point.
arXiv Detail & Related papers (2021-06-21T00:44:11Z)
- NAAS: Neural Accelerator Architecture Search [16.934625310654553]
We propose Neural Accelerator Architecture Search (NAAS) to holistically search the neural network architecture, accelerator architecture, and compiler mappings.
As a data-driven approach, NAAS improves on the human-designed Eyeriss with a 4.4x EDP reduction and a 2.7% accuracy improvement on ImageNet.
arXiv Detail & Related papers (2021-05-27T15:56:41Z)
- AutoSpace: Neural Architecture Search with Less Human Interference [84.42680793945007]
Current neural architecture search (NAS) algorithms still require expert knowledge and effort to design a search space for network construction.
We propose a novel differentiable evolutionary framework named AutoSpace, which evolves the search space to an optimal one.
With the learned search space, the performance of recent NAS algorithms can be improved significantly compared with previously manually designed spaces.
arXiv Detail & Related papers (2021-03-22T13:28:56Z)
- Apollo: Transferable Architecture Exploration [26.489275442359464]
We propose a transferable architecture exploration framework, dubbed Apollo.
We show that our framework finds high reward design configurations more sample-efficiently than a baseline black-box optimization approach.
arXiv Detail & Related papers (2021-02-02T19:36:02Z)
- Evolving Search Space for Neural Architecture Search [70.71153433676024]
We present a Neural Search-space Evolution (NSE) scheme that amplifies the results from the previous effort by maintaining an optimized search space subset.
We achieve 77.3% top-1 retrain accuracy on ImageNet with 333M FLOPs, a state-of-the-art result.
When a latency constraint is adopted, our result also outperforms the previous best-performing mobile models, with a 77.9% top-1 retrain accuracy.
arXiv Detail & Related papers (2020-11-22T01:11:19Z)
- Learned Hardware/Software Co-Design of Neural Accelerators [20.929918108940093]
Deep learning software stacks and hardware accelerators are diverse and vast.
Prior work considers software optimizations separately from hardware architectures, effectively reducing the search space.
This paper casts the problem as hardware/software co-design, with the goal of automatically identifying desirable points in the joint design space.
arXiv Detail & Related papers (2020-10-05T15:12:52Z)
- Stage-Wise Neural Architecture Search [65.03109178056937]
Modern convolutional networks such as ResNet and NASNet have achieved state-of-the-art results in many computer vision applications.
These networks consist of stages, which are sets of layers that operate on representations in the same resolution.
It has been demonstrated that increasing the number of layers in each stage improves the prediction ability of the network.
However, the resulting architecture becomes computationally expensive in terms of floating point operations, memory requirements and inference time.
arXiv Detail & Related papers (2020-04-23T14:16:39Z)