Does Form Follow Function? An Empirical Exploration of the Impact of
Deep Neural Network Architecture Design on Hardware-Specific Acceleration
- URL: http://arxiv.org/abs/2107.04144v1
- Date: Thu, 8 Jul 2021 23:05:39 GMT
- Title: Does Form Follow Function? An Empirical Exploration of the Impact of
Deep Neural Network Architecture Design on Hardware-Specific Acceleration
- Authors: Saad Abbasi, Mohammad Javad Shafiee, Ellick Chan, and Alexander Wong
- Abstract summary: This study investigates the impact of deep neural network architecture design on the degree of inference speedup.
We show that while leveraging hardware-specific acceleration achieved an average inference speed-up of 380%, the degree of inference speed-up varied drastically depending on the macro-architecture design pattern.
- Score: 76.35307867016336
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The fine-grained relationship between form and function with respect to deep
neural network architecture design and hardware-specific acceleration is one
area that is not well studied in the research literature, with form often
dictated by accuracy as opposed to hardware function. In this study, a
comprehensive empirical exploration is conducted to investigate the impact of
deep neural network architecture design on the degree of inference speedup that
can be achieved via hardware-specific acceleration. More specifically, we
empirically study the impact of a variety of commonly used macro-architecture
design patterns across different architectural depths through the lens of
OpenVINO microprocessor-specific and GPU-specific acceleration. Experimental
results showed that while leveraging hardware-specific acceleration achieved an
average inference speed-up of 380%, the degree of inference speed-up varied
drastically depending on the macro-architecture design pattern, with the
greatest speedup achieved on the depthwise bottleneck convolution design
pattern at 550%. Furthermore, we conduct an in-depth exploration of the
correlation between FLOPs requirement, level 3 cache efficacy, and network
latency with increasing architectural depth and width. Finally, we analyze the
inference time reductions using hardware-specific acceleration when compared to
native deep learning frameworks across a wide variety of hand-crafted deep
convolutional neural network architecture designs as well as ones found via
neural architecture search strategies. We found that the DARTS-derived
architecture benefits the most from hardware-specific
software acceleration (1200%), while the depthwise bottleneck convolution-based
MobileNet-V2 has the lowest overall inference time of around 2.4 ms.
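The abstract reports speed-ups as percentages but does not state how they are computed. As a minimal sketch, assuming speed-up means baseline latency divided by accelerated latency, expressed as a percentage, the relationship could be written as follows. The function name `speedup_percent` and the latencies are hypothetical and are not taken from the paper.

```python
def speedup_percent(baseline_ms: float, accelerated_ms: float) -> float:
    """Return speed-up as a percentage.

    Assumed definition (not confirmed by the abstract):
    100 * baseline latency / accelerated latency.
    """
    if baseline_ms <= 0 or accelerated_ms <= 0:
        raise ValueError("latencies must be positive")
    return 100.0 * baseline_ms / accelerated_ms


# Hypothetical latencies for illustration only, not results from the paper.
baseline_ms = 12.0     # native framework inference time
accelerated_ms = 4.0   # hardware-accelerated inference time
print(f"{speedup_percent(baseline_ms, accelerated_ms):.0f}%")  # prints 300%
```

Under this assumed definition, a value of 100% means no change in latency. If the paper instead reports the relative improvement over the baseline, the same latencies would give 200%.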
Related papers
- EM-DARTS: Hierarchical Differentiable Architecture Search for Eye Movement Recognition [54.99121380536659]
Eye movement biometrics have received increasing attention thanks to their highly secure identification.
Deep learning (DL) models have recently been applied successfully to eye movement recognition.
The DL architecture is still determined by human prior knowledge.
We propose EM-DARTS, a hierarchical differentiable architecture search algorithm to automatically design the DL architecture for eye movement recognition.
arXiv Detail & Related papers (2024-09-22T13:11:08Z)
- Multi-conditioned Graph Diffusion for Neural Architecture Search [8.290336491323796]
We present a graph diffusion-based NAS approach that uses discrete conditional graph diffusion processes to generate high-performing neural network architectures.
We show promising results on six standard benchmarks, yielding novel and unique architectures at a fast speed.
arXiv Detail & Related papers (2024-03-09T21:45:31Z)
- Neural Architecture Codesign for Fast Bragg Peak Analysis [1.7081438846690533]
We develop an automated pipeline to streamline neural architecture codesign for fast, real-time Bragg peak analysis in microscopy.
Our method employs neural architecture search and AutoML to enhance these models, including hardware costs, leading to the discovery of more hardware-efficient neural architectures.
arXiv Detail & Related papers (2023-12-10T19:42:18Z)
- Fast GraspNeXt: A Fast Self-Attention Neural Network Architecture for Multi-task Learning in Computer Vision Tasks for Robotic Grasping on the Edge [80.88063189896718]
High architectural and computational complexity can result in poor suitability for deployment on embedded devices.
Fast GraspNeXt is a fast self-attention neural network architecture tailored for embedded multi-task learning in computer vision tasks for robotic grasping.
arXiv Detail & Related papers (2023-04-21T18:07:14Z)
- Algorithm and Hardware Co-design for Reconfigurable CNN Accelerator [3.1431240233552007]
Recent advances in algorithm-hardware co-design for deep neural networks (DNNs) have demonstrated their potential in automatically designing neural architectures and hardware designs.
However, it is still a challenging optimization problem due to the expensive training cost and the time-consuming hardware implementation.
We propose a novel three-phase co-design framework, with the following new features.
The network and hardware configuration we found can achieve 2% to 6% higher accuracy, 2x to 26x lower latency and 8.5x higher energy efficiency.
arXiv Detail & Related papers (2021-11-24T20:37:50Z)
- ISyNet: Convolutional Neural Networks design for AI accelerator [0.0]
Current state-of-the-art architectures are found with neural architecture search (NAS) taking model complexity into account.
We propose a measure of hardware efficiency of neural architecture search space - matrix efficiency measure (MEM); a search space comprising hardware-efficient operations; a latency-aware scaling method.
We show the advantage of the designed architectures for the NPU devices on ImageNet and the generalization ability for the downstream classification and detection tasks.
arXiv Detail & Related papers (2021-09-04T20:57:05Z)
- Rethinking Co-design of Neural Architectures and Hardware Accelerators [31.342964958282092]
We systematically study the importance and strategies of co-designing neural architectures and hardware accelerators.
Our experiments show that the joint search method consistently outperforms previous platform-aware neural architecture search.
Our method can reduce energy consumption of an edge accelerator by up to 2x under the same accuracy constraint.
arXiv Detail & Related papers (2021-02-17T07:55:58Z)
- Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z)
- A Semi-Supervised Assessor of Neural Architectures [157.76189339451565]
We employ an auto-encoder to discover meaningful representations of neural architectures.
A graph convolutional neural network is introduced to predict the performance of architectures.
arXiv Detail & Related papers (2020-05-14T09:02:33Z)
- Stage-Wise Neural Architecture Search [65.03109178056937]
Modern convolutional networks such as ResNet and NASNet have achieved state-of-the-art results in many computer vision applications.
These networks consist of stages, which are sets of layers that operate on representations in the same resolution.
It has been demonstrated that increasing the number of layers in each stage improves the prediction ability of the network.
However, the resulting architecture becomes computationally expensive in terms of floating point operations, memory requirements and inference time.
arXiv Detail & Related papers (2020-04-23T14:16:39Z)