Performance Analysis of DNN Inference/Training with Convolution and
non-Convolution Operations
- URL: http://arxiv.org/abs/2306.16767v1
- Date: Thu, 29 Jun 2023 08:11:36 GMT
- Title: Performance Analysis of DNN Inference/Training with Convolution and
non-Convolution Operations
- Authors: Hadi Esmaeilzadeh, Soroush Ghodrati, Andrew B. Kahng, Sean Kinzer,
Susmita Dey Manasi, Sachin S. Sapatnekar, and Zhiang Wang
- Abstract summary: This work proposes a novel performance analysis framework, SimDIT, for general ASIC-based systolic hardware accelerator platforms.
SimDIT comprehensively covers convolution and non-convolution operations of both CNN inference and training.
SimDIT achieves 18X performance improvement over a generic static resource allocation for ResNet-50 inference.
- Score: 5.647410731290209
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Today's performance analysis frameworks for deep learning accelerators suffer
from two significant limitations. First, although modern convolutional neural
networks (CNNs) consist of many types of layers other than convolution,
especially during training, these frameworks largely focus on convolution
layers only. Second, these frameworks are generally targeted towards inference,
and lack support for training operations. This work proposes a novel
performance analysis framework, SimDIT, for general ASIC-based systolic
hardware accelerator platforms. The modeling effort of SimDIT comprehensively
covers convolution and non-convolution operations of both CNN inference and
training on a highly parameterizable hardware substrate. SimDIT is integrated
with a backend silicon implementation flow and provides detailed end-to-end
performance statistics (i.e., data access cost, cycle counts, energy, and
power) for executing CNN inference and training workloads. SimDIT-enabled
performance analysis reveals that on a 64X64 processing array, non-convolution
operations constitute 59.5% of the total runtime for the ResNet-50 training workload.
In addition, by optimally distributing available off-chip DRAM bandwidth and
on-chip SRAM resources, SimDIT achieves 18X performance improvement over a
generic static resource allocation for ResNet-50 inference.
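As a rough illustration of the kind of analytical estimate a framework like SimDIT refines (the paper's actual model covers data access cost, energy, and power end-to-end), the sketch below computes a first-order cycle count for one convolution layer on a P x P systolic array. The mapping and fill/drain terms are generic textbook assumptions, not SimDIT's model.

```python
# First-order cycle estimate for one conv layer on a P x P systolic array.
# Generic output-stationary textbook approximation -- NOT SimDIT's model.
import math

def conv_cycles(H, W, C_in, C_out, K, P=64):
    out_pixels = H * W                     # output pixels (stride 1, same padding)
    macs_per_output = K * K * C_in         # reduction length per output element
    row_tiles = math.ceil(out_pixels / P)  # output pixels tiled over array rows
    col_tiles = math.ceil(C_out / P)       # output channels tiled over columns
    # Each tile streams its full reduction, plus ~2P cycles of fill/drain latency.
    return row_tiles * col_tiles * (macs_per_output + 2 * P)

# e.g. a ResNet-50-style 3x3 conv on a 56x56x64 input with 64 output channels
print(conv_cycles(56, 56, 64, 64, 3))
```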
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the resource constraints of IoVT systems by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
- Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch [72.26822499434446]
Auto-Train-Once (ATO) is an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs.
We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures.
arXiv Detail & Related papers (2024-03-21T02:33:37Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
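For context, a minimal snnTorch example simulating a single leaky integrate-and-fire (LIF) neuron over discrete timesteps; this uses the standard snnTorch API, and the IPU-optimized release described in the paper may differ in setup.

```python
# Minimal snnTorch usage: one leaky integrate-and-fire neuron stepped in time.
import torch
import snntorch as snn

lif = snn.Leaky(beta=0.9)        # LIF neuron with membrane decay rate beta
mem = lif.init_leaky()           # initial membrane potential
for step in range(10):
    cur = torch.rand(1)          # input current at this timestep
    spk, mem = lif(cur, mem)     # spike (0/1) and updated membrane potential
    print(step, spk.item(), round(mem.item(), 3))
```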
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Real-time Hyper-Dimensional Reconfiguration at the Edge using Hardware Accelerators [12.599871451119538]
HyDRATE can perform real-time reconfiguration at the edge using deep neural networks (DNNs) combined with hyperdimensional (HD) computing accelerators.
We describe the algorithm, trained quantized model generation, and simulated performance of a feature extractor free of multiply-accumulates.
We show that reconfigurability in the field is achieved by retraining only the feed-forward HD classifier, without gradient-descent backpropagation.
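To make the gradient-free retraining concrete, here is a minimal generic hyperdimensional classifier sketch: class prototypes are sums of randomly projected encodings, so field retraining is just accumulating new encoded samples. The encoder and dimensions are assumptions for illustration, not HyDRATE's actual pipeline.

```python
# Generic HD classifier: prototypes are bundled (summed) encodings, so
# "retraining" is accumulation -- no gradient backpropagation needed.
import numpy as np

D, F = 10_000, 64                        # hypervector / feature dims (assumed)
rng = np.random.default_rng(0)
proj = rng.choice([-1, 1], size=(F, D))  # fixed random bipolar projection

def encode(x):
    return np.sign(x @ proj)             # bipolar hypervector for one sample

class HDClassifier:
    def __init__(self, n_classes):
        self.protos = np.zeros((n_classes, D))
    def train(self, x, label):           # one-shot update: bundle into prototype
        self.protos[label] += encode(x)
    def predict(self, x):                # nearest prototype by cosine similarity
        h = encode(x)
        sims = self.protos @ h / (np.linalg.norm(self.protos, axis=1) + 1e-9)
        return int(np.argmax(sims))
```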
arXiv Detail & Related papers (2022-06-10T14:08:41Z)
- dPRO: A Generic Profiling and Optimization System for Expediting Distributed DNN Training [12.413533491501548]
This paper proposes dPRO, a tool to identify performance bottlenecks in distributed training systems.
We implement dPRO on multiple deep learning frameworks (PyTorch, MXNet) and representative communication schemes (AllReduce and the Parameter Server architecture).
Extensive experiments show that dPRO predicts the performance of distributed training in various settings with <5% error in most cases, and finds optimization strategies with up to 87.1% speed-up over the baselines.
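dPRO itself replays profiled execution traces; purely as a sketch of the kind of quantity it predicts, the following first-order model estimates one data-parallel iteration under a ring all-reduce cost model with partial compute/communication overlap (all parameters are hypothetical).

```python
# Toy iteration-time estimate for data-parallel training; NOT dPRO's method.
def ring_allreduce_time(model_bytes, n_workers, bw_bytes_per_s, latency_s=10e-6):
    """Classic ring all-reduce cost: 2*(N-1)/N of the data crosses each link."""
    volume = 2 * (n_workers - 1) / n_workers * model_bytes
    return 2 * (n_workers - 1) * latency_s + volume / bw_bytes_per_s

def iteration_time(fwd_s, bwd_s, comm_s, overlap=0.7):
    """Gradient communication can overlap part of the backward pass."""
    exposed_comm = max(0.0, comm_s - overlap * bwd_s)
    return fwd_s + bwd_s + exposed_comm

comm = ring_allreduce_time(100e6, n_workers=8, bw_bytes_per_s=10e9)
print(iteration_time(fwd_s=0.030, bwd_s=0.060, comm_s=comm))
```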
arXiv Detail & Related papers (2022-05-05T07:15:25Z)
- DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Transformers [105.74546828182834]
We show a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slices a part of the network parameters for inputs with diverse difficulty levels.
We present dynamic slimmable network (DS-Net) and dynamic slice-able network (DS-Net++), which input-dependently adjust the number of filters in CNNs and multiple dimensions in both CNNs and transformers.
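A minimal sketch of the weight-slicing idea: run a convolution with only the leading k output filters for easy inputs. This illustrates the mechanism only; DS-Net++'s learned gating and multi-dimension slicing are not modeled here.

```python
# Slice the leading k output filters of a conv at inference time.
import torch
import torch.nn.functional as F

class SliceableConv(torch.nn.Conv2d):
    def forward(self, x, ratio=1.0):
        k = max(1, int(self.out_channels * ratio))  # filters to keep
        w = self.weight[:k]                          # slice leading filters
        b = self.bias[:k] if self.bias is not None else None
        return F.conv2d(x, w, b, self.stride, self.padding)

conv = SliceableConv(3, 64, kernel_size=3, padding=1)
x = torch.randn(1, 3, 32, 32)
y_easy = conv(x, ratio=0.25)   # 16 filters for easy inputs
y_hard = conv(x, ratio=1.0)    # full 64 filters for hard inputs
print(y_easy.shape, y_hard.shape)
```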
arXiv Detail & Related papers (2021-09-21T09:57:21Z)
- Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
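A generic confidence-threshold exit policy, shown below, conveys the basic mechanism; the paper co-optimises exit number, placement, architecture, and policy well beyond this sketch.

```python
# Generic early-exit inference with a mean-pixel-confidence threshold.
import torch
from torch import nn

def early_exit_inference(stages, exit_heads, x, threshold=0.9):
    """Run backbone stages in order; return the first exit head's output
    whose mean per-pixel max-softmax confidence clears the threshold."""
    for stage, head in zip(stages, exit_heads):
        x = stage(x)
        logits = head(x)                                 # (N, classes, H, W)
        conf = logits.softmax(dim=1).amax(dim=1).mean()  # avg pixel confidence
        if conf >= threshold:
            return logits                                # exit early
    return logits                                        # deepest exit

# toy 2-exit model: two conv stages, each with a 1x1 segmentation head
stages = [nn.Conv2d(3, 8, 3, padding=1), nn.Conv2d(8, 8, 3, padding=1)]
heads = [nn.Conv2d(8, 5, 1), nn.Conv2d(8, 5, 1)]
print(early_exit_inference(stages, heads, torch.randn(1, 3, 32, 32)).shape)
```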
arXiv Detail & Related papers (2021-06-07T11:37:03Z)
- GhostSR: Learning Ghost Features for Efficient Image Super-Resolution [49.393251361038025]
Single image super-resolution (SISR) systems based on convolutional neural networks (CNNs) achieve impressive performance but incur huge computational costs.
We propose to use shift operation to generate the redundant features (i.e., Ghost features) of SISR models.
We show that both the non-compact and lightweight SISR models embedded in our proposed module can achieve comparable performance to that of their baselines.
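A minimal sketch of shift-generated ghost features: compute a subset of channels with a real convolution and derive the rest by spatially shifting them. Fixed shift directions are assumed here, whereas GhostSR learns them.

```python
# Generate "ghost" channels by spatial shift instead of extra convolutions.
import torch

def ghost_features(x_intrinsic, shifts=((0, 1), (1, 0), (0, -1), (-1, 0))):
    """x_intrinsic: (N, C, H, W) conv outputs; returns (N, C*(1+len(shifts)), H, W)."""
    ghosts = [torch.roll(x_intrinsic, shifts=s, dims=(2, 3)) for s in shifts]
    return torch.cat([x_intrinsic] + ghosts, dim=1)

x = torch.randn(1, 16, 32, 32)   # 16 channels computed by a real conv
print(ghost_features(x).shape)   # 80 channels, 64 of them shift-generated
```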
arXiv Detail & Related papers (2021-01-21T10:09:47Z)
- Fully-parallel Convolutional Neural Network Hardware [0.7829352305480285]
We propose a new power- and area-efficient architecture for implementing Artificial Neural Networks (ANNs) in hardware.
For the first time, a fully-parallel CNN such as LeNet-5 is embedded and tested in a single FPGA.
arXiv Detail & Related papers (2020-06-22T17:19:09Z)
- TxSim: Modeling Training of Deep Neural Networks on Resistive Crossbar Systems [3.1887081453726136]
Crossbar-based computations face a major challenge due to a variety of device- and circuit-level non-idealities.
We propose TxSim, a fast and customizable modeling framework to functionally evaluate DNN training on crossbar-based hardware.
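As a toy illustration of the kind of non-idealities such frameworks capture, the sketch below models a crossbar matrix-vector multiply with quantised conductance levels and multiplicative device noise; the level count and noise model are assumptions, not TxSim's calibrated models.

```python
# Toy non-ideal crossbar MVM: quantised conductances plus device variation.
import numpy as np

def crossbar_mvm(W, x, levels=16, noise_std=0.02, rng=np.random.default_rng(0)):
    g_max = np.abs(W).max()
    step = 2 * g_max / (levels - 1)
    W_q = np.round(W / step) * step                       # finite conductance levels
    W_n = W_q * (1 + rng.normal(0, noise_std, W.shape))   # per-device variation
    return W_n @ x                                        # analog column accumulate

W = np.random.default_rng(1).normal(size=(128, 256))
x = np.random.default_rng(2).normal(size=256)
print(np.linalg.norm(crossbar_mvm(W, x) - W @ x))  # error vs. ideal MVM
```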
arXiv Detail & Related papers (2020-02-25T19:29:43Z)