On the Difficulty of Designing Processor Arrays for Deep Neural Networks
- URL: http://arxiv.org/abs/2006.14008v1
- Date: Wed, 24 Jun 2020 19:24:08 GMT
- Title: On the Difficulty of Designing Processor Arrays for Deep Neural Networks
- Authors: Kevin Stehle and Günther Schindler and Holger Fröning
- Abstract summary: Camuy is a lightweight model of a weight-stationary systolic array for linear algebra operations.
We present an analysis of popular models to illustrate how it can estimate required cycles, data movement costs, as well as systolic array utilization.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Systolic arrays are a promising computing concept that is particularly
in line with CMOS technology trends and with the linear algebra operations found in
the processing of artificial neural networks. The recent success of such deep
learning methods in a wide range of applications has led to a variety of models
which, although conceptually similar in being based on convolutions and
fully-connected layers, show a huge diversity of operations in detail due to a
large design space: an operand's dimensions vary substantially, since they depend
on design principles such as receptive field size, number of features, striding,
dilation, and grouping of features. Moreover, recent networks extend previously
plain feedforward models with additional connectivity, as in ResNet or DenseNet.
The problem of choosing an optimal systolic array configuration cannot be solved
analytically; instead, methods and tools are required that facilitate fast and
accurate reasoning about optimality in terms of total cycles, utilization, and
amount of data movement. In this work we introduce Camuy, a
lightweight model of a weight-stationary systolic array for linear algebra
operations that allows quick explorations of different configurations, such as
systolic array dimensions and input/output bitwidths. Camuy aids accelerator
designers in finding configurations that are either optimal for a particular
network architecture or robust in performance across a variety of network
architectures. It offers simple integration into existing machine learning tool
stacks (e.g., TensorFlow) through custom operators. We present an analysis of
popular DNN models to illustrate how it can estimate required cycles, data
movement costs, as well as systolic array utilization, and show how the
progress in network architecture design impacts the efficiency of inference on
accelerators based on systolic arrays.
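To make this kind of first-order reasoning concrete, the following Python sketch estimates total cycles, utilization, and data movement for a matrix multiplication mapped onto a weight-stationary array, plus a helper that lowers a convolution layer to the GEMM dimensions such a model consumes. The names (SystolicArrayModel, estimate_gemm, conv_to_gemm_dims) and the exact cycle and byte accounting are illustrative assumptions, not the actual Camuy implementation.

```python
# Minimal sketch of an analytical cost model for a weight-stationary systolic
# array. Names and the exact cycle/byte accounting are illustrative
# assumptions, not the actual Camuy implementation.
import math
from dataclasses import dataclass


@dataclass
class SystolicArrayModel:
    rows: int              # PEs along the reduction (K) dimension
    cols: int              # PEs along the output (N) dimension
    input_bits: int = 8    # operand bitwidth
    output_bits: int = 32  # accumulator/output bitwidth

    def estimate_gemm(self, m: int, k: int, n: int) -> dict:
        """First-order estimate for an (m x k) x (k x n) multiplication."""
        tiles_k = math.ceil(k / self.rows)   # weight tiles along K
        tiles_n = math.ceil(n / self.cols)   # weight tiles along N
        # per weight tile: load the stationary weights, stream m activation
        # rows through the array, plus pipeline fill/drain
        cycles_per_tile = self.rows + m + (self.rows + self.cols)
        cycles = tiles_k * tiles_n * cycles_per_tile
        utilization = (m * k * n) / (cycles * self.rows * self.cols)
        # data movement (bytes): activations re-streamed once per N tile,
        # weights loaded once, partial sums written once per K tile
        data_bytes = (m * k * tiles_n * self.input_bits
                      + k * n * self.input_bits
                      + m * n * tiles_k * self.output_bits) // 8
        return {"cycles": cycles,
                "utilization": utilization,
                "data_movement_bytes": data_bytes}


def conv_to_gemm_dims(batch, h, w, c_in, c_out, kernel, stride=1):
    """Lower a convolution layer (im2col, 'valid' padding) to GEMM dims."""
    out_h = (h - kernel) // stride + 1
    out_w = (w - kernel) // stride + 1
    return batch * out_h * out_w, kernel * kernel * c_in, c_out


# Example: a ResNet-style 3x3 convolution mapped onto a 32x32 PE array.
model = SystolicArrayModel(rows=32, cols=32)
m, k, n = conv_to_gemm_dims(batch=1, h=56, w=56, c_in=64, c_out=64, kernel=3)
print(model.estimate_gemm(m, k, n))
```

Sweeping the array dimensions and re-running such estimates over a network's layer shapes is the kind of exploration the paper's tool is meant to make fast; the tiling order and fill/drain costs above are deliberately simplified.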
Related papers
- Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses the resource constraints of IoVT devices by shifting data analysis to the edge.
Existing methods struggle to balance high model performance with low resource consumption.
We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z) - Enhancing Convolutional Neural Networks with Higher-Order Numerical Difference Methods [6.26650196870495]
Convolutional Neural Networks (CNNs) have been able to assist humans in solving many real-world problems.
This paper proposes a stacking scheme based on the linear multi-step method to enhance the performance of CNNs.
arXiv Detail & Related papers (2024-09-08T05:13:58Z) - Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z) - Principled Architecture-aware Scaling of Hyperparameters [69.98414153320894]
Training a high-quality deep neural network requires choosing suitable hyperparameters, which is a non-trivial and expensive process.
In this work, we precisely characterize the dependence of initializations and maximal learning rates on the network architecture.
We demonstrate that network rankings in benchmarks can easily change simply by training the networks better.
arXiv Detail & Related papers (2024-02-27T11:52:49Z) - Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective [64.04617968947697]
We introduce a novel data-model co-design perspective to promote superior weight sparsity.
Specifically, customized visual prompts are mounted to upgrade neural network sparsification in our proposed VPNs framework.
arXiv Detail & Related papers (2023-12-03T13:50:24Z) - NAR-Former: Neural Architecture Representation Learning towards Holistic
Attributes Prediction [37.357949900603295]
We propose a neural architecture representation model that can be used to estimate attributes holistically.
Experiment results show that our proposed framework can be used to predict the latency and accuracy attributes of both cell architectures and whole deep neural networks.
arXiv Detail & Related papers (2022-11-15T10:15:21Z) - Analysis and Design of Quadratic Neural Networks for Regression,
Classification, and Lyapunov Control of Dynamical Systems [0.0]
This paper addresses the analysis and design of quadratic neural networks.
These networks offer several advantages, the most important of which is that the architecture is a by-product of the design and is not determined a priori.
Several examples will show the effectiveness of quadratic neural networks in applications.
arXiv Detail & Related papers (2022-07-26T18:10:05Z) - A Graph Deep Learning Framework for High-Level Synthesis Design Space
Exploration [11.154086943903696]
High-Level Synthesis is a solution for fast prototyping application-specific hardware.
We propose, for the first time in the literature, graph neural networks that jointly predict the acceleration performance and hardware costs of HLS designs.
We show that our approach achieves prediction accuracy comparable with that of commonly used simulators.
arXiv Detail & Related papers (2021-11-29T18:17:45Z) - Exploring Flip Flop memories and beyond: training recurrent neural
networks with key insights [0.0]
We study the implementation of a temporal processing task, specifically a 3-bit Flip Flop memory.
The obtained networks are meticulously analyzed to elucidate dynamics, aided by an array of visualization and analysis tools.
arXiv Detail & Related papers (2020-10-15T16:25:29Z) - Dynamic Graph: Learning Instance-aware Connectivity for Neural Networks [78.65792427542672]
Dynamic Graph Network (DG-Net) is a complete directed acyclic graph, where the nodes represent convolutional blocks and the edges represent connection paths.
Instead of using the same path of the network, DG-Net aggregates features dynamically in each node, which allows the network to have more representation ability.
arXiv Detail & Related papers (2020-10-02T16:50:26Z) - Large-Scale Gradient-Free Deep Learning with Recursive Local
Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)