Related papers: Near-Optimal Hardware Design for Convolutional Neural Networks


URL: http://arxiv.org/abs/2002.05526v1
Date: Thu, 6 Feb 2020 09:15:03 GMT
Title: Near-Optimal Hardware Design for Convolutional Neural Networks
Authors: Byungik Ahn
Abstract summary: This study proposes a novel, special-purpose, and high-efficiency hardware architecture for convolutional neural networks. The proposed architecture maximizes the utilization of multipliers by designing the computational circuit with the same structure as that of the computational flow of the model. An implementation based on the proposed hardware architecture has been applied in commercial AI products.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recently, the demand for low-power deep-learning hardware for industrial applications has been increasing. Most existing artificial intelligence (AI) chips have evolved to rely on new chip technologies rather than on radically new hardware architectures, in order to maintain their generality. This study proposes a novel, special-purpose, and high-efficiency hardware architecture for convolutional neural networks. The proposed architecture maximizes the utilization of multipliers by designing the computational circuit with the same structure as the computational flow of the model, rather than mapping computations to fixed hardware. In addition, a specially designed filter circuit simultaneously provides all the data of the receptive field using only one memory read operation per clock cycle; this allows the computation circuit to operate seamlessly without idle cycles. Our reference system based on the proposed architecture uses 97% of its peak multiplication capability for computations actually required by the computational model, throughout the computation period. In addition, overhead components are minimized so that the non-multiplier components consume a smaller share of the resources than the multiplier components, which are indispensable for the computational model. The efficiency of the proposed architecture is close to that of an ideally efficient system, which cannot be improved further in terms of the performance-to-resource ratio. An implementation based on the proposed hardware architecture has been applied in commercial AI products.
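The abstract describes a filter circuit that delivers the whole receptive field every clock cycle from a single memory read, but does not give the circuit itself. A common generic way to obtain that behaviour is a line buffer feeding a shift-register window. The Python sketch below simulates that generic technique, not the paper's design, assuming a single-channel image, a k x k window, stride 1, and no padding; the function name `stream_windows` is hypothetical.

```python
from collections import deque
from typing import Deque, Iterator, List, Tuple

Window = List[List[int]]


def stream_windows(image: List[List[int]], k: int = 3) -> Iterator[Tuple[int, int, Window]]:
    """Simulate a generic line-buffer window generator (hypothetical sketch).

    Each inner-loop iteration models one clock cycle that performs exactly one
    memory read (one pixel). Once k-1 rows and k-1 columns are buffered, every
    further cycle in the row yields a complete k x k receptive field.
    Yields (top_row, left_col, window) for stride 1 and no padding.
    """
    height, width = len(image), len(image[0])
    # Line buffers: the previous k-1 rows plus the row currently being written.
    lines: Deque[List[int]] = deque(maxlen=k)
    # Shift register holding the k most recent columns of the buffered rows.
    columns: Deque[List[int]] = deque(maxlen=k)
    for r in range(height):
        current: List[int] = []
        lines.append(current)  # the oldest row falls out once k rows are held
        columns.clear()
        for c in range(width):
            current.append(image[r][c])  # the single memory read this cycle
            if len(lines) < k:
                continue  # still filling the first k-1 rows
            # Column c across the k buffered rows, oldest row first.
            columns.append([row[c] for row in lines])
            if len(columns) == k:
                # Transpose the column shift register into row-major form.
                window = [[columns[j][i] for j in range(k)] for i in range(k)]
                yield r - k + 1, c - k + 1, window


if __name__ == "__main__":
    image = [[10 * r + c for c in range(4)] for r in range(3)]
    for top, left, win in stream_windows(image):
        print(top, left, win)
    # 0 0 [[0, 1, 2], [10, 11, 12], [20, 21, 22]]
    # 0 1 [[1, 2, 3], [11, 12, 13], [21, 22, 23]]
```

In this sketch the cycles that refill the first k-1 rows, and the first k-1 columns of each row, produce no window and are therefore idle; how the paper's design avoids idle cycles is not described in the abstract.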

Related papers

AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation
We introduce AsCAN, a hybrid architecture that combines convolutional and transformer blocks. AsCAN supports a variety of tasks: recognition, segmentation, and class-conditional image generation. We then scale the same architecture to solve a large-scale text-to-image task and show state-of-the-art performance.
(2024-11-07T18:43:17Z)
Co-design of a novel CMOS highly parallel, low-power, multi-chip neural network accelerator
We present the NV-1, a new low-power ASIC AI processor that greatly accelerates parallel processing (> 10X) with a dramatic reduction in energy consumption. The resulting device is currently being used in a fielded edge sensor application.
(2024-09-28T15:47:16Z)
Mechanistic Design and Scaling of Hybrid Architectures
We identify and test new hybrid architectures constructed from a variety of computational primitives. We experimentally validate the resulting architectures via an extensive compute-optimal and a new state-optimal scaling law analysis. We find MAD synthetics to correlate with compute-optimal perplexity, enabling accurate evaluation of new architectures.
(2024-03-26T16:33:12Z)
Using the Abstract Computer Architecture Description Language to Model AI Hardware Accelerators
Manufacturers of AI-integrated products face a critical challenge: selecting an accelerator that aligns with their product's performance requirements. The Abstract Computer Architecture Description Language (ACADL) is a concise formalization of computer architecture block diagrams. In this paper, we demonstrate how to use the ACADL to model AI hardware accelerators, use their ACADL description to map DNNs onto them, and explain the timing simulation semantics to gather performance results.
(2024-01-30T19:27:16Z)
Input Convex Lipschitz RNN: A Fast and Robust Approach for Engineering Tasks
We develop a novel network architecture, termed Input Convex Lipschitz Recurrent Neural Networks. This model is explicitly designed for fast and robust optimization-based tasks. We have successfully implemented this model in various practical engineering applications.
(2024-01-15T06:26:53Z)
Distributed On-Sensor Compute System for AR/VR Devices: A Semi-Analytical Simulation Framework for Power Estimation
We show that a novel distributed on-sensor compute architecture can reduce system power consumption compared with a centralized system, including for the compute-intensive, machine-learning-based Hand Tracking algorithm.
(2022-03-14T20:18:24Z)
FPGA-optimized Hardware acceleration for Spiking Neural Networks
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task. The design targets a Xilinx Artix-7 FPGA and uses around 40% of the available hardware resources in total. Compared with its full-precision software counterpart, it reduces classification time by three orders of magnitude at the cost of a small 4.5% drop in accuracy.
(2022-01-18T13:59:22Z)
Reconfigurable co-processor architecture with limited numerical precision to accelerate deep convolutional neural networks
Convolutional Neural Networks (CNNs) are widely used in deep learning applications, e.g., visual systems and robotics. Here, we present a model-independent reconfigurable co-processing architecture to accelerate CNNs. In contrast to existing solutions, we introduce limited-precision 32-bit Q-format fixed-point quantization for arithmetic representations and operations.
(2021-08-21T09:50:54Z)
Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression
Deep Neural Network (DNN) models are essential for practical applications, especially on resource-limited devices. Previous unstructured or structured weight pruning methods rarely accelerate inference in practice. We propose a generalized weight unification framework at a hardware-compatible, micro-structured level to achieve a high degree of compression and acceleration.
(2021-06-15T17:22:59Z)
One-step regression and classification with crosspoint resistive memory arrays
High speed, low energy computing machines are in demand to enable real-time artificial intelligence at the edge. One-step learning is supported by simulations of the prediction of the cost of a house in Boston and the training of a 2-layer neural network for MNIST digit recognition. Results are all obtained in one computational step, thanks to the physical, parallel, and analog computing within the crosspoint array.
(2020-05-05T08:00:07Z)
HCM: Hardware-Aware Complexity Metric for Neural Network Architectures
This paper introduces a hardware-aware complexity metric that aims to assist the system designer of the neural network architectures. We demonstrate how the proposed metric can help evaluate different design alternatives of neural network models on resource-restricted devices.
(2020-04-19T16:42:51Z)
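The reconfigurable co-processor entry above mentions 32-bit Q-format fixed-point arithmetic. As a generic illustration of that number format only, not that paper's implementation, the Python sketch below encodes values as signed 32-bit integers with 16 fractional bits (Q16.16 is an assumed split; the entry does not state one) and multiplies two of them. The helper names `to_q`, `from_q`, and `q_mul` are hypothetical.

```python
def to_q(x: float, frac_bits: int = 16, total_bits: int = 32) -> int:
    """Encode x as a signed fixed-point integer with `frac_bits` fractional bits.

    Rounds to nearest (ties to even, as Python's round() does) and saturates
    to the signed range of `total_bits` bits.
    """
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, round(x * (1 << frac_bits))))


def from_q(q: int, frac_bits: int = 16) -> float:
    """Decode a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def q_mul(a: int, b: int, frac_bits: int = 16, total_bits: int = 32) -> int:
    """Multiply two fixed-point values.

    The raw product carries 2 * frac_bits fractional bits, so shift right by
    frac_bits (Python's >> rounds toward negative infinity) and saturate.
    """
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, (a * b) >> frac_bits))


if __name__ == "__main__":
    a, b = to_q(1.5), to_q(-2.0)
    print(a, b)                  # 98304 -131072
    print(from_q(q_mul(a, b)))   # -3.0
```

Saturation rather than wrap-around on overflow is a design choice made here for illustration; hardware implementations may do either.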

This list is automatically generated from the titles and abstracts of the papers on this site.