Reconfigurable co-processor architecture with limited numerical
precision to accelerate deep convolutional neural networks
- URL: http://arxiv.org/abs/2109.03040v1
- Date: Sat, 21 Aug 2021 09:50:54 GMT
- Title: Reconfigurable co-processor architecture with limited numerical
precision to accelerate deep convolutional neural networks
- Authors: Sasindu Wijeratne, Sandaruwan Jayaweera, Mahesh Dananjaya, Ajith
Pasqual
- Abstract summary: Convolutional Neural Networks (CNNs) are widely used in deep learning applications such as visual systems and robotics.
Here, we present a model-independent reconfigurable co-processing architecture to accelerate CNNs.
In contrast to existing solutions, we introduce limited-precision 32-bit Q-format fixed-point quantization for arithmetic representations and operations.
- Score: 0.38848561367220275
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Convolutional Neural Networks (CNNs) are widely used in deep learning
applications such as visual systems and robotics. However, existing software
solutions are not efficient. Therefore, many hardware accelerators have been
proposed to optimize the performance, power and resource utilization of the
implementation. Amongst existing solutions, Field Programmable Gate Array
(FPGA) based architectures provide better cost-energy-performance trade-offs, as
well as scalability and shorter development time. In this paper, we present
a model-independent reconfigurable co-processing architecture to accelerate
CNNs. Our architecture consists of parallel Multiply and Accumulate (MAC) units
with caching techniques and interconnection networks to exploit maximum data
parallelism. In contrast to existing solutions, we introduce limited-precision
32-bit Q-format fixed-point quantization for arithmetic representations and
operations. As a result, our architecture achieves a significant reduction in
resource utilization with competitive accuracy. Furthermore, we developed
assembly-type microinstructions to access the co-processing fabric and manage
layer-wise parallelism, thereby reusing limited resources. Finally, we
have tested our architecture with kernel sizes up to 9x9 on a Xilinx Virtex 7 FPGA,
achieving a throughput of up to 226.2 GOp/s for 3x3 kernels.
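
The abstract names two building blocks: 32-bit Q-format fixed-point arithmetic and Multiply and Accumulate (MAC) units applied to convolution kernels. The following is a minimal software sketch of how those two ideas combine, not the authors' hardware design. The Q16.16 split (16 fractional bits), the saturation behaviour, and all function names (`to_q`, `from_q`, `q_mac`, `conv_window`) are assumptions made for illustration; the abstract does not state how the 32 bits are divided or how overflow is handled.

```python
# Illustrative sketch only: Q-format fixed-point MAC for one convolution window.
# ASSUMPTION: Q16.16 (16 integer bits incl. sign, 16 fractional bits).
WORD_BITS = 32
FRAC_BITS = 16
Q_MIN = -(1 << (WORD_BITS - 1))
Q_MAX = (1 << (WORD_BITS - 1)) - 1


def saturate(value):
    """Clamp an integer to the signed 32-bit range (assumed overflow policy)."""
    return max(Q_MIN, min(Q_MAX, value))


def to_q(x):
    """Quantize a real number to a Q16.16 integer."""
    return saturate(int(round(x * (1 << FRAC_BITS))))


def from_q(q):
    """Convert a Q16.16 integer back to a float."""
    return q / (1 << FRAC_BITS)


def q_mac(acc, a, b):
    """One multiply-accumulate step on Q16.16 operands.

    The raw product carries 2 * FRAC_BITS fractional bits, so it is
    shifted right by FRAC_BITS before being added to the accumulator.
    """
    return saturate(acc + ((a * b) >> FRAC_BITS))


def conv_window(window, kernel):
    """Dot product of one input window with a same-sized kernel."""
    acc = 0
    for w_row, k_row in zip(window, kernel):
        for w, k in zip(w_row, k_row):
            acc = q_mac(acc, to_q(w), to_q(k))
    return from_q(acc)


if __name__ == "__main__":
    window = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    kernel = [[0.5] * 3 for _ in range(3)]
    print(conv_window(window, kernel))
```

In this sketch the nine MAC steps of a 3x3 window run sequentially; a hardware design with parallel MAC units, as described in the abstract, would evaluate them concurrently.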
Related papers
- Weight Block Sparsity: Training, Compilation, and AI Engine Accelerators [0.0]
Deep Neural Networks (DNNs) are being developed, trained, and utilized, putting a strain on both advanced and limited devices.
Our solution is to implement weight block sparsity, a structured sparsity that is friendly to hardware.
We will present performance estimates using accurate and complete code generation for AIE2 configuration sets (AMD Versal FPGAs) with Resnet50, Inception V3, and VGG16.
arXiv Detail & Related papers (2024-07-12T17:37:49Z)
- CiMNet: Towards Joint Optimization for DNN Architecture and Configuration for Compute-In-Memory Hardware [6.308771129448823]
We present CiMNet, a framework that jointly searches for optimal sub-networks and hardware configurations for compute-in-memory (CiM) architectures.
The proposed framework can comprehend the complex interplay between a sub-network's performance and the CiM hardware configuration choices.
arXiv Detail & Related papers (2024-02-19T02:12:07Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical computation restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and the compressing bits by soft policy iterations.
Based on a latency- and accuracy-aware reward design, such a computation can adapt well to complex environments like dynamic wireless channels and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- Multi-Exit Semantic Segmentation Networks [78.44441236864057]
We propose a framework for converting state-of-the-art segmentation models to MESS networks: specially trained CNNs that employ parametrised early exits along their depth to save computation during inference on easier samples.
We co-optimise the number, placement and architecture of the attached segmentation heads, along with the exit policy, to adapt to the device capabilities and application-specific requirements.
arXiv Detail & Related papers (2021-06-07T11:37:03Z)
- HAO: Hardware-aware neural Architecture Optimization for Efficient Inference [25.265181492143107]
We develop an integer programming algorithm to prune the design space of a neural network search algorithm.
Our algorithm achieves 72.5% top-1 accuracy on ImageNet at framerate 50, which is 60% faster than MnasNet and 135% faster than FBNet with comparable accuracy.
arXiv Detail & Related papers (2021-04-26T17:59:29Z)
- Towards Accurate and Compact Architectures via Neural Architecture Transformer [95.4514639013144]
It is necessary to optimize the operations inside an architecture to improve the performance without introducing extra computational cost.
We have proposed a Neural Architecture Transformer (NAT) method which casts the optimization problem into a Markov Decision Process (MDP).
We propose a Neural Architecture Transformer++ (NAT++) method which further enlarges the set of candidate transitions to improve the performance of architecture optimization.
arXiv Detail & Related papers (2021-02-20T09:38:10Z)
- Hardware-Centric AutoML for Mixed-Precision Quantization [34.39845532939529]
Conventional quantization algorithms ignore the different hardware architectures and quantize all the layers in a uniform way.
In this paper, we introduce the Hardware-Aware Automated Quantization (HAQ) framework, which leverages reinforcement learning to automatically determine the quantization policy.
Our framework effectively reduced the latency by 1.4-1.95x and the energy consumption by 1.9x with negligible loss of accuracy compared with the fixed bitwidth (8 bits) quantization.
arXiv Detail & Related papers (2020-08-11T17:30:22Z)
- Stage-Wise Neural Architecture Search [65.03109178056937]
Modern convolutional networks such as ResNet and NASNet have achieved state-of-the-art results in many computer vision applications.
These networks consist of stages, which are sets of layers that operate on representations in the same resolution.
It has been demonstrated that increasing the number of layers in each stage improves the prediction ability of the network.
However, the resulting architecture becomes computationally expensive in terms of floating point operations, memory requirements and inference time.
arXiv Detail & Related papers (2020-04-23T14:16:39Z)
- Near-Optimal Hardware Design for Convolutional Neural Networks [0.0]
This study proposes a novel, special-purpose, and high-efficiency hardware architecture for convolutional neural networks.
The proposed architecture maximizes the utilization of multipliers by designing the computational circuit with the same structure as that of the computational flow of the model.
An implementation based on the proposed hardware architecture has been applied in commercial AI products.
arXiv Detail & Related papers (2020-02-06T09:15:03Z)
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)