Design of High-Throughput Mixed-Precision CNN Accelerators on FPGA
- URL: http://arxiv.org/abs/2208.04854v1
- Date: Tue, 9 Aug 2022 15:32:51 GMT
- Title: Design of High-Throughput Mixed-Precision CNN Accelerators on FPGA
- Authors: Cecilia Latotzke, Tim Ciesielski, and Tobias Gemmeke
- Abstract summary: Layer-wise mixed-precision quantization allows for more efficient results while inflating the design space.
We present an in-depth quantitative methodology to efficiently explore the design space considering the limited hardware resources of a given FPGA.
Our resulting hardware accelerators implement truly mixed-precision operations that enable efficient execution of layer-wise and channel-wise quantized CNNs.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Convolutional Neural Networks (CNNs) reach high accuracies in various
application domains, but require large amounts of computation and incur costly
data movements. One method to decrease these costs while trading accuracy is
weight and/or activation word-length reduction. Thereby, layer-wise
mixed-precision quantization allows for more efficient results while inflating
the design space. In this work, we present an in-depth quantitative methodology
to efficiently explore the design space considering the limited hardware
resources of a given FPGA. Our holistic exploration approach vertically
traverses the various design entry levels from the architectural down to the
logic level, and laterally covers optimization from processing elements to
dataflow for an efficient mixed-precision CNN accelerator. Our resulting
hardware accelerators implement truly mixed-precision operations that enable
efficient execution of layer-wise and channel-wise quantized CNNs. Mapping
feed-forward and identity-shortcut-connection mixed-precision CNNs results in
competitive accuracy-throughput trade-offs: 245 frames/s with 87.48% Top-5
accuracy for ResNet-18 and 92.9% Top-5 accuracy with 1.13 TOps/s for
ResNet-152, respectively. Thereby, the required memory footprint for parameters
is reduced by 4.9x and 9.4x compared to the respective floating-point baseline.
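As a back-of-the-envelope illustration of the word-length reduction behind these numbers, the sketch below applies symmetric uniform quantization with a separate bit-width per layer. The layer names and bit assignments are hypothetical, not the accelerator's actual configuration:

```python
import numpy as np

def quantize_symmetric(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric uniform quantization of a tensor to the given word length."""
    qmax = 2 ** (bits - 1) - 1             # e.g. 127 for 8 bits
    scale = np.max(np.abs(w)) / qmax       # per-tensor scale factor
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                       # dequantized ("fake-quantized") values

# Layer-wise mixed precision: every layer gets its own word length.
layer_bits = {"conv1": 8, "conv2": 5, "fc": 4}   # illustrative assignments
layers = {n: np.random.randn(64, 64).astype(np.float32) for n in layer_bits}
quantized = {n: quantize_symmetric(w, layer_bits[n]) for n, w in layers.items()}

# Parameter memory footprint versus a 32-bit floating-point baseline:
fp32_bits = sum(w.size * 32 for w in layers.values())
mixed_bits = sum(w.size * layer_bits[n] for n, w in layers.items())
print(f"compression: {fp32_bits / mixed_bits:.1f}x")  # same flavor as the 4.9x/9.4x above
```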
Related papers
- Hardware-Software Co-optimised Fast and Accurate Deep Reconfigurable Spiking Inference Accelerator Architecture Design Methodology [2.968768532937366]
Spiking Neural Networks (SNNs) have emerged as a promising approach to improve the energy efficiency of machine learning models.
We develop a hardware-software co-optimisation strategy to port software-trained deep neural networks (DNN) to reduced-precision spiking models.
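One common baseline for such a port (not necessarily this paper's co-optimisation strategy) is rate coding, where a ReLU activation is approximated by the firing rate of an integrate-and-fire neuron. A minimal sketch:

```python
import numpy as np

def integrate_and_fire(inp: np.ndarray, steps: int = 100, v_th: float = 1.0) -> np.ndarray:
    """Approximate ReLU(inp) by the firing rate of an IF neuron over `steps` timesteps."""
    v = np.zeros_like(inp)
    spikes = np.zeros_like(inp)
    for _ in range(steps):
        v += inp                # integrate a constant input current
        fired = v >= v_th       # threshold crossing
        spikes += fired
        v[fired] -= v_th        # soft reset preserves residual charge
    return spikes / steps       # firing rate approximates the clipped ReLU output

x = np.array([-0.5, 0.2, 0.7])
print(integrate_and_fire(x))    # ~[0.0, 0.2, 0.7]
```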
arXiv Detail & Related papers (2024-10-07T05:04:13Z)
- Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks [10.229120811024162]
The size and computational cost of deep neural networks (DNNs) pose significant challenges to their deployment on edge devices.
Common approaches to address this issue are pruning and mixed-precision quantization.
We propose a novel methodology to apply them jointly via a lightweight gradient-based search.
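A minimal sketch of one way such a joint gradient-based search can be wired up: a learnable per-channel pruning gate combined with a straight-through-estimator fake-quantizer (the bit-width is fixed per layer here for brevity). The module and hyperparameters are illustrative assumptions, not the authors' implementation:

```python
import torch

def fake_quant(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Per-channel symmetric fake quantization with a straight-through estimator."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().amax(dim=1, keepdim=True) / qmax   # one scale per output channel
    q = torch.clamp(torch.round(w / scale), -qmax, qmax) * scale
    return w + (q - w).detach()                        # STE: identity gradient

class PrunedQuantLinear(torch.nn.Module):
    """Linear layer with a learnable channel pruning gate and quantized weights."""
    def __init__(self, in_f: int, out_f: int, bits: int = 4):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(out_f, in_f) * 0.1)
        self.gate = torch.nn.Parameter(torch.zeros(out_f, 1))  # one logit per channel
        self.bits = bits
    def forward(self, x):
        mask = torch.sigmoid(self.gate)  # soft mask, pushed toward 0/1 by a sparsity penalty
        w = fake_quant(self.weight, self.bits) * mask
        return x @ w.t()

layer = PrunedQuantLinear(16, 8)
loss = layer(torch.randn(4, 16)).pow(2).mean() + 1e-3 * torch.sigmoid(layer.gate).sum()
loss.backward()  # gradients flow to both the weights and the pruning gates
```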
arXiv Detail & Related papers (2024-07-01T08:07:02Z)
- On-Chip Hardware-Aware Quantization for Mixed Precision Neural Networks [52.97107229149988]
We propose an On-Chip Hardware-Aware Quantization framework, performing hardware-aware mixed-precision quantization on deployed edge devices.
For efficiency metrics, we built an On-Chip Quantization Aware pipeline, which allows the quantization process to perceive the actual hardware efficiency of the quantization operator.
For accuracy metrics, we propose Mask-Guided Quantization Estimation technology to effectively estimate the accuracy impact of operators in the on-chip scenario.
arXiv Detail & Related papers (2023-09-05T04:39:34Z)
- SCONNA: A Stochastic Computing Based Optical Accelerator for Ultra-Fast, Energy-Efficient Inference of Integer-Quantized CNNs [0.0]
A CNN inference task uses convolution operations that are typically transformed into vector-dot-product (VDP) operations.
Several photonic microring resonator (MRR)-based hardware architectures have been proposed to accelerate integer-quantized CNNs.
Existing photonic MRR-based analog accelerators exhibit a very strong trade-off between the achievable input/weight precision and VDP operation size.
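The convolution-to-VDP transformation mentioned above is commonly realized with im2col; a minimal NumPy sketch (stride 1, no padding):

```python
import numpy as np

def conv2d_as_vdp(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Express a valid, stride-1 2D convolution as a batch of vector dot products."""
    H, W = x.shape
    kh, kw = k.shape
    oh, ow = H - kh + 1, W - kw + 1
    # im2col: each output pixel corresponds to one flattened input patch
    patches = np.stack([x[i:i + kh, j:j + kw].ravel()
                        for i in range(oh) for j in range(ow)])
    return (patches @ k.ravel()).reshape(oh, ow)   # one VDP per output pixel

x = np.arange(16, dtype=np.float32).reshape(4, 4)
k = np.ones((3, 3), dtype=np.float32)
print(conv2d_as_vdp(x, k))   # matches a direct sliding-window convolution
```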
arXiv Detail & Related papers (2023-02-14T13:35:15Z)
- Quantized Neural Networks for Low-Precision Accumulation with Guaranteed Overflow Avoidance [68.8204255655161]
We introduce a quantization-aware training algorithm that guarantees avoiding numerical overflow when reducing the precision of accumulators during inference.
We evaluate our algorithm across multiple quantized models that we train for different tasks, showing that our approach can reduce the precision of accumulators while maintaining model accuracy with respect to a floating-point baseline.
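The arithmetic behind overflow avoidance can be made concrete with the standard worst-case bound on the accumulator width; the function below is a generic illustration of that bound, not the paper's (tighter, training-enforced) guarantee:

```python
import math

def min_accumulator_bits(n_elems: int, act_bits: int, wgt_bits: int) -> int:
    """Smallest signed accumulator width that can never overflow for a dot
    product of n_elems unsigned act_bits activations with signed wgt_bits
    weights (worst-case bound)."""
    max_act = 2 ** act_bits - 1           # largest unsigned activation
    max_wgt = 2 ** (wgt_bits - 1)         # largest weight magnitude (two's complement)
    worst = n_elems * max_act * max_wgt   # largest possible |partial sum|
    return math.ceil(math.log2(worst + 1)) + 1   # +1 for the sign bit

# e.g. a 3x3x256 convolution window with 8-bit activations and weights:
print(min_accumulator_bits(3 * 3 * 256, 8, 8))   # -> 28 bits
```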
arXiv Detail & Related papers (2023-01-31T02:46:57Z)
- FxP-QNet: A Post-Training Quantizer for the Design of Mixed Low-Precision DNNs with Dynamic Fixed-Point Representation [2.4149105714758545]
We propose a novel framework referred to as the Fixed-Point Quantizer of deep neural Networks (FxP-QNet).
FxP-QNet adapts the quantization level for each data-structure of each layer based on the trade-off between the network accuracy and the low-precision requirements.
Results show that FxP-QNet-quantized AlexNet, VGG-16, and ResNet-18 reduce the overall memory requirements of their full-precision counterparts by 7.16x, 10.36x, and 6.44x with less than 0.95%, 0.95%, and 1.99% accuracy drop, respectively.
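Dynamic fixed-point representation gives each tensor its own fractional length. A minimal post-training sketch of that idea (the range-driven heuristic below is an assumption, not FxP-QNet's exact accuracy/precision trade-off criterion):

```python
import numpy as np

def to_dynamic_fixed_point(t: np.ndarray, word_len: int) -> np.ndarray:
    """Quantize to word_len-bit fixed point, picking the per-tensor fractional
    length that just covers the tensor's dynamic range (no retraining)."""
    max_abs = float(np.max(np.abs(t)))
    int_len = int(np.ceil(np.log2(max_abs))) + 1 if max_abs > 0 else 1  # sign + integer bits
    frac_len = word_len - int_len              # remaining bits resolve the fraction
    step = 2.0 ** -frac_len
    qmin, qmax = -2 ** (word_len - 1), 2 ** (word_len - 1) - 1
    q = np.clip(np.round(t / step), qmin, qmax)
    return q * step

w = np.random.randn(1000).astype(np.float32) * 0.3
print(np.abs(w - to_dynamic_fixed_point(w, 8)).max())   # small quantization error
```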
arXiv Detail & Related papers (2022-03-22T23:01:43Z)
- OMPQ: Orthogonal Mixed Precision Quantization [64.59700856607017]
Mixed precision quantization takes advantage of hardware's multiple bit-width arithmetic operations to unleash the full potential of network quantization.
We propose to optimize a proxy metric, the concept of network orthogonality, which is highly correlated with the loss of the integer programming.
This approach reduces the search time and required data amount by orders of magnitude, with little compromise on quantization accuracy.
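As rough intuition for an orthogonality-style proxy (the paper's actual metric is defined differently), one can measure how decorrelated two layers' flattened feature maps are:

```python
import numpy as np

def layer_orthogonality(fa: np.ndarray, fb: np.ndarray) -> float:
    """Cosine-based decorrelation of two equally sized, flattened feature maps;
    a generic stand-in for the paper's metric, not its definition."""
    a, b = fa.ravel(), fb.ravel()
    cos = abs(np.dot(a, b)) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return 1.0 - cos   # 1.0 = fully orthogonal, 0.0 = colinear

f1, f2 = np.random.randn(64, 8, 8), np.random.randn(64, 8, 8)
print(layer_orthogonality(f1, f2))   # near 1.0 for independent random features
```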
arXiv Detail & Related papers (2021-09-16T10:59:33Z)
- Fully Quantized Image Super-Resolution Networks [81.75002888152159]
We propose a Fully Quantized image Super-Resolution framework (FQSR) to jointly optimize efficiency and accuracy.
We apply our quantization scheme on multiple mainstream super-resolution architectures, including SRResNet, SRGAN and EDSR.
Our FQSR with low-bit quantization achieves performance on par with the full-precision counterparts on five benchmark datasets.
arXiv Detail & Related papers (2020-11-29T03:53:49Z)
- AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to get rid of floating-point computation.
Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z)
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantized neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in the original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)