Layer-specific Optimization for Mixed Data Flow with Mixed Precision in
FPGA Design for CNN-based Object Detectors
- URL: http://arxiv.org/abs/2009.01588v1
- Date: Thu, 3 Sep 2020 11:27:40 GMT
- Title: Layer-specific Optimization for Mixed Data Flow with Mixed Precision in
FPGA Design for CNN-based Object Detectors
- Authors: Duy Thanh Nguyen, Hyun Kim, and Hyuk-Jae Lee
- Abstract summary: Convolutional neural networks (CNNs) require both intensive computation and frequent memory access.
This paper proposes a layer-specific design that employs a different hardware organization optimized for each layer.
The proposed design employs two layer-specific optimizations: layer-specific mixed data flow and layer-specific mixed precision.
- Score: 16.56630393243829
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional neural networks (CNNs) require both intensive computation and
frequent memory access, which lead to a low processing speed and large power
dissipation. Although the characteristics of the different layers in a CNN are
frequently quite different, previous hardware designs have employed common
optimization schemes for them. This paper proposes a layer-specific design that
employs different organizations that are optimized for the different layers.
The proposed design employs two layer-specific optimizations: layer-specific
mixed data flow and layer-specific mixed precision. The mixed data flow aims to
minimize the off-chip access while demanding a minimal on-chip memory (BRAM)
resource of an FPGA device. The mixed-precision quantization aims to achieve
both lossless accuracy and aggressive model compression, thereby further
reducing the off-chip access. A Bayesian optimization approach is used to
select the best sparsity for each layer, achieving the best trade-off between
the accuracy and compression. This mixing scheme allows the entire network
model to be stored in BRAMs of the FPGA to aggressively reduce the off-chip
access, and thereby achieves a significant performance enhancement. The model
size is reduced by 22.66-28.93 times compared to that in a full-precision
network with a negligible degradation of accuracy on the VOC, COCO, and
ImageNet datasets. Furthermore, the combination of mixed data flow and mixed
precision significantly outperforms the previous works in terms of throughput,
off-chip access, and on-chip memory requirement.
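As a rough illustration of the layer-wise search described above, the sketch
below uses scikit-optimize's gp_minimize to pick a per-layer bit-width. The
network depth, the trade-off weight, and the evaluate_accuracy and
compressed_size proxies are hypothetical placeholders standing in for the
paper's pipeline, not the authors' implementation.

    from skopt import gp_minimize
    from skopt.space import Integer

    NUM_LAYERS = 16      # assumed depth; not taken from the paper
    ACC_WEIGHT = 10.0    # assumed accuracy/size trade-off coefficient

    def evaluate_accuracy(bits):
        # Toy proxy for accuracy after quantizing layer i to bits[i] bits;
        # a real run would quantize the network and measure mAP.
        return 1.0 - sum(2.0 ** -b for b in bits) / len(bits)

    def compressed_size(bits):
        # Toy proxy for model size, proportional to the total bit count.
        return sum(bits) / (8.0 * len(bits))

    def objective(bits):
        # Lower is better: penalize accuracy loss and reward compression so
        # the whole model can fit in the on-chip BRAMs.
        return ACC_WEIGHT * (1.0 - evaluate_accuracy(bits)) + compressed_size(bits)

    # One search dimension per layer: candidate bit-widths from 1 to 8 bits.
    space = [Integer(1, 8, name=f"layer_{i}_bits") for i in range(NUM_LAYERS)]
    result = gp_minimize(objective, space, n_calls=40, random_state=0)
    print("selected per-layer bit-widths:", result.x)

In the paper the search variable is the per-layer sparsity rather than the
bit-width, but the structure of the optimization loop is the same.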
Related papers
- MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design [1.3589914205911104]
We present a comprehensive analysis of how general quantization principles affect the triangle of accuracy, memory consumption, and system efficiency.
We propose MixLLM that explores the new optimization space of mixed-precision quantization between output features.
We present the sweet spot of quantization configuration of algorithm-system co-design that leads to high accuracy and system efficiency.
arXiv Detail & Related papers (2024-12-19T07:15:15Z)
- Progressive Mixed-Precision Decoding for Efficient LLM Inference [49.05448842542558]
We introduce Progressive Mixed-Precision Decoding (PMPD) to address the memory-boundedness of decoding.
PMPD achieves a 1.4-12.2× speedup in matrix-vector multiplications over fp16 models.
Our approach delivers a throughput gain of 3.8-8.0× over fp16 models and up to 1.54× over uniform quantization approaches.
arXiv Detail & Related papers (2024-10-17T11:46:33Z)
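A progressive precision schedule of the kind PMPD describes might look like
the sketch below; the phase boundaries and bit-widths are hypothetical, chosen
only to show the precision dropping as decoding proceeds.

    def precision_schedule(step: int, total_steps: int) -> int:
        # Hypothetical schedule: decode early, sensitive tokens at high
        # precision, then progressively lower the bit-width to relieve the
        # memory-bound tail of LLM decoding.
        progress = step / total_steps
        if progress < 0.25:
            return 8   # assumed high-precision phase
        elif progress < 0.75:
            return 4
        else:
            return 2   # assumed low-precision tail

    bits = [precision_schedule(t, 100) for t in range(100)]
    print(bits[0], bits[50], bits[99])  # 8 4 2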
- SySMOL: Co-designing Algorithms and Hardware for Neural Networks with Heterogeneous Precisions [20.241671088121144]
Recent quantization techniques have enabled heterogeneous precisions at very fine granularity.
These networks require additional hardware to decode the precision settings for individual variables, align the variables, and provide fine-grained mixed-precision compute capabilities.
We present an end-to-end co-design approach to efficiently execute networks with fine-grained heterogeneous precisions.
arXiv Detail & Related papers (2023-11-23T17:20:09Z)
- Efficient and Effective Methods for Mixed Precision Neural Network Quantization for Faster, Energy-efficient Inference [3.3213055774512648]
Quantizing networks to lower precision is a powerful technique for simplifying networks.
Mixed precision quantization methods selectively tune the precision of individual layers to achieve a minimum drop in task performance.
To estimate the impact of each layer's precision choice on task performance, two methods, EAGL and ALPS, are introduced.
Using EAGL and ALPS for layer precision selection, full-precision accuracy is recovered with a mix of 4-bit and 2-bit layers.
arXiv Detail & Related papers (2023-01-30T23:26:33Z)
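Per-layer precision selection of this kind is often driven by a measured
sensitivity score; the greedy sketch below is a generic illustration with
hypothetical sensitivities, not the EAGL or ALPS algorithm itself.

    def select_precisions(sensitivities, budget_bits, high=4, low=2):
        # Generic greedy illustration (not EAGL/ALPS): give every layer the
        # low bit-width, then upgrade the most sensitive layers to the high
        # bit-width until the total bit budget is spent.
        bits = [low] * len(sensitivities)
        spent = low * len(sensitivities)
        order = sorted(range(len(sensitivities)),
                       key=lambda i: sensitivities[i], reverse=True)
        for i in order:
            if spent + (high - low) <= budget_bits:
                bits[i] = high
                spent += high - low
        return bits

    # Hypothetical sensitivities for a 6-layer network, budget of 18 bits.
    print(select_precisions([0.9, 0.1, 0.5, 0.7, 0.2, 0.4], budget_bits=18))
    # -> [4, 2, 4, 4, 2, 2]: layers 0, 2, 3 upgraded, the rest stay at 2 bits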
- AMED: Automatic Mixed-Precision Quantization for Edge Devices [3.5223695602582614]
Quantized neural networks are well known for reducing the latency, power consumption, and model size without significant harm to the performance.
Mixed-precision quantization offers better utilization of customized hardware that supports arithmetic operations at different bitwidths.
arXiv Detail & Related papers (2022-05-30T21:23:22Z)
- Collaborative Intelligent Reflecting Surface Networks with Multi-Agent Reinforcement Learning [63.83425382922157]
Intelligent reflecting surface (IRS) is envisioned to be widely applied in future wireless networks.
In this paper, we investigate a multi-user communication system assisted by cooperative IRS devices with the capability of energy harvesting.
arXiv Detail & Related papers (2022-03-26T20:37:14Z)
- Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration [71.80326738527734]
We propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations.
We show that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework.
arXiv Detail & Related papers (2021-11-22T23:53:14Z)
- RMSMP: A Novel Deep Neural Network Quantization Framework with Row-wise Mixed Schemes and Multiple Precisions [43.27226390407956]
This work proposes a novel Deep Neural Network (DNN) quantization framework, namely RMSMP, with a Row-wise Mixed-Scheme and Multi-Precision approach.
The proposed RMSMP is tested for the image classification and natural language processing (BERT) applications.
It achieves the best accuracy among state-of-the-art methods under the same equivalent precision.
arXiv Detail & Related papers (2021-10-30T02:53:35Z)
- Fully Quantized Image Super-Resolution Networks [81.75002888152159]
We propose a Fully Quantized image Super-Resolution framework (FQSR) to jointly optimize efficiency and accuracy.
We apply our quantization scheme on multiple mainstream super-resolution architectures, including SRResNet, SRGAN and EDSR.
Our FQSR with low-bit quantization achieves performance on par with the full-precision counterparts on five benchmark datasets.
arXiv Detail & Related papers (2020-11-29T03:53:49Z)
- AQD: Towards Accurate Fully-Quantized Object Detection [94.06347866374927]
We propose an Accurate Quantized object Detection solution, termed AQD, to eliminate floating-point computation.
Our AQD achieves comparable or even better performance compared with the full-precision counterpart under extremely low-bit schemes.
arXiv Detail & Related papers (2020-07-14T09:07:29Z)
- Rethinking Differentiable Search for Mixed-Precision Neural Networks [83.55785779504868]
Low-precision networks with weights and activations quantized to low bit-width are widely used to accelerate inference on edge devices.
Current solutions are uniform, using identical bit-width for all filters.
This fails to account for the different sensitivities of different filters and is suboptimal.
Mixed-precision networks address this problem, by tuning the bit-width to individual filter requirements.
arXiv Detail & Related papers (2020-04-13T07:02:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.