Accelerating Binarized Neural Networks via Bit-Tensor-Cores in Turing
GPUs
- URL: http://arxiv.org/abs/2006.16578v2
- Date: Tue, 15 Dec 2020 00:13:59 GMT
- Title: Accelerating Binarized Neural Networks via Bit-Tensor-Cores in Turing
GPUs
- Authors: Ang Li and Simon Su
- Abstract summary: Binarized neural networks (BNNs) promise tremendous speedups over conventional deep neural networks.
We show that the latest tensorcores in NVIDIA Turing GPUs start to experimentally support bit computation.
Our BTC-BNN design can process ImageNet at a rate of 5.6K images per second, 77% faster than state-of-the-art.
- Score: 15.02711144514149
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although binarized neural networks (BNNs) promise tremendous speedups
over conventional deep neural networks, this performance advantage has hardly
been demonstrated on general-purpose processors such as CPUs and GPUs. In fact,
because a word-based architecture cannot exploit bit-level parallelism, GPUs
have been criticized for extremely low utilization (1%) when executing BNNs. In
response, the latest tensorcores in NVIDIA Turing GPUs have started to
experimentally support bit computation. In this work, we look into this
brand-new bit computation capability and characterize its unique features. We
show that the stride of memory access can significantly affect delivered
performance, and that a data-format co-design is needed for the tensorcores to
achieve performance superior to existing software solutions that do not use
tensorcores. We realize a tensorcore-accelerated BNN design, in particular the
major functions of fully-connected and convolution layers -- bit matrix
multiplication and bit convolution. Evaluations on two NVIDIA Turing GPUs show
that, with ResNet-18, our BTC-BNN design processes ImageNet at a rate of 5.6K
images per second, 77% faster than the state of the art. Our BNN approach is
released at https://github.com/pnnl/TCBNN.
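The bit capability referenced above is exposed through CUDA's WMMA API via the experimental 1-bit (b1) precision and the bmma_sync intrinsic, which performs an XOR-popcount matrix multiply on 8x8x128 tiles. The kernel below is a minimal sketch of one such tile multiply on a Turing GPU (sm_75, CUDA 10.0 or later); the kernel name, the single-tile scope, and the bit-packed operand layout are illustrative assumptions, and the actual TCBNN kernels in the repository above are considerably more involved (tiling, shared memory, and the data-format co-design discussed in the paper).

```cuda
// Minimal sketch: one 8x8x128 bit-matrix-multiply tile via Turing bit tensorcores.
// Requires sm_75+ and CUDA 10.0+. A holds 8x128 bits (row-major) and B holds
// 128x8 bits (column-major), both packed into 32-bit words; C is an 8x8 int tile.
// Illustrative only -- not the TCBNN production kernel.
#include <mma.h>

using namespace nvcuda;

__global__ void bmma_tile(const unsigned *A, const unsigned *B, int *C) {
    // Fragments for the experimental 1-bit precision; the b1 shape on Turing
    // is m8 x n8 x k128, with row-major A and column-major B.
    wmma::fragment<wmma::matrix_a, 8, 8, 128,
                   wmma::experimental::precision::b1, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 8, 8, 128,
                   wmma::experimental::precision::b1, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 8, 8, 128, int> c_frag;

    wmma::fill_fragment(c_frag, 0);

    // For sub-byte types the leading dimension is counted in elements (bits),
    // so a packed 8x128-bit row-major tile has ldm = 128.
    wmma::load_matrix_sync(a_frag, A, 128);
    wmma::load_matrix_sync(b_frag, B, 128);

    // XOR + popcount accumulation: each output counts mismatching bit pairs.
    // For {-1,+1} operands of length K, dot = K - 2 * popcount(a XOR b).
    wmma::bmma_sync(c_frag, a_frag, b_frag, c_frag);

    wmma::store_matrix_sync(C, c_frag, 8, wmma::mem_row_major);
}
```

Launched with a single warp (for example, bmma_tile<<<1, 32>>>(dA, dB, dC)), this produces one 8x8 tile of XOR-popcount values; a full bit-GEMM would tile this over the whole matrix and fold each count back into a signed {-1,+1} dot product as K - 2 * popcount.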
Related papers
- BitGNN: Unleashing the Performance Potential of Binary Graph Neural
Networks on GPUs [19.254040098787893]
Recent studies have shown that Binary Graph Neural Networks (GNNs) are promising for reducing the computation of GNNs through binarized tensors.
This work redesigns the binary GNN inference from the efficiency perspective.
Results on real-world graphs with GCNs, GraphSAGE, and GraphSAINT show that the proposed techniques outperform state-of-the-art binary GNN implementations by 8-22X while maintaining the same accuracy.
arXiv Detail & Related papers (2023-05-04T03:20:25Z)
- Exploiting Kernel Compression on BNNs [0.0]
In this work, we observe that the number of unique sequences representing a set of weights is typically low.
We propose a clustering scheme to identify the most common sequences of bits and replace the less common ones with some similar common sequences.
Our experimental results show that our technique can reduce memory requirement by 1.32x and improve performance by 1.35x.
arXiv Detail & Related papers (2022-12-01T16:05:10Z)
- TC-GNN: Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs [21.63854538768414]
We propose TC-GNN, the first GNN framework based on GPU Tensor Core Units (TCUs).
The core idea is to reconcile "Sparse" GNN computation with the high-performance "Dense" TCUs.
Rigorous experiments show an average speedup of 1.70x over the state-of-the-art DGL framework.
arXiv Detail & Related papers (2021-12-03T18:06:23Z)
- Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks [72.81092567651395]
Sub-bit Neural Networks (SNNs) are a new type of binary quantization design tailored to compress and accelerate BNNs.
SNNs are trained with a kernel-aware optimization framework, which exploits binary quantization in the fine-grained convolutional kernel space.
Experiments on visual recognition benchmarks and hardware deployment on FPGA validate the great potential of SNNs.
arXiv Detail & Related papers (2021-10-18T11:30:29Z)
- Content-Aware Convolutional Neural Networks [98.97634685964819]
Convolutional Neural Networks (CNNs) have achieved great success due to the powerful feature learning ability of convolution layers.
We propose a Content-aware Convolution (CAC) that automatically detects the smooth windows and applies a 1x1 convolutional kernel to replace the original large kernel.
arXiv Detail & Related papers (2021-06-30T03:54:35Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks (a common form of this decomposition is sketched after this list).
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks [58.48833325238537]
Full-batch training of Graph Neural Networks (GNNs) to learn the structure of large graphs is a critical problem that needs to scale to hundreds of compute nodes to be feasible.
In this paper, we present DistGNN, which optimizes the well-known Deep Graph Library (DGL) for full-batch training on CPU clusters.
Our results on four common GNN benchmark datasets show up to 3.7x speed-up using a single CPU socket and up to 97x speed-up using 128 CPU sockets.
arXiv Detail & Related papers (2021-04-14T08:46:35Z)
- FracBNN: Accurate and FPGA-Efficient Binary Neural Networks with Fractional Activations [20.218382369944152]
Binary neural networks (BNNs) have 1-bit weights and activations.
However, BNNs tend to produce much lower accuracy on realistic datasets such as ImageNet.
This work proposes FracBNN, which exploits fractional activations to substantially improve the accuracy of BNNs.
arXiv Detail & Related papers (2020-12-22T17:49:30Z)
- At-Scale Sparse Deep Neural Network Inference with Efficient GPU Implementation [24.824295164938604]
This paper presents GPU performance optimization and scaling results for inference models of the Sparse Deep Neural Network Challenge 2020.
Sparse deep neural networks (SpDNN) have shown promise for reining in the memory footprint of large neural networks.
This work presents optimized sparse matrix multiplication kernels fused with the ReLU function.
arXiv Detail & Related papers (2020-07-28T12:09:43Z)
- Distillation Guided Residual Learning for Binary Convolutional Neural Networks [83.6169936912264]
It is challenging to bridge the performance gap between a Binary CNN (BCNN) and a Floating-point CNN (FCNN).
We observe that this performance gap leads to substantial residuals between the intermediate feature maps of the BCNN and the FCNN.
To minimize the performance gap, we train the BCNN to produce intermediate feature maps similar to those of the FCNN.
This training strategy, i.e., optimizing each binary convolutional block with a block-wise distillation loss derived from the FCNN, leads to more effective optimization of the BCNN.
arXiv Detail & Related papers (2020-07-10T07:55:39Z)
- Efficient Integer-Arithmetic-Only Convolutional Neural Networks [87.01739569518513]
We replace the conventional ReLU with a Bounded ReLU and find that the accuracy decline is due to activation quantization.
Our integer networks achieve performance equivalent to the corresponding floating-point networks, but have only 1/4 the memory cost and run 2x faster on modern GPUs.
arXiv Detail & Related papers (2020-06-21T08:23:03Z)
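For the {-1, +1} encoding decomposition noted in the list above, a common form of such a scheme writes an M-bit code as a weighted sum of binary digits, so that a quantized product splits into several binary products computable with XNOR plus popcount on bit-packed operands. The sketch below is illustrative only; the symbols M, M_w, M_x and the binary digits b_i are assumptions and not necessarily the exact formulation of that paper.

```latex
% Illustrative {-1,+1} multi-branch decomposition (general form, not
% necessarily the cited paper's exact scheme). An M-bit code is a weighted
% sum of binary digits b_i in {-1,+1}:
\[
  x \;=\; \sum_{i=0}^{M-1} 2^{i}\, b_i , \qquad b_i \in \{-1, +1\} ,
\]
% so the product of an M_w-bit weight vector and an M_x-bit activation
% vector splits into M_w * M_x binary dot products, each computable with
% XNOR + popcount on the bit-packed vectors b^{(w)}_i and b^{(x)}_j:
\[
  w^{\top} x \;=\; \sum_{i=0}^{M_w-1} \sum_{j=0}^{M_x-1} 2^{\,i+j}\;
  \bigl(b^{(w)}_i\bigr)^{\!\top} b^{(x)}_j .
\]
```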