BEANNA: A Binary-Enabled Architecture for Neural Network Acceleration
- URL: http://arxiv.org/abs/2108.02313v1
- Date: Wed, 4 Aug 2021 23:17:34 GMT
- Title: BEANNA: A Binary-Enabled Architecture for Neural Network Acceleration
- Authors: Caleb Terrill, Fred Chu
- Abstract summary: This paper proposes and evaluates a neural network hardware accelerator capable of processing both floating point and binary network layers.
Running at a clock speed of 100MHz, BEANNA achieves a peak throughput of 52.8 GigaOps/second.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern hardware design trends have shifted towards specialized hardware
acceleration for computationally intensive tasks like machine learning and
computer vision. While these complex workloads can be accelerated by commercial
GPUs, domain-specific hardware is far better suited to the stringent memory,
throughput, and power constraints of mobile and embedded devices. This paper
proposes and evaluates a Binary-Enabled Architecture for
Neural Network Acceleration (BEANNA), a neural network hardware accelerator
capable of processing both floating point and binary network layers. Through
the use of a novel 16x16 systolic array based matrix multiplier with processing
elements that compute both floating point and binary multiply-adds, BEANNA
seamlessly switches between high precision floating point and binary neural
network layers. Running at a clock speed of 100MHz, BEANNA achieves a peak
throughput of 52.8 GigaOps/second when operating in high precision mode, and
820 GigaOps/second when operating in binary mode. Evaluation of BEANNA was
performed by comparing a hybrid network with floating point outer layers and
binary hidden layers to a network with only floating point layers. The hybrid
network accelerated using BEANNA achieved a 194% throughput increase, a 68%
memory usage decrease, and a 66% energy consumption decrease per inference, all
this at the cost of a mere 0.23% classification accuracy decrease on the MNIST
dataset.
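The key idea behind the dual-mode processing elements is that a multiply-add over {-1, +1} values collapses to an XNOR followed by a popcount, which is far cheaper in hardware than a floating point MAC. The following is a minimal software sketch of that identity, not BEANNA's actual RTL; the function names and bit-packing convention are illustrative assumptions.

```python
import numpy as np

def float_mac(acc, a, w):
    """High-precision mode: one floating point multiply-add."""
    return acc + a * w

def binary_mac(acc, a_bits, w_bits, n):
    """Binary mode: with activations and weights in {-1, +1} packed one per
    bit (1 encodes +1, 0 encodes -1), the n-lane dot product reduces to
    2 * popcount(XNOR(a, w)) - n."""
    xnor = ~(a_bits ^ w_bits) & ((1 << n) - 1)  # bitwise XNOR over n lanes
    return acc + 2 * bin(xnor).count("1") - n

# Check the identity against the naive {-1, +1} dot product.
a = np.array([1, -1, -1, 1])
w = np.array([1, 1, -1, -1])
a_bits = sum(1 << i for i, v in enumerate(a) if v == 1)  # 0b1001
w_bits = sum(1 << i for i, v in enumerate(w) if v == 1)  # 0b0011
assert binary_mac(0, a_bits, w_bits, 4) == int(np.dot(a, w))
```

Because one wide XNOR-popcount lane replaces many float multipliers, a PE can process many binary products per cycle, which is consistent with the large throughput gap between the two modes reported above.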
Related papers
- An Efficient General-Purpose Optical Accelerator for Neural Networks [4.236129222287313]
General-purpose optical accelerators (GOAs) have emerged as a promising platform to accelerate deep neural networks (DNNs)
In this work, a hybrid GOA architecture is proposed to enhance the mapping efficiency of neural networks onto the GOA.
The energy consumption and computation latency can also be reduced by over 67% and 21%, respectively.
arXiv Detail & Related papers (2024-09-02T13:04:08Z)
- EncodingNet: A Novel Encoding-based MAC Design for Efficient Neural Network Acceleration [7.694043781601237]
We propose a novel digital multiply-accumulate (MAC) design based on encoding.
In this new design, the multipliers are replaced by simple logic gates to represent the results with a wide bit representation.
Since the multiplication function is replaced by a simple logic representation, the critical paths in the resulting circuits become much shorter.
arXiv Detail & Related papers (2024-02-25T09:35:30Z)
- BiFSMNv2: Pushing Binary Neural Networks for Keyword Spotting to Real-Network Performance [54.214426436283134]
Deep neural networks, such as the Deep-FSMN, have been widely studied for keyword spotting (KWS) applications.
We present a strong yet efficient binary neural network for KWS, namely BiFSMNv2, pushing it toward real-network accuracy.
We highlight that benefiting from the compact architecture and optimized hardware kernel, BiFSMNv2 can achieve an impressive 25.1x speedup and 20.2x storage-saving on edge hardware.
arXiv Detail & Related papers (2022-11-13T18:31:45Z)
- BiFSMN: Binary Neural Network for Keyword Spotting [47.46397208920726]
BiFSMN is an accurate and extremely efficient binary neural network for KWS.
We show that BiFSMN can achieve an impressive 22.3x speedup and 15.5x storage-saving on real-world edge hardware.
arXiv Detail & Related papers (2022-02-14T05:16:53Z)
- Instant Neural Graphics Primitives with a Multiresolution Hash Encoding [67.33850633281803]
We present a versatile new input encoding that permits the use of a smaller network without sacrificing quality.
A small neural network is augmented by a multiresolution hash table of trainable feature vectors whose values are optimized through gradient descent.
We achieve a combined speedup of several orders of magnitude, enabling training of high-quality neural graphics primitives in a matter of seconds.
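To make the encoding concrete, here is a minimal sketch of a multiresolution hash lookup: each level maps the input coordinate to a grid cell, hashes the cell into a small trainable table, and the per-level features are concatenated. It uses the widely cited prime-XOR spatial hash, looks up only the nearest cell (omitting the interpolation the full method uses), and all function names and default sizes are illustrative assumptions.

```python
import numpy as np

# Per-dimension primes for the spatial hash (the first is conventionally 1).
PRIMES = (1, 2654435761, 805459861)

def hash_index(cell, table_size):
    """XOR the coordinate-prime products, then reduce modulo the table size."""
    h = 0
    for c, p in zip(cell, PRIMES):
        h ^= c * p
    return h % table_size

def encode(x, levels=4, base_res=2, growth=2.0, table_size=1024, feat_dim=2, seed=0):
    """Concatenate features looked up at several grid resolutions.
    The tables would be trained by gradient descent; here they are just
    randomly initialized to show the data flow."""
    rng = np.random.default_rng(seed)
    tables = rng.normal(scale=1e-4, size=(levels, table_size, feat_dim))
    feats = []
    for lvl in range(levels):
        res = int(base_res * growth ** lvl)               # finer grid per level
        cell = tuple(int(xi * res) for xi in x)           # grid cell containing x
        feats.append(tables[lvl, hash_index(cell, table_size)])
    return np.concatenate(feats)                          # shape: (levels * feat_dim,)
```

The small MLP then consumes this concatenated vector, so most of the model capacity lives in the hash tables rather than in network weights.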
arXiv Detail & Related papers (2022-01-16T07:22:47Z)
- Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks [72.81092567651395]
Sub-bit Neural Networks (SNNs) are a new type of binary quantization design tailored to compress and accelerate BNNs.
SNNs are trained with a kernel-aware optimization framework, which exploits binary quantization in the fine-grained convolutional kernel space.
Experiments on visual recognition benchmarks and hardware deployment on FPGA validate the great potential of SNNs.
arXiv Detail & Related papers (2021-10-18T11:30:29Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
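One way such a decomposition can work, sketched below for the 2-bit case: weights quantized to {-3, -1, 1, 3} split exactly into two {-1, +1} branches combined with fixed scales, so each branch can run as a binary network. This is an illustrative instance of the idea, not necessarily the paper's exact scheme; the function name and level set are assumptions.

```python
import numpy as np

def decompose(w_q):
    """Split weights quantized to {-3, -1, 1, 3} into two {-1, +1} tensors
    such that w_q = b1 + 2 * b2."""
    b2 = np.where(w_q > 0, 1, -1)   # sign carries the high-order branch
    b1 = w_q - 2 * b2               # the remainder is also in {-1, +1}
    assert set(np.unique(b1)) <= {-1, 1}
    return b1, b2

w_q = np.array([-3, -1, 1, 3])
b1, b2 = decompose(w_q)
assert np.array_equal(b1 + 2 * b2, w_q)
```

A quantized matrix product then becomes a scaled sum of binary matrix products, each of which can use XNOR-popcount kernels.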
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks [58.48833325238537]
Full-batch training on Graph Neural Networks (GNN) to learn the structure of large graphs is a critical problem that needs to scale to hundreds of compute nodes to be feasible.
In this paper, we present DistGNN, which optimizes the well-known Deep Graph Library (DGL) for full-batch training on CPU clusters.
Our results on four common GNN benchmark datasets show up to 3.7x speed-up using a single CPU socket and up to 97x speed-up using 128 CPU sockets.
arXiv Detail & Related papers (2021-04-14T08:46:35Z)
- A complete, parallel and autonomous photonic neural network in a semiconductor multimode laser [0.0]
We show how a fully parallel and fully implemented photonic neural network can be realized using spatially distributed modes of an efficient and fast semiconductor laser.
Importantly, all neural network connections are realized in hardware, and our processor produces results without pre- or post-processing.
We train the readout weights to perform 2-bit header recognition, a 2-bit XOR, and 2-bit digital-to-analog conversion, obtaining error rates of 0.9×10⁻³ and 2.9×10⁻² for digit recognition and XOR, respectively.
arXiv Detail & Related papers (2020-12-21T07:03:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.