Exposing Hardware Building Blocks to Machine Learning Frameworks
- URL: http://arxiv.org/abs/2004.05898v1
- Date: Fri, 10 Apr 2020 14:26:00 GMT
- Title: Exposing Hardware Building Blocks to Machine Learning Frameworks
- Authors: Yash Akhauri
- Abstract summary: We focus on how to design topologies that complement such a view of neurons as unique functions.
We develop a library that supports training a neural network with custom sparsity and quantization.
- Score: 4.56877715768796
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There are a plethora of applications that demand high-throughput,
low-latency algorithms leveraging machine learning methods. This need for real-time
processing can be seen in industries ranging from developing neural-network-based
pre-distorters for enhanced mobile broadband to designing FPGA-based triggers in
major scientific efforts by CERN for particle physics. In this thesis, we explore
how niche domains can benefit vastly if we look at each neuron as a unique boolean
function of the form $f: B^{I} \rightarrow B^{O}$, where $B = \{0,1\}$. We focus on
how to design topologies that complement such a view of neurons, how to automate
such a strategy of neural network design, and how to perform inference of such
networks on Xilinx FPGAs. Major hardware-borne constraints arise when designing
topologies that view neurons as unique boolean functions. Fundamentally, realizing
such topologies on hardware imposes a strict limit on the 'fan-in' bits of a
neuron, because the number of possible input combinations doubles with every
additional input bit. We address this limit by exploring different methods of
implementing sparsity and by exploring activation quantization. Further, we develop
a library that supports training a neural network with custom sparsity and
quantization. This library also supports conversion of trained sparse, quantized
networks from PyTorch to Verilog code, which is then synthesized using Vivado; all
of this is part of the LogicNet tool-flow. To aid faster prototyping, we also
support calculation of the worst-case hardware cost of any given topology. We hope
that our insights into the behavior of extremely sparse, quantized neural networks
are of use to the research community and, by extension, allow people to use the
LogicNet design flow to deploy highly efficient neural networks.
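To make the fan-in constraint concrete, the minimal Python sketch below shows how the truth table of a single neuron grows with fan-in and activation bit width, together with a crude upper bound on the number of 6-input LUTs needed to realize an arbitrary boolean function of that many bits. This is only an illustration of the scaling argument; the function names, the 2-bit example widths, and the Shannon-decomposition bound are assumptions made for this sketch and are not the cost model used by the LogicNet tool-flow.

```python
def truth_table_entries(fan_in: int, act_bits: int) -> int:
    """Rows in the truth table of one neuron.

    A neuron with `fan_in` inputs, each quantized to `act_bits` bits, is a
    boolean function of fan_in * act_bits input bits, so its truth table has
    2 ** (fan_in * act_bits) rows -- it doubles with every extra input bit.
    """
    return 2 ** (fan_in * act_bits)


def lut6_upper_bound(input_bits: int, output_bits: int) -> int:
    """Crude upper bound on 6-input LUTs for an arbitrary boolean function.

    Uses a naive Shannon decomposition: a function of n > 6 bits splits into
    two functions of n - 1 bits plus one LUT acting as a 2:1 multiplexer.
    Real synthesis (e.g. Vivado) usually does much better; this only shows
    the exponential trend.
    """
    def one_output(n: int) -> int:
        return 1 if n <= 6 else 2 * one_output(n - 1) + 1

    return output_bits * one_output(input_bits)


if __name__ == "__main__":
    act_bits, out_bits = 2, 2  # hypothetical 2-bit activations and outputs
    for fan_in in range(2, 8):
        n_bits = fan_in * act_bits
        print(f"fan-in {fan_in}: {n_bits} input bits, "
              f"{truth_table_entries(fan_in, act_bits):>6} truth-table rows, "
              f"<= {lut6_upper_bound(n_bits, out_bits)} LUT6s")
```

Even at 2-bit activations the truth table blows up quickly with fan-in, which is why extreme sparsity (keeping fan-in small) is the central lever in this design space.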
Related papers
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Variable Bitrate Neural Fields [75.24672452527795]
We present a dictionary method for compressing feature grids, reducing their memory consumption by up to 100x.
We formulate the dictionary optimization as a vector-quantized auto-decoder problem which lets us learn end-to-end discrete neural representations in a space where no direct supervision is available.
arXiv Detail & Related papers (2022-06-15T17:58:34Z)
- Predictive Coding: Towards a Future of Deep Learning beyond Backpropagation? [41.58529335439799]
The backpropagation of error algorithm used to train deep neural networks has been fundamental to the successes of deep learning.
Recent work has developed the idea into a general-purpose algorithm able to train neural networks using only local computations.
We show that predictive coding networks are substantially more flexible than equivalent deep neural networks.
arXiv Detail & Related papers (2022-02-18T22:57:03Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using in total around 40% of the available hardware resources.
It reduces the classification time by three orders of magnitude, with a small 4.5% impact on accuracy, compared to its software, full-precision counterpart.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- E3NE: An End-to-End Framework for Accelerating Spiking Neural Networks with Emerging Neural Encoding on FPGAs [6.047137174639418]
The end-to-end framework E3NE automates the generation of efficient SNN inference logic for FPGAs.
E3NE uses less than 50% of hardware resources and 20% less power, while reducing the latency by an order of magnitude.
arXiv Detail & Related papers (2021-11-19T04:01:19Z)
- A quantum algorithm for training wide and deep classical neural networks [72.2614468437919]
We show that conditions amenable to classical trainability via gradient descent coincide with those necessary for efficiently solving quantum linear systems.
We numerically demonstrate that the MNIST image dataset satisfies such conditions.
We provide empirical evidence for $O(\log n)$ training of a convolutional neural network with pooling.
arXiv Detail & Related papers (2021-07-19T23:41:03Z)
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks; a rough sketch of this decomposition idea appears after this list.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
- Binary Graph Neural Networks [69.51765073772226]
Graph Neural Networks (GNNs) have emerged as a powerful and flexible framework for representation learning on irregular data.
In this paper, we present and evaluate different strategies for the binarization of graph neural networks.
We show that through careful design of the models, and control of the training process, binary graph neural networks can be trained at only a moderate cost in accuracy on challenging benchmarks.
arXiv Detail & Related papers (2020-12-31T18:48:58Z)
- When Machine Learning Meets Quantum Computers: A Case Study [29.551615987978046]
This paper carries out a case study to demonstrate an end-to-end implementation of neural network acceleration on quantum processors.
We employ the multilayer perceptron to complete image classification tasks using the standard and widely used MNIST dataset.
This work targets the acceleration of the inference phase of a trained neural network on the quantum processor.
arXiv Detail & Related papers (2020-12-18T17:06:11Z)
- LogicNets: Co-Designed Neural Networks and Circuits for Extreme-Throughput Applications [6.9276012494882835]
We present a novel method for designing neural network topologies that directly map to a highly efficient FPGA implementation.
We show that the combination of sparsity and low-bit activation quantization results in high-speed circuits with small logic depth and low LUT cost.
arXiv Detail & Related papers (2020-04-06T22:15:41Z)
- Lossless Compression of Deep Neural Networks [17.753357839478575]
Deep neural networks have been successful in many predictive modeling tasks, such as image and language recognition.
It is challenging to deploy these networks under limited computational resources, such as in mobile devices.
We introduce an algorithm that removes units and layers of a neural network while not changing the output that is produced.
arXiv Detail & Related papers (2020-01-01T15:04:43Z)
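The sketch referenced in the {-1, +1} encoding entry above is given here: an integer-quantized weight matrix is split into bit-planes whose entries all lie in {-1, +1}, so one matrix product becomes a weighted sum of binary products. This is a generic bit-plane decomposition written only for illustration; the paper's exact encoding may differ, and the odd-valued weight grid and variable names below are assumptions made for this example.

```python
import numpy as np


def decompose_pm1(w_int: np.ndarray, k: int):
    """Decompose integer weights into k bit-planes with entries in {-1, +1}.

    Assumes weights lie on the odd grid {-(2^k - 1), ..., -1, +1, ..., 2^k - 1},
    so that w = sum_i 2^i * b_i with every b_i in {-1, +1}.
    """
    # Shift to an unsigned value u in [0, 2^k - 1]; then w = 2*u - (2^k - 1).
    u = (w_int + (2 ** k - 1)) // 2
    planes = []
    for i in range(k):
        bit = (u >> i) & 1           # i-th bit of the shifted value
        planes.append(2 * bit - 1)   # map {0, 1} -> {-1, +1}
    return planes                    # w_int == sum(2**i * planes[i])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k = 3                                        # hypothetical 3-bit weights
    grid = np.arange(-(2 ** k - 1), 2 ** k, 2)   # odd values -7, -5, ..., 7
    w = rng.choice(grid, size=(4, 5))
    x = rng.standard_normal((2, 4))

    planes = decompose_pm1(w, k)
    # Multi-branch form: one binary matmul per bit-plane, then a weighted sum.
    y_branches = sum((2 ** i) * (x @ p) for i, p in enumerate(planes))
    assert np.allclose(x @ w, y_branches)
    print("max abs error:", np.max(np.abs(x @ w - y_branches)))
```

Each branch involves only {-1, +1} values, so it could in principle be executed with XNOR/popcount-style kernels, with the 2^i scaling and the final accumulation kept at higher precision.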