TNN7: A Custom Macro Suite for Implementing Highly Optimized Designs of
Neuromorphic TNNs
- URL: http://arxiv.org/abs/2205.07410v1
- Date: Mon, 16 May 2022 01:03:41 GMT
- Title: TNN7: A Custom Macro Suite for Implementing Highly Optimized Designs of
Neuromorphic TNNs
- Authors: Harideep Nair, Prabhu Vellaisamy, Santha Bhasuthkar, and John Paul
Shen
- Abstract summary: Temporal Neural Networks (TNNs) exhibit energy-efficient online sensory processing capabilities.
This work proposes TNN7, a suite of nine highly optimized custom macros developed using a predictive 7nm Process Design Kit (PDK).
An unsupervised time-series clustering TNN delivering competitive performance can be implemented within 40 uW power and 0.05 mm^2 area.
A 4-layer TNN that achieves an MNIST error rate of 1% consumes only 18 mW and 24.63 mm^2.
- Score: 2.9068923524970227
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporal Neural Networks (TNNs), inspired by the mammalian neocortex,
exhibit energy-efficient online sensory processing capabilities. Recent works
have proposed a microarchitecture design framework for implementing TNNs and
demonstrated competitive performance on vision and time-series applications.
Building on them, this work proposes TNN7, a suite of nine highly optimized
custom macros developed using a predictive 7nm Process Design Kit (PDK), to
enhance the efficiency, modularity and flexibility of the TNN design framework.
TNN prototypes for two applications are used for evaluation of TNN7. An
unsupervised time-series clustering TNN delivering competitive performance can
be implemented within 40 uW power and 0.05 mm^2 area, while a 4-layer TNN that
achieves an MNIST error rate of 1% consumes only 18 mW and 24.63 mm^2. On
average, the proposed macros reduce power, delay, area, and energy-delay
product by 14%, 16%, 28%, and 45%, respectively. Furthermore, employing TNN7
significantly reduces the synthesis runtime of TNN designs (by more than 3x),
allowing for highly-scaled TNN implementations to be realized.
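As a side note on the metrics above, energy-delay product (EDP) is energy times delay, i.e. power times delay squared. The minimal sketch below (illustrative numbers only, not measurements from the paper) shows how per-macro EDP is computed, and why the 14% power and 16% delay averages do not compose exactly into the 45% EDP average, which is taken over nine different macros.

```python
# Minimal sketch of the energy-delay product (EDP) metric referenced above.
# EDP = energy * delay = power * delay^2. The numbers below are illustrative
# placeholders, not measurements from the paper.

def edp(power_w: float, delay_s: float) -> float:
    """Energy-delay product in joule-seconds."""
    energy_j = power_w * delay_s          # energy = power * delay
    return energy_j * delay_s             # EDP = energy * delay

# Hypothetical baseline macro vs. its optimized counterpart.
baseline = edp(power_w=1.0e-3, delay_s=2.0e-9)
# Apply the average reductions reported in the abstract (14% power, 16% delay).
optimized = edp(power_w=1.0e-3 * (1 - 0.14), delay_s=2.0e-9 * (1 - 0.16))

print(f"EDP reduction: {1 - optimized / baseline:.1%}")
# 0.86 * 0.84^2 ~= 0.61, i.e. ~39% for this single hypothetical macro;
# the 45% in the abstract is an average over nine different macros.
```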
Related papers
- Scalable Mechanistic Neural Networks [52.28945097811129]
We propose an enhanced neural network framework designed for scientific machine learning applications involving long temporal sequences.
By reformulating the original Mechanistic Neural Network (MNN), we reduce the computational time and space complexities from cubic and quadratic in the sequence length, respectively, to linear.
Extensive experiments demonstrate that S-MNN matches the original MNN in precision while substantially reducing computational resources.
arXiv Detail & Related papers (2024-10-08T14:27:28Z)
- MT-SNN: Enhance Spiking Neural Network with Multiple Thresholds [2.3531574267580035]
Spiking neural networks (SNNs) present a promising energy-efficient alternative to traditional Artificial Neural Networks (ANNs).
We introduce Multiple Threshold (MT) approaches to significantly enhance SNN accuracy by mitigating precision loss.
Our experiments on CIFAR10, CIFAR100, ImageNet, and DVS-CIFAR10 datasets demonstrate that both MT modes substantially improve the performance of single-threshold SNNs.
arXiv Detail & Related papers (2023-03-20T14:04:50Z)
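As a loose illustration of the multiple-threshold idea summarized in the MT-SNN entry above (a simplified reading, not the exact MT modes defined in that paper), a neuron can emit a graded spike whose magnitude depends on how many thresholds its membrane potential crosses, retaining more precision than a single binary threshold:

```python
# Hypothetical integrate-and-fire update with multiple thresholds: the emitted
# spike magnitude equals the number of thresholds crossed, so less information
# is lost than with a single binary spike/no-spike decision.
def mt_if_step(v, input_current, thresholds=(1.0, 2.0, 3.0)):
    v = v + input_current                     # integrate
    spike = sum(v >= t for t in thresholds)   # graded spike in {0, ..., len(thresholds)}
    v = v - spike * thresholds[0]             # soft reset by the base threshold
    return v, spike

v = 0.0
for current in [0.6, 1.7, 0.4, 2.5]:
    v, s = mt_if_step(v, current)
    print(f"input={current:.1f}  spike={s}  membrane={v:.2f}")
```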
- SNN2ANN: A Fast and Memory-Efficient Training Framework for Spiking Neural Networks [117.56823277328803]
Spiking neural networks are efficient computation models for low-power environments.
We propose an SNN-to-ANN (SNN2ANN) framework to train the SNN in a fast and memory-efficient way.
Experiment results show that our SNN2ANN-based models perform well on the benchmark datasets.
arXiv Detail & Related papers (2022-06-19T16:52:56Z)
- Exploiting Low-Rank Tensor-Train Deep Neural Networks Based on Riemannian Gradient Descent With Illustrations of Speech Processing [74.31472195046099]
We exploit a low-rank tensor-train deep neural network (TT-DNN) to build an end-to-end deep learning pipeline, namely LR-TT-DNN.
A hybrid model combining LR-TT-DNN with a convolutional neural network (CNN) is set up to boost the performance.
Our empirical evidence demonstrates that the LR-TT-DNN and CNN+(LR-TT-DNN) models with fewer model parameters can outperform the TT-DNN and CNN+(TT-DNN) counterparts.
arXiv Detail & Related papers (2022-03-11T15:55:34Z)
- Weightless Neural Networks for Efficient Edge Inference [1.7882696915798877]
Weightless Neural Networks (WNNs) are a class of machine learning models that use table lookups to perform inference.
We propose a novel WNN architecture, BTHOWeN, with key algorithmic and architectural improvements over prior work.
BTHOWeN targets the large and growing edge computing sector by providing superior latency and energy efficiency.
arXiv Detail & Related papers (2022-03-03T01:46:05Z)
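A weightless neuron is essentially a set of lookup tables addressed by tuples of input bits. The generic WiSARD-style sketch below conveys the table-lookup flavor of training and inference; it is not the BTHOWeN architecture itself, which replaces plain tables with Bloom filters and adds further optimizations.

```python
# Generic WiSARD-style weightless discriminator (illustrative only).

class Discriminator:
    def __init__(self, n_inputs: int, tuple_size: int):
        assert n_inputs % tuple_size == 0
        self.tuple_size = tuple_size
        # One lookup table (here, a set of seen addresses) per input tuple.
        self.tables = [set() for _ in range(n_inputs // tuple_size)]

    def _addresses(self, bits):
        for i in range(len(self.tables)):
            chunk = bits[i * self.tuple_size:(i + 1) * self.tuple_size]
            yield i, int("".join(map(str, chunk)), 2)

    def train(self, bits):
        for i, addr in self._addresses(bits):
            self.tables[i].add(addr)            # mark this address as seen

    def score(self, bits):
        return sum(addr in self.tables[i] for i, addr in self._addresses(bits))

# One discriminator per class; at inference the class with the highest score wins.
d = Discriminator(n_inputs=8, tuple_size=2)
d.train([1, 0, 1, 1, 0, 0, 1, 0])
print(d.score([1, 0, 1, 1, 0, 0, 1, 0]))   # 4: every tuple matches a trained address
print(d.score([0, 1, 0, 0, 1, 1, 0, 1]))   # 0: no tuple was seen during training
```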
- Can Deep Neural Networks be Converted to Ultra Low-Latency Spiking Neural Networks? [3.2108350580418166]
Spiking neural networks (SNNs) operate via binary spikes distributed over time.
SOTA training strategies for SNNs involve conversion from a non-spiking deep neural network (DNN).
We propose a new training algorithm that accurately captures these distributions, minimizing the error between the DNN and converted SNN.
arXiv Detail & Related papers (2021-12-22T18:47:45Z)
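The conversion approach summarized above can be pictured, under a simplified rate-coding assumption, as reusing trained DNN weights while replacing each ReLU with an integrate-and-fire neuron whose firing rate over a time window approximates the original activation. The sketch below shows only that correspondence; the paper's actual algorithm matches activation distributions more carefully to reduce latency.

```python
# Simplified rate-coding view of DNN-to-SNN conversion: an integrate-and-fire
# neuron driven by a constant input x fires at a rate that approximates ReLU(x).
def if_spike_count(x, timesteps, threshold=1.0):
    v, spikes = 0.0, 0
    for _ in range(timesteps):
        v += x                    # constant input current each timestep
        if v >= threshold:
            v -= threshold        # reset by subtraction
            spikes += 1
    return spikes

T = 100
for x in [-0.3, 0.0, 0.25, 0.8]:
    rate = if_spike_count(x, T) / T
    print(f"x={x:+.2f}  relu(x)={max(x, 0.0):.2f}  spike rate over {T} steps={rate:.2f}")
# With enough timesteps the rate approaches ReLU(x) clipped at 1, which is why
# pushing conversion down to very few timesteps (ultra low latency) is the hard part.
```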
- Dynamically Throttleable Neural Networks (TNN) [24.052859278938858]
Conditional computation for Deep Neural Networks (DNNs) reduces overall computational load and improves model accuracy by running only a subset of the network.
We present a runtime throttleable neural network (TNN) that can adaptively self-regulate its own performance target and computing resources.
arXiv Detail & Related papers (2020-11-01T20:17:42Z)
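A minimal, hypothetical picture of runtime throttling is width gating on a single layer: a utilization knob selects how many output units are actually computed. This is only a sketch of the general idea, not the paper's controller or training scheme.

```python
import numpy as np

# Hypothetical width-throttled dense layer: a runtime knob in (0, 1] selects
# what fraction of output units is computed; the rest are skipped entirely.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))     # 64 inputs -> 128 output units
x = rng.standard_normal(64)

def throttled_layer(x, W, utilization):
    k = max(1, int(W.shape[1] * utilization))   # number of active output units
    y = np.zeros(W.shape[1])
    y[:k] = x @ W[:, :k]                        # compute only the first k units
    return y, k

for u in (0.25, 0.5, 1.0):
    y, k = throttled_layer(x, W, u)
    print(f"utilization={u:.2f}  active units={k}  MACs={W.shape[0] * k}")
```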
- Direct CMOS Implementation of Neuromorphic Temporal Neural Networks for Sensory Processing [4.084672048082021]
Temporal Neural Networks (TNNs) use time as a resource to represent and process information, mimicking the behavior of the mammalian neocortex.
This work focuses on implementing TNNs using off-the-shelf digital CMOS technology.
arXiv Detail & Related papers (2020-08-27T20:36:34Z)
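For context on the temporal coding used by the entry above and by this page's main paper: TNNs represent values as spike times (earlier means stronger) and compute with operations such as winner-take-all over arrival times. The schematic sketch below illustrates only that idea; the ramp-no-leak neuron bodies and STDP learning of the referenced framework are omitted.

```python
import math

# Schematic temporal-coding sketch: inputs are spike TIMES (smaller = earlier
# = stronger). Each neuron integrates weights in arrival-time order and fires
# when its running sum crosses a threshold; a 1-winner-take-all keeps only the
# earliest output spike.

def fire_time(spike_times, weights, threshold):
    events = sorted(zip(spike_times, weights))          # process in time order
    total = 0.0
    for t, w in events:
        total += w
        if total >= threshold:
            return t                                    # fires at this arrival time
    return math.inf                                     # never fires

def wta_column(spike_times, weight_rows, threshold):
    times = [fire_time(spike_times, w, threshold) for w in weight_rows]
    winner = min(range(len(times)), key=lambda i: times[i])
    return winner, times

spike_times = [1, 3, 7, 2]                # earlier spike = stronger evidence
weights = [[1, 0, 0, 1],                  # neuron 0 prefers inputs 0 and 3
           [0, 1, 1, 0]]                  # neuron 1 prefers inputs 1 and 2
print(wta_column(spike_times, weights, threshold=2))    # neuron 0 wins, firing at t=2
```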
- FATNN: Fast and Accurate Ternary Neural Networks [89.07796377047619]
Ternary Neural Networks (TNNs) have received much attention due to being potentially orders of magnitude faster in inference, as well as more power efficient, than full-precision counterparts.
In this work, we show that, under some mild constraints, the computational complexity of the ternary inner product can be reduced by a factor of 2.
We elaborately design an implementation-dependent ternary quantization algorithm to mitigate the performance gap.
arXiv Detail & Related papers (2020-08-12T04:26:18Z)
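A common way to obtain a ternary network, shown below as generic threshold-based ternarization (not FATNN's implementation-dependent quantizer), is to map each weight to {-1, 0, +1} with a per-tensor scale, so inner products reduce to additions and subtractions plus one final multiply.

```python
import numpy as np

# Generic threshold-based ternarization (illustrative only).
def ternarize(w, delta_ratio=0.7):
    delta = delta_ratio * np.abs(w).mean()          # threshold for zeroing small weights
    t = np.sign(w) * (np.abs(w) > delta)            # values in {-1, 0, +1}
    nonzero = np.abs(w)[t != 0]
    alpha = nonzero.mean() if nonzero.size else 0.0 # per-tensor scale
    return alpha, t

rng = np.random.default_rng(0)
w = rng.standard_normal(8)
alpha, t = ternarize(w)
x = rng.standard_normal(8)
print("full precision:", float(w @ x))
print("ternary approx:", float(alpha * (t @ x)))    # only adds/subtracts plus one scale
```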
- SmartExchange: Trading Higher-cost Memory Storage/Access for Lower-cost Computation [97.78417228445883]
We present SmartExchange, an algorithm-hardware co-design framework for energy-efficient inference of deep neural networks (DNNs).
We develop a novel algorithm to enforce a specially favorable DNN weight structure, where each layerwise weight matrix can be stored as the product of a small basis matrix and a large sparse coefficient matrix whose non-zero elements are all power-of-2.
We further design a dedicated accelerator to fully utilize the SmartExchange-enforced weights to improve both energy efficiency and latency performance.
arXiv Detail & Related papers (2020-05-07T12:12:49Z)
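The weight structure described above can be pictured with a toy decomposition: a layer weight W is rebuilt on the fly as a small dense basis matrix times a sparse coefficient matrix whose non-zero entries are powers of two, so reconstruction needs only shifts and adds. The sketch below shows only the structure and the rounding of coefficients to powers of two; the paper learns and enforces this structure during training.

```python
import numpy as np

# Toy illustration of a SmartExchange-style weight structure: W ~= B @ C with a
# small dense basis B and a sparse C whose non-zeros are powers of two, so
# rebuilding W from storage needs only shifts (power-of-2 multiplies) and adds.
rng = np.random.default_rng(0)

B = rng.standard_normal((64, 4))                  # small basis matrix (stored dense)
C = rng.standard_normal((4, 64))
C[np.abs(C) < 1.0] = 0.0                          # make the coefficient matrix sparse

def round_to_power_of_two(c):
    out = np.zeros_like(c)
    nz = c != 0
    out[nz] = np.sign(c[nz]) * 2.0 ** np.round(np.log2(np.abs(c[nz])))
    return out

C_q = round_to_power_of_two(C)                    # non-zeros restricted to +/- 2^k
W_approx = B @ C_q                                # cheap on-the-fly reconstruction

dense_params = 64 * 64
stored_params = B.size + int(np.count_nonzero(C_q))
print(f"dense weights: {dense_params}, stored (basis + sparse coeffs): {stored_params}")
```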
- Tensor-to-Vector Regression for Multi-channel Speech Enhancement based on Tensor-Train Network [53.47564132861866]
We propose a tensor-to-vector regression approach to multi-channel speech enhancement.
The key idea is to cast the conventional deep neural network (DNN) based vector-to-vector regression formulation under a tensor-train network (TTN) framework.
In 8-channel conditions, a PESQ of 3.12 is achieved using 20 million parameters for TTN, whereas a DNN with 68 million parameters can only attain a PESQ of 3.06.
arXiv Detail & Related papers (2020-02-03T02:58:00Z)
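The parameter saving quoted above (20 million versus 68 million) comes from the tensor-train format itself: a dense weight of shape (prod n_k) x (prod m_k) is replaced by small 4-way cores of shape r_{k-1} x n_k x m_k x r_k. A minimal count of the two storage costs, with hypothetical mode sizes and TT ranks:

```python
# Parameter count of a tensor-train (TT) layer versus the dense layer it
# replaces. The factorization and TT ranks below are hypothetical, chosen
# only to show why TT layers need far fewer parameters.
import math

in_modes  = [4, 8, 8, 4]        # input dimension  = 4*8*8*4 = 1024
out_modes = [4, 8, 8, 4]        # output dimension = 1024
ranks     = [1, 8, 8, 8, 1]     # TT ranks r_0..r_d (boundary ranks are 1)

dense_params = math.prod(in_modes) * math.prod(out_modes)
tt_params = sum(ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1]
                for k in range(len(in_modes)))

print(f"dense layer : {dense_params:,} parameters")
print(f"TT layer    : {tt_params:,} parameters")
print(f"compression : {dense_params / tt_params:,.1f}x")
```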