FORMS: Fine-grained Polarized ReRAM-based In-situ Computation for
Mixed-signal DNN Accelerator
- URL: http://arxiv.org/abs/2106.09144v1
- Date: Wed, 16 Jun 2021 21:42:08 GMT
- Title: FORMS: Fine-grained Polarized ReRAM-based In-situ Computation for
Mixed-signal DNN Accelerator
- Authors: Geng Yuan, Payman Behnam, Zhengang Li, Ali Shafiee, Sheng Lin,
Xiaolong Ma, Hang Liu, Xuehai Qian, Mahdi Nazm Bojnordi, Yanzhi Wang, Caiwen
Ding
- Abstract summary: FORMS is a fine-grained ReRAM-based DNN accelerator with polarized weights.
It achieves a significant throughput improvement and speed-up in frames per second over ISAAC at a similar area cost.
- Score: 33.19099033687952
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent works demonstrated the promise of using resistive random access memory
(ReRAM) as an emerging technology to perform inherently parallel analog domain
in-situ matrix-vector multiplication -- the intensive and key computation in
DNNs. With weights stored in the ReRAM crossbar cells as conductance, when the
input vector is applied to word lines, the matrix-vector multiplication results
can be generated as the current in bit lines. A key problem is that weights
can be either positive or negative, while the in-situ computation assumes that
all cells in a given crossbar column share the same sign. Current architectures
either use two ReRAM crossbars for positive and negative weights, or add an
offset to the weights so that all values become positive. Neither solution is
ideal: they either double the cost of crossbars or incur extra offset
circuitry. To better solve this problem, this paper proposes FORMS, a
fine-grained ReRAM-based DNN accelerator with polarized weights. Instead of
trying to represent the positive/negative weights, our key design principle is
to enforce exactly what is assumed in the in-situ computation -- ensuring that
all weights in the same column of a crossbar have the same sign. This naturally
avoids the cost of an additional crossbar. Such weights can be generated
using alternating direction method of multipliers (ADMM)-regularized
optimization, which can exactly enforce certain patterns in DNN weights. To
achieve high accuracy, we propose to use fine-grained sub-array columns, which
provide a unique opportunity for input zero-skipping, significantly avoiding
unnecessary computations. This also makes the hardware much easier to implement.
Putting it all together, with the same optimized models, FORMS achieves a
significant throughput improvement and speed-up in frames per second over ISAAC
at a similar area cost.
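The page carries no code; the following is a minimal NumPy sketch of the two ideas in the abstract, under stated assumptions: a column-wise "polarization" projection that forces every weight column to a single sign (a stand-in for the ADMM-regularized constraint, using a majority-by-magnitude rule that is an illustrative assumption, not the paper's exact algorithm), and a column-wise matrix-vector product that skips zero inputs. The names polarize_columns and crossbar_mvm_zero_skip are hypothetical.

```python
import numpy as np

def polarize_columns(W):
    """Project each column of a weight sub-array onto a single sign.

    Illustrative stand-in for the polarization constraint in FORMS: the
    dominant sign of each column is kept and weights of the opposite sign
    are zeroed out (the projection used in the paper may differ).
    """
    W = W.copy()
    for j in range(W.shape[1]):
        col = W[:, j]
        # Majority sign by total magnitude (assumption, for illustration only).
        dominant = 1.0 if col[col > 0].sum() >= -col[col < 0].sum() else -1.0
        col[np.sign(col) != dominant] = 0.0
        W[:, j] = col
    return W

def crossbar_mvm_zero_skip(W_polarized, x):
    """Column-wise matrix-vector multiply that skips zero inputs.

    Mimics (in software) the in-situ computation: each column accumulates
    contributions only from non-zero word-line inputs.
    """
    active = np.flatnonzero(x)            # input zero-skipping
    return x[active] @ W_polarized[active, :]

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))                      # a small weight sub-array
x = rng.normal(size=8) * (rng.random(8) > 0.5)   # sparse input (e.g., post-ReLU)

Wp = polarize_columns(W)
print(np.sign(Wp))                               # every column is single-signed (or zero)
print(np.allclose(crossbar_mvm_zero_skip(Wp, x), x @ Wp))  # zero-skipping is exact
```

In hardware, the single sign of each polarized column would be applied once at the column output (e.g., in the accumulation/ADC stage) rather than per cell; the sketch only demonstrates the constraint that the optimization enforces and that zero-skipping does not change the result.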
Related papers
- BasisN: Reprogramming-Free RRAM-Based In-Memory-Computing by Basis Combination for Deep Neural Networks [9.170451418330696]
We propose BasisN framework to accelerate deep neural networks (DNNs) on any number of crossbars without reprogramming.
We show that the cycles per inference and the energy-delay product are reduced to below 1% of those obtained when reprogramming is applied to the crossbars.
arXiv Detail & Related papers (2024-07-04T08:47:05Z)
- Quantum encoder for fixed Hamming-weight subspaces [0.0]
We present an exact $n$-qubit computational-basis amplitude encoder of real- or complex-valued data vectors of dimension $d=\binom{n}{k}$, provided in analytical form.
We also perform an experimental proof-of-principle demonstration of our scheme on a commercial trapped-ion quantum computer.
arXiv Detail & Related papers (2024-05-30T18:26:41Z)
- Optimal Input Gain: All You Need to Supercharge a Feed-Forward Neural Network [0.6562256987706128]
It is shown that pre-processing the inputs with a linear transformation is equivalent to multiplying the negative gradient matrix by an autocorrelation matrix in each training iteration.
It is shown that OIG-improved HWO could be a significant building block for more complex deep learning architectures.
arXiv Detail & Related papers (2023-03-30T22:20:16Z)
- Bounding the Width of Neural Networks via Coupled Initialization -- A Worst Case Analysis [121.9821494461427]
We show how to significantly reduce the number of neurons required for two-layer ReLU networks.
We also prove new lower bounds that improve upon prior work, and that under certain assumptions, are best possible.
arXiv Detail & Related papers (2022-06-26T06:51:31Z)
- Fast Differentiable Matrix Square Root and Inverse Square Root [65.67315418971688]
We propose two more efficient variants to compute the differentiable matrix square root and the inverse square root.
For the forward propagation, one method uses a Matrix Taylor Polynomial (MTP) and the other uses Matrix Padé Approximants (MPA); a minimal MTP-style sketch is given after this entry.
A series of numerical tests shows that both methods yield considerable speed-up compared with the SVD or the NS iteration.
arXiv Detail & Related papers (2022-01-29T10:00:35Z)
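As a rough illustration of the MTP idea (not the authors' implementation), the NumPy sketch below approximates the square root of a symmetric positive-definite matrix by truncating the Taylor series of (I - Z)^{1/2} with Z = I - A/||A||_F; the normalization choice and truncation degree are assumptions.

```python
import numpy as np

def matrix_sqrt_taylor(A, degree=30):
    """Approximate the square root of an SPD matrix via a truncated Taylor
    series of (I - Z)^{1/2}, with Z = I - A / ||A||_F.

    Toy version of the Matrix Taylor Polynomial (MTP) idea; the paper's exact
    formulation, degree, and normalization may differ.
    """
    n = A.shape[0]
    norm = np.linalg.norm(A, "fro")
    Z = np.eye(n) - A / norm
    S = np.eye(n)            # running series sum, starts at the k=0 term
    Zk = np.eye(n)           # Z^k
    coeff = 1.0              # binom(1/2, k), built iteratively
    for k in range(1, degree + 1):
        coeff *= (0.5 - k + 1) / k
        Zk = Zk @ Z
        S = S + coeff * (-1.0) ** k * Zk   # adds binom(1/2, k) * (-Z)^k
    return np.sqrt(norm) * S

rng = np.random.default_rng(0)
B = rng.normal(size=(16, 16))
A = B @ B.T + 64 * np.eye(16)              # well-conditioned SPD test matrix
R = matrix_sqrt_taylor(A)
print(np.linalg.norm(R @ R - A) / np.linalg.norm(A))  # truncation error, should be small
```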
- Unfolding Projection-free SDP Relaxation of Binary Graph Classifier via GDPA Linearization [59.87663954467815]
Algorithm unfolding creates an interpretable and parsimonious neural network architecture by implementing each iteration of a model-based algorithm as a neural layer.
In this paper, leveraging a recent linear algebraic theorem called Gershgorin disc perfect alignment (GDPA), we unroll a projection-free algorithm for the semi-definite programming relaxation (SDR) of a binary graph classifier.
Experimental results show that our unrolled network outperformed pure model-based graph classifiers, and achieved comparable performance to pure data-driven networks but using far fewer parameters.
arXiv Detail & Related papers (2021-09-10T07:01:15Z)
- Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance Pooling? [59.820507600960745]
We propose a new GCP meta-layer that uses SVD in the forward pass, and Padé Approximants in the backward propagation to compute the gradients.
The proposed meta-layer has been integrated into different CNN models and achieves state-of-the-art performances on both large-scale and fine-grained datasets.
arXiv Detail & Related papers (2021-05-06T08:03:45Z)
- SiMaN: Sign-to-Magnitude Network Binarization [165.5630656849309]
We show that our weight binarization provides an analytical solution by encoding high-magnitude weights into +1s, and 0s otherwise.
We prove that the learned weights of binarized networks roughly follow a Laplacian distribution that does not allow entropy maximization.
Our method, dubbed sign-to-magnitude network binarization (SiMaN), is evaluated on CIFAR-10 and ImageNet; a toy sketch of the encoding idea follows this entry.
arXiv Detail & Related papers (2021-02-16T07:03:51Z)
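A toy sketch of the encoding statement above (high-magnitude weights become +1, the rest 0); the keep-ratio and thresholding rule are hypothetical illustration parameters, not the analytical solution derived in the paper.

```python
import numpy as np

def sign_to_magnitude_binarize(w, keep_ratio=0.5):
    """Encode the highest-magnitude weights as +1 and the rest as 0.

    Toy illustration of the summary's statement; the actual SiMaN analytical
    solution (how many weights are kept, and any scaling) differs in detail.
    """
    k = max(1, int(keep_ratio * w.size))
    threshold = np.sort(np.abs(w).ravel())[-k]          # k-th largest magnitude
    return (np.abs(w) >= threshold).astype(np.float32)  # {0, +1} code

rng = np.random.default_rng(0)
w = rng.laplace(size=(4, 8))          # Laplacian-like weights, as noted above
b = sign_to_magnitude_binarize(w)
print(b, b.mean())                    # roughly keep_ratio of the entries are +1
```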
- Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving [106.63673243937492]
Feedforward computation, such as evaluating a neural network or sampling from an autoregressive model, is ubiquitous in machine learning.
We frame the task of feedforward computation as solving a system of nonlinear equations. We then propose to find the solution using a Jacobi or Gauss-Seidel fixed-point method, as well as hybrid methods of both; a minimal Jacobi-style sketch follows this entry.
Our method is guaranteed to give exactly the same values as the original feedforward computation with a reduced (or equal) number of parallelizable iterations, and hence reduced time given sufficient parallel computing power.
arXiv Detail & Related papers (2020-02-10T10:11:31Z)
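A minimal Jacobi-style sketch of the fixed-point view described above, assuming a toy chain of tanh layers (the layers, sizes, and initialization are made up): every layer is updated in parallel from the previous iterate, and after at most L sweeps the iterate matches the ordinary sequential pass exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
L, d = 6, 4
Ws = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(L)]  # toy layer weights
f = [lambda h, W=W: np.tanh(W @ h) for W in Ws]                # layer functions

x = rng.normal(size=d)

# Ordinary sequential feedforward pass.
h_seq = x
for layer in f:
    h_seq = layer(h_seq)

# Jacobi fixed-point view: solve h_i = f_i(h_{i-1}) for all i at once,
# updating every layer in parallel from the previous (stale) iterate.
h = [np.zeros(d) for _ in range(L)]       # arbitrary initialization
for _ in range(L):                        # converges exactly within L sweeps
    prev = [x] + h[:-1]                   # stale inputs to each layer
    h = [f[i](prev[i]) for i in range(L)]

print(np.allclose(h[-1], h_seq))          # True: same values as the sequential pass
```

The payoff comes when the per-sweep work can be executed in parallel across layers or when the fixed point is reached in fewer than L sweeps, which is the regime the paper targets.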
- A Regression Tsetlin Machine with Integer Weighted Clauses for Compact Pattern Representation [9.432068833600884]
The Regression Tsetlin Machine (RTM) addresses the lack of interpretability impeding state-of-the-art nonlinear regression models.
We introduce integer-weighted clauses to reduce the computation cost by a factor of N and increase interpretability.
We evaluate the potential of the integer weighted RTM using six artificial datasets.
arXiv Detail & Related papers (2020-02-04T12:06:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content and is not responsible for any consequences of its use.