Sorted Weight Sectioning for Energy-Efficient Unstructured Sparse DNNs on Compute-in-Memory Crossbars
- URL: http://arxiv.org/abs/2410.11298v2
- Date: Tue, 29 Oct 2024 04:39:50 GMT
- Title: Sorted Weight Sectioning for Energy-Efficient Unstructured Sparse DNNs on Compute-in-Memory Crossbars
- Authors: Matheus Farias, H. T. Kung
- Abstract summary: $\textit{Sorted weight sectioning}$ (SWS) is a weight allocation algorithm that places sorted deep neural network (DNN) weight sections on bit-sliced compute-in-memory (CIM) crossbars.
Our method reduces ADC energy use by 89.5% on unstructured sparse BERT models.
- Score: 4.089232204089156
- Abstract: We introduce $\textit{sorted weight sectioning}$ (SWS): a weight allocation algorithm that places sorted deep neural network (DNN) weight sections on bit-sliced compute-in-memory (CIM) crossbars to reduce analog-to-digital converter (ADC) energy consumption. Data conversions are the most energy-intensive process in crossbar operation. SWS effectively reduces this cost by leveraging (1) small weights and (2) zero weights (weight sparsity). DNN weights follow bell-shaped distributions, with most weights near zero. Using SWS, we only need low-order crossbar columns for sections with low-magnitude weights. This reduces the quantity and resolution of ADCs used, exponentially decreasing ADC energy costs without significantly degrading DNN accuracy. Unstructured sparsification further sharpens the weight distribution with small accuracy loss. However, it presents challenges in hardware tracking of zeros: we cannot switch zero rows to other layer weights in unsorted crossbars without index matching. SWS efficiently addresses unstructured sparse models using offline remapping of zeros into earlier sections, which reveals the full sparsity potential and maximizes energy efficiency. Our method reduces ADC energy use by 89.5% on unstructured sparse BERT models. Overall, this paper introduces a novel algorithm to promote energy-efficient CIM crossbars for unstructured sparse DNN workloads.
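As a rough illustration of the allocation idea (not the authors' implementation), the following Python sketch sorts weights by magnitude, cuts them into fixed-size sections, and estimates how many bit-slice columns each section needs; the section size, 8-bit weight width, and bit-slice heuristic are assumptions for illustration.

```python
import numpy as np

def sorted_weight_sectioning(weights, section_size=128, total_bits=8):
    """Minimal sketch of sorted weight sectioning (SWS): sort weights by
    magnitude, cut them into crossbar-sized sections, and estimate how many
    bit-slice columns (and hence ADC reads) each section needs."""
    flat = weights.flatten()
    order = np.argsort(np.abs(flat))          # zeros and small weights land in the earliest sections
    sorted_w = flat[order]
    scale = np.max(np.abs(sorted_w)) or 1.0   # avoid division by zero for an all-zero tensor

    sections = []
    for start in range(0, len(sorted_w), section_size):
        sec = sorted_w[start:start + section_size]
        # quantize magnitudes to `total_bits` and count the bit slices this section actually uses
        q_max = int(np.round(np.max(np.abs(sec)) / scale * (2**total_bits - 1)))
        bits_needed = q_max.bit_length()      # 0 for an all-zero section: its ADC reads can be skipped
        sections.append({"weights": sec, "index": order[start:start + section_size],
                         "bit_slices": bits_needed})
    return sections

# Toy example: bell-shaped weights with 90% unstructured sparsity.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=4096)
w[rng.random(4096) < 0.9] = 0.0
print([s["bit_slices"] for s in sorted_weight_sectioning(w)])   # early sections need 0 or few bit slices
```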
Related papers
- OvSW: Overcoming Silent Weights for Accurate Binary Neural Networks [19.41917323210239]
We investigate the efficiency of weight sign updates in Binary Neural Networks (BNNs).
For vanilla BNNs, over 50% of the weights keep their signs unchanged during training.
We propose Overcoming Silent Weights (OvSW) to address this issue.
arXiv Detail & Related papers (2024-07-07T05:01:20Z) - Learning to Compose SuperWeights for Neural Parameter Allocation Search [61.078949532440724]
We show that our approach can generate parameters for many networks using the same set of weights.
This enables us to support tasks like efficient ensembling and anytime prediction.
arXiv Detail & Related papers (2023-12-03T04:20:02Z) - Single-Shot Optical Neural Network [55.41644538483948]
'Weight-stationary' analog optical and electronic hardware has been proposed to reduce the compute resources required by deep neural networks.
We present a scalable, single-shot-per-layer weight-stationary optical processor.
arXiv Detail & Related papers (2022-05-18T17:49:49Z) - Examining and Mitigating the Impact of Crossbar Non-idealities for Accurate Implementation of Sparse Deep Neural Networks [2.4283778735260686]
We show how sparse Deep Neural Networks (DNNs) can lead to severe accuracy losses compared to unpruned DNNs mapped onto non-ideal crossbars.
We propose two mitigation approaches: crossbar column rearrangement and Weight-Constrained-Training (WCT).
These help in mitigating non-idealities by increasing the proportion of low conductance synapses on crossbars, thereby improving their computational accuracies.
arXiv Detail & Related papers (2022-01-13T21:56:48Z) - CREW: Computation Reuse and Efficient Weight Storage for Hardware-accelerated MLPs and RNNs [1.0635248457021496]
We present CREW, a hardware accelerator that implements Computation Reuse and an Efficient Weight Storage mechanism.
CREW greatly reduces the number of multiplications and provides significant savings in model memory footprint and memory bandwidth usage.
On average, CREW provides 2.61x speedup and 2.42x energy savings over a TPU-like accelerator.
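The computation-reuse idea can be pictured with a short, hedged sketch (an illustrative mock-up, not CREW's hardware or data layout; the 3-bit quantization and index layout are assumptions): multiply each input by each unique quantized weight value once, then build outputs by indexed accumulation.

```python
import numpy as np

def reuse_matvec(weight_ids, unique_vals, x):
    """Illustrative sketch of computation reuse for a quantized linear layer:
    multiply each input element by each *unique* weight value once, then form
    outputs by indexed accumulation, replacing most multiplications with adds."""
    products = np.outer(x, unique_vals)              # one multiply per (input, unique value) pair
    n_in, n_out = weight_ids.shape
    out = np.zeros(n_out)
    for i in range(n_in):
        for j in range(n_out):
            out[j] += products[i, weight_ids[i, j]]  # reuse the precomputed product, no new multiply
    return out

# Toy check against a dense matvec with 3-bit quantized weights (8 unique values).
rng = np.random.default_rng(1)
unique_vals = np.linspace(-1.0, 1.0, 8)
ids = rng.integers(0, 8, size=(64, 16))              # weights stored as small indices (compact storage)
x = rng.normal(size=64)
assert np.allclose(reuse_matvec(ids, unique_vals, x), x @ unique_vals[ids])
```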
arXiv Detail & Related papers (2021-07-20T11:10:54Z) - ReCU: Reviving the Dead Weights in Binary Neural Networks [153.6789340484509]
We explore the influence of "dead weights" which refer to a group of weights that are barely updated during the training of BNNs.
We prove that reviving the "dead weights" by ReCU can result in a smaller quantization error.
Our method offers not only faster BNN training, but also state-of-the-art performance on CIFAR-10 and ImageNet.
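One hedged way to picture the revival step (the quantile-clamp form and the tau value below are assumptions, not necessarily the paper's exact rectified clamp unit):

```python
import numpy as np

def recu_like_clamp(latent_w, tau=0.99):
    """Hedged sketch of a quantile clamp for BNN latent weights: pull tail
    ("dead") weights back toward the rest of the distribution so they keep
    receiving meaningful sign updates. The exact ReCU formulation may differ."""
    lo, hi = np.quantile(latent_w, 1.0 - tau), np.quantile(latent_w, tau)
    return np.clip(latent_w, lo, hi)

w = np.random.default_rng(2).normal(size=1000)
print(recu_like_clamp(w).min(), recu_like_clamp(w).max())  # tails are clamped to the quantile range
```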
arXiv Detail & Related papers (2021-03-23T08:11:20Z) - Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate models in resource-constrained environments.
In this paper, we are the first to study training an N:M fine-grained structured sparse network from scratch.
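For reference, the N:M pattern itself (e.g., 2:4) can be illustrated with a minimal sketch; magnitude-based selection within groups of M consecutive weights is a common choice and not necessarily this paper's training procedure.

```python
import numpy as np

def nm_prune_mask(weights, n=2, m=4):
    """Illustrative N:M fine-grained structured sparsity: in every group of M
    consecutive weights, keep only the N largest-magnitude entries.
    Assumes weights.size is divisible by m."""
    w = weights.reshape(-1, m)                          # groups of M consecutive weights
    keep = np.argsort(np.abs(w), axis=1)[:, m - n:]     # indices of the N largest magnitudes per group
    mask = np.zeros_like(w, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)
    return mask.reshape(weights.shape)

w = np.random.default_rng(3).normal(size=(8, 16))
mask = nm_prune_mask(w, n=2, m=4)                       # exactly 2 of every 4 consecutive weights survive
assert mask.reshape(-1, 4).sum(axis=1).tolist() == [2] * (w.size // 4)
print((w * mask)[0])
```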
arXiv Detail & Related papers (2021-02-08T05:55:47Z) - SmartDeal: Re-Modeling Deep Network Weights for Efficient Inference and Training [82.35376405568975]
Deep neural networks (DNNs) are heavily parameterized, requiring external dynamic random-access memory (DRAM) for storage.
We present SmartDeal (SD), an algorithm framework to trade higher-cost memory storage/access for lower-cost computation.
We show that SD leads to 10.56x and 4.48x reduction in the storage and training energy, with negligible accuracy loss compared to state-of-the-art training baselines.
arXiv Detail & Related papers (2021-01-04T18:54:07Z) - Sparsity-Control Ternary Weight Networks [34.00378876525579]
We focus on training ternary weight {-1, 0, +1} networks, which avoid multiplications and dramatically reduce memory and computation requirements.
Existing approaches to training ternary weight networks cannot control the sparsity of the ternary weights.
We propose the first sparsity-control approach (SCA) to training ternary weight networks.
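To make the sparsity-control idea concrete, a hedged sketch: derive the ternarization threshold from a target sparsity level (a quantile of |w|). This illustrates the concept only and is not SCA's actual training algorithm; the scaling factor alpha follows a common TWN-style choice.

```python
import numpy as np

def ternarize_with_sparsity(weights, target_sparsity=0.5):
    """Hedged sketch of sparsity-controlled ternarization: choose the threshold
    as the `target_sparsity` quantile of |w| so roughly that fraction of
    weights becomes 0, and the rest become -1 or +1 (scaled by alpha)."""
    thresh = np.quantile(np.abs(weights), target_sparsity)
    ternary = np.sign(weights) * (np.abs(weights) > thresh)
    nonzero = np.abs(weights)[np.abs(weights) > thresh]
    alpha = nonzero.mean() if nonzero.size else 0.0     # common scaling choice in ternary-weight methods
    return alpha * ternary, float((ternary == 0).mean())

w = np.random.default_rng(4).normal(size=10_000)
_, sparsity = ternarize_with_sparsity(w, target_sparsity=0.7)
print(sparsity)                                          # close to the requested 0.7
```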
arXiv Detail & Related papers (2020-11-01T18:06:26Z) - Bit Error Robustness for Energy-Efficient DNN Accelerators [93.58572811484022]
We show that a combination of robust fixed-point quantization, weight clipping, and random bit error training (RandBET) improves robustness against random bit errors.
This leads to high energy savings from both low-voltage operation as well as low-precision quantization.
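The random-bit-error ingredient can be sketched as follows; the 8-bit weight codes and flip probability are illustrative assumptions, not the paper's exact fault model.

```python
import numpy as np

def inject_random_bit_errors(q_weights, p=0.01, bits=8, rng=None):
    """Hedged sketch of random bit error injection: flip each bit of the
    unsigned fixed-point weight codes independently with probability p,
    mimicking low-voltage memory faults seen during training."""
    rng = rng or np.random.default_rng()
    codes = q_weights.astype(np.uint8)
    flips = rng.random((codes.size, bits)) < p               # which bit positions flip, per weight
    bit_masks = (flips * (1 << np.arange(bits))).sum(axis=1).astype(np.uint8)
    return (codes.reshape(-1) ^ bit_masks).reshape(q_weights.shape)

q = np.random.default_rng(5).integers(0, 256, size=(4, 4)).astype(np.uint8)
print(q)
print(inject_random_bit_errors(q, p=0.05))                   # a few codes differ by single-bit flips
```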
arXiv Detail & Related papers (2020-06-24T18:23:10Z) - Exploiting Weight Redundancy in CNNs: Beyond Pruning and Quantization [0.2538209532048866]
Pruning and quantization are proven methods for improving the performance and storage efficiency of convolutional neural networks (CNNs).
We identify another form of redundancy in CNN weight tensors, in the form of repeated patterns of similar values.
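A hedged sketch of how such repetition might be surfaced: coarsely quantize weight rows and count duplicates, which could then be stored once and referenced by index. The row granularity and quantization step are assumptions for illustration.

```python
from collections import Counter
import numpy as np

def count_repeated_patterns(weights, step=0.1):
    """Hedged sketch: quantize each weight row to a coarse grid and count how
    often identical rows repeat, hinting at redundancy beyond per-weight
    pruning/quantization (repeated rows could be stored once plus an index)."""
    rows = np.round(weights / step).astype(int)
    counts = Counter(map(tuple, rows))
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated, len(counts)

w = np.random.default_rng(6).normal(0.0, 0.1, size=(512, 4))
repeated, unique = count_repeated_patterns(w)
print(f"{repeated} of {w.shape[0]} rows share a pattern; only {unique} unique patterns")
```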
arXiv Detail & Related papers (2020-06-22T01:54:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.