SmartDeal: Re-Modeling Deep Network Weights for Efficient Inference and
Training
- URL: http://arxiv.org/abs/2101.01163v1
- Date: Mon, 4 Jan 2021 18:54:07 GMT
- Title: SmartDeal: Re-Modeling Deep Network Weights for Efficient Inference and
Training
- Authors: Xiaohan Chen, Yang Zhao, Yue Wang, Pengfei Xu, Haoran You, Chaojian
Li, Yonggan Fu, Yingyan Lin, Zhangyang Wang
- Abstract summary: Deep neural networks (DNNs) come with heavy parameterization, which requires external dynamic random-access memory (DRAM) for storage.
We present SmartDeal (SD), an algorithm framework to trade higher-cost memory storage/access for lower-cost computation.
We show that SD leads to 10.56x and 4.48x reduction in the storage and training energy, with negligible accuracy loss compared to state-of-the-art training baselines.
- Score: 82.35376405568975
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The record-breaking performance of deep neural networks (DNNs) comes with
heavy parameterization, which in turn requires external dynamic random-access
memory (DRAM) for storage. The prohibitive energy cost of DRAM accesses makes it
non-trivial to deploy DNNs on resource-constrained devices, calling for minimizing
weight and data movement to improve energy efficiency. We present SmartDeal (SD),
an algorithm framework to trade higher-cost memory storage/access for
lower-cost computation, in order to aggressively boost storage and energy
efficiency for both inference and training. The core of SD is a novel weight
decomposition with structural constraints, carefully crafted to unleash the
hardware efficiency potential. Specifically, we decompose each weight tensor as
the product of a small basis matrix and a large structurally sparse coefficient
matrix whose non-zeros are quantized to power-of-2. The resulting sparse and
quantized DNNs enjoy greatly reduced energy for data movement and weight
storage, incurring minimal overhead to recover the original weights thanks to
the sparse bit-operations and cost-favorable computations. Beyond inference, we
take another leap to embrace energy-efficient training, introducing innovative
techniques to address the unique roadblocks arising in training while
preserving the SD structures. We also design a dedicated hardware accelerator
to fully utilize the SD structure to improve the real energy efficiency and
latency. We conduct experiments on multiple tasks, models, and datasets in
different settings. Results show that: 1) applied to inference, SD achieves up
to 2.44x higher energy efficiency as evaluated via real hardware implementations; 2)
applied to training, SD leads to 10.56x and 4.48x reduction in the storage and
training energy, with negligible accuracy loss compared to state-of-the-art
training baselines. Our source code is available online.
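To make the decomposition concrete, the following is a minimal NumPy sketch, under stated assumptions, of the structure the abstract describes: each weight matrix is factored into a small dense basis and a large coefficient matrix whose entries are pruned to be sparse and rounded to signed powers of two. The alternating least-squares loop and the helper names project_pow2/decompose, as well as the shapes and threshold, are hypothetical illustrations rather than the paper's actual algorithm; the released source code should be consulted for the exact procedure.

```python
# Hedged sketch of a SmartDeal-style weight re-modeling: approximate a weight
# matrix W as the product of a large, structurally sparse coefficient matrix Ce
# (non-zeros restricted to signed powers of two) and a small dense basis B.
# All shapes, thresholds, and the alternating refinement below are assumptions
# for illustration only.
import numpy as np


def project_pow2(x: np.ndarray, threshold: float) -> np.ndarray:
    """Zero out small entries; round remaining magnitudes to the nearest
    power-of-2 exponent, keeping the sign."""
    out = np.zeros_like(x)
    keep = np.abs(x) > threshold
    out[keep] = np.sign(x[keep]) * np.exp2(np.round(np.log2(np.abs(x[keep]))))
    return out


def decompose(W: np.ndarray, k: int = 8, threshold: float = 0.05, n_iters: int = 20):
    """Approximate W (m x n) as Ce (m x k, sparse, power-of-2) @ B (k x n)."""
    m, n = W.shape
    B = np.random.default_rng(0).standard_normal((k, n)) / np.sqrt(n)  # small basis
    Ce = np.zeros((m, k))
    for _ in range(n_iters):
        # Fit the coefficients by least squares, then project them onto the
        # sparse / power-of-2 structure that makes storage and movement cheap.
        Ce = project_pow2(W @ np.linalg.pinv(B), threshold)
        # Refit the dense basis against the structured coefficients.
        B = np.linalg.pinv(Ce) @ W
    return Ce, B


if __name__ == "__main__":
    W = np.random.default_rng(1).standard_normal((64, 32)) * 0.1
    Ce, B = decompose(W)
    err = np.linalg.norm(W - Ce @ B) / np.linalg.norm(W)
    print(f"relative error: {err:.3f}, coefficient sparsity: {np.mean(Ce == 0):.2%}")
```

Because every surviving coefficient is a power of two, re-materializing W ≈ Ce @ B on chip only needs shifts and additions over the sparse non-zeros, which is the low-cost computation the abstract trades DRAM storage and accesses for.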
Related papers
- TSB: Tiny Shared Block for Efficient DNN Deployment on NVCIM Accelerators [11.496631244103773]
"Tiny Shared Block (TSB)" integrates a small shared 1x1 convolution block into the Deep Neural Network architecture.
TSB achieves over 20x inference accuracy gap improvement, over 5x training speedup, and weights-to-device mapping cost reduction.
arXiv Detail & Related papers (2024-05-08T20:53:38Z)
- CAMEL: Co-Designing AI Models and Embedded DRAMs for Efficient On-Device Learning [8.339901980070616]
Training AI on resource-limited devices poses significant challenges due to the demanding computing workload and the substantial memory consumption and data access required by deep neural networks (DNNs).
We propose utilizing embedded dynamic random-access memory (eDRAM) as the primary storage medium for transient training data.
We present a highly efficient on-device training engine named CAMEL, which leverages eDRAM as the primary on-chip memory.
arXiv Detail & Related papers (2023-05-04T20:57:01Z)
- Multi-Complexity-Loss DNAS for Energy-Efficient and Memory-Constrained Deep Neural Networks [22.40937602825472]
Energy and memory are rarely considered simultaneously, in particular by low-search-cost Differentiable Neural Architecture Search (DNAS) solutions.
We propose the first DNAS that directly addresses the most realistic scenario from a designer's perspective.
Our networks span a range of 2.18x in energy consumption and 4.04% in accuracy for the same memory constraint, and reduce energy by up to 2.2x with negligible accuracy drop with respect to the baseline.
arXiv Detail & Related papers (2022-06-01T08:04:50Z)
- DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Transformers [105.74546828182834]
We show a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slices a part of the network parameters for inputs with diverse difficulty levels.
We present dynamic slimmable network (DS-Net) and dynamic slice-able network (DS-Net++) by input-dependently adjusting filter numbers of CNNs and multiple dimensions in both CNNs and transformers.
arXiv Detail & Related papers (2021-09-21T09:57:21Z)
- CREW: Computation Reuse and Efficient Weight Storage for Hardware-accelerated MLPs and RNNs [1.0635248457021496]
We present CREW, a hardware accelerator that implements Computation Reuse and an Efficient Weight Storage mechanism.
CREW greatly reduces the number of multiplications and provides significant savings in model memory footprint and memory bandwidth usage.
On average, CREW provides 2.61x speedup and 2.42x energy savings over a TPU-like accelerator.
arXiv Detail & Related papers (2021-07-20T11:10:54Z)
- Dynamic Slimmable Network [105.74546828182834]
We develop a dynamic network slimming regime named Dynamic Slimmable Network (DS-Net).
DS-Net performs dynamic inference via the proposed double-headed dynamic gate.
It consistently outperforms its static counterparts as well as state-of-the-art static and dynamic model compression methods.
arXiv Detail & Related papers (2021-03-24T15:25:20Z)
- Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings [89.63764845984076]
We present Stored Embeddings for Efficient Reinforcement Learning (SEER).
SEER is a simple modification of existing off-policy deep reinforcement learning methods.
We show that SEER does not degrade the performance of RL agents while significantly saving computation and memory.
arXiv Detail & Related papers (2021-03-04T08:14:10Z)
- Bit Error Robustness for Energy-Efficient DNN Accelerators [93.58572811484022]
We show that a combination of robust fixed-point quantization, weight clipping, and random bit error training (RandBET) improves robustness against random bit errors.
This leads to high energy savings from both low-voltage operation as well as low-precision quantization.
arXiv Detail & Related papers (2020-06-24T18:23:10Z)
- SmartExchange: Trading Higher-cost Memory Storage/Access for Lower-cost Computation [97.78417228445883]
We present SmartExchange, an algorithm-hardware co-design framework for energy-efficient inference of deep neural networks (DNNs).
We develop a novel algorithm to enforce a specially favorable DNN weight structure, where each layerwise weight matrix can be stored as the product of a small basis matrix and a large sparse coefficient matrix whose non-zero elements are all power-of-2.
We further design a dedicated accelerator to fully utilize the SmartExchange-enforced weights to improve both energy efficiency and latency performance.
arXiv Detail & Related papers (2020-05-07T12:12:49Z)