Towards Memory-Efficient Neural Networks via Multi-Level in situ
Generation
- URL: http://arxiv.org/abs/2108.11430v1
- Date: Wed, 25 Aug 2021 18:50:24 GMT
- Title: Towards Memory-Efficient Neural Networks via Multi-Level in situ
Generation
- Authors: Jiaqi Gu, Hanqing Zhu, Chenghao Feng, Mingjie Liu, Zixuan Jiang, Ray
T. Chen, David Z. Pan
- Abstract summary: Deep neural networks (DNNs) have shown superior performance in a variety of tasks.
As they rapidly evolve, their escalating computation and memory demands make it challenging to deploy them on resource-constrained edge devices.
We propose a general and unified framework to trade expensive memory transactions for ultra-fast on-chip computation.
- Score: 10.563649948220371
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) have shown superior performance in a variety of
tasks. As they rapidly evolve, their escalating computation and memory demands
make it challenging to deploy them on resource-constrained edge devices.
Although numerous efficient accelerator designs, from traditional electronics
to emerging photonics, have been successfully demonstrated, they are still
bottlenecked by expensive memory accesses due to the large gaps in bandwidth,
power, and latency between electrical memory and computing cores. Previous
solutions fail to fully leverage the ultra-fast computational speed of emerging
DNN accelerators to break through the critical memory bound. In this work, we
propose a general and unified framework to trade expensive memory transactions
for ultra-fast on-chip computation, directly translating to performance
improvement. We are the first to jointly explore the intrinsic correlations and
bit-level redundancy within DNN kernels and propose a multi-level in situ
generation mechanism with mixed-precision bases to achieve on-the-fly recovery
of high-resolution parameters with minimal hardware overhead. Extensive
experiments demonstrate that our proposed joint method can boost memory
efficiency by 10-20x over four state-of-the-art designs with comparable
accuracy, when benchmarked on ResNet-18/DenseNet-121/MobileNetV2/V3 across
various tasks.
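The abstract does not spell out the generation mechanism itself, so the following is only a minimal illustrative sketch of the general idea: a high-resolution kernel is recovered on the fly from a small set of low-bit, mixed-precision bases. The shapes, bit-widths, and the particular two-level (low-rank plus mixed-precision) structure are assumptions for illustration, not the authors' exact formulation.

```python
import numpy as np

def quantize(x, bits):
    """Uniform symmetric quantization to the given bit-width."""
    scale = np.max(np.abs(x)) / (2 ** (bits - 1) - 1) + 1e-12
    return np.round(x / scale) * scale

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))   # a hypothetical dense kernel

# Level 1: exploit intra-kernel correlation -- keep only a small basis B
# (rank 8 here) and per-channel coefficients C instead of the full W.
rank = 8
U, S, Vt = np.linalg.svd(W, full_matrices=False)
C = U[:, :rank] * S[:rank]           # 64 x 8 coefficients
B = Vt[:rank, :]                     # 8 x 128 basis vectors

# Level 2: exploit bit-level redundancy -- store the bases in mixed precision
# (two "important" bases at 8 bits, the rest at 4 bits; coefficients at 4 bits).
B_q = np.vstack([quantize(B[:2], 8), quantize(B[2:], 4)])
C_q = quantize(C, 4)

# On-the-fly recovery: one small on-chip matrix multiply regenerates the
# full-resolution kernel, replacing a large off-chip weight fetch.
W_gen = C_q @ B_q
print("relative reconstruction error:",
      np.linalg.norm(W - W_gen) / np.linalg.norm(W))
```

Stored this way, a few low-bit bases and coefficients stand in for the dense kernel, and one small on-chip matrix multiply replaces an off-chip weight fetch, which is the flavor of memory-for-compute trade the abstract describes.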
Related papers
- OPIMA: Optical Processing-In-Memory for Convolutional Neural Network Acceleration [5.0389804644646174]
We introduce OPIMA, a processing-in-memory (PIM)-based machine learning accelerator.
Existing PIM approaches struggle to achieve high throughput and energy efficiency due to internal data-movement bottlenecks.
We show that OPIMA can achieve 2.98x higher throughput and 137x better energy efficiency than the best-known prior work.
arXiv Detail & Related papers (2024-07-11T06:12:04Z)
- Full-Stack Optimization for CAM-Only DNN Inference [2.0837295518447934]
This paper explores the combination of algorithmic optimizations for ternary weight neural networks and associative processors (APs).
We propose a novel compilation flow to optimize convolutions on APs by reducing their arithmetic intensity.
Our solution improves the energy efficiency of ResNet-18 inference on ImageNet by 7.5x compared to crossbar in-memory accelerators.
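The summary only names the ingredients, so below is a rough sketch of the ternary-weight side (the AP compilation flow is not reproduced). It uses the common threshold-based ternarization heuristic from Ternary Weight Networks, which is an assumption rather than this paper's exact scheme.

```python
import numpy as np

def ternarize(w, delta_ratio=0.7):
    """Map weights to {-1, 0, +1} with one scale (TWN-style heuristic)."""
    delta = delta_ratio * np.mean(np.abs(w))            # sparsity threshold
    t = np.where(np.abs(w) > delta, np.sign(w), 0.0)    # ternary codes
    mask = t != 0
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0  # scale for nonzeros
    return t, alpha

w = np.random.default_rng(1).standard_normal((32, 16, 3, 3))  # a conv kernel
t, alpha = ternarize(w)
w_approx = alpha * t   # 2-bit codes plus one scale replace 32-bit weights
```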
arXiv Detail & Related papers (2024-01-23T10:27:38Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization that maximizes data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
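The summary does not detail the adapter structure; the sketch below shows a standard bottleneck adapter added to a frozen shared layer, which is what the name suggests but whose sizes and placement are assumptions here. Because only the tiny adapter differs per task, the large backbone weights can stay resident and be reused across tasks.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Small task-specific module added after a frozen shared backbone layer."""
    def __init__(self, hidden_dim=768, bottleneck_dim=64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, x):
        # Residual bottleneck: only these few parameters change per task, so the
        # large backbone weights can stay resident in memory and be reused.
        return x + self.up(self.act(self.down(x)))

x = torch.randn(2, 16, 768)      # (batch, tokens, hidden)
y = BottleneckAdapter()(x)
```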
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- FireFly: A High-Throughput Hardware Accelerator for Spiking Neural Networks with Efficient DSP and Memory Optimization [6.966706170499345]
Spiking neural networks (SNNs) have been widely used due to their strong biological interpretability and high energy efficiency.
Most SNN hardware implementations for field-programmable gate arrays (FPGAs) cannot meet arithmetic or memory efficiency requirements.
We propose FireFly, an FPGA accelerator that processes the spikes generated by firing neurons on the fly.
arXiv Detail & Related papers (2023-01-05T04:28:07Z)
- BiFSMN: Binary Neural Network for Keyword Spotting [47.46397208920726]
BiFSMN is an accurate and extremely efficient binary neural network for KWS.
We show that BiFSMN can achieve an impressive 22.3x speedup and 15.5x storage-saving on real-world edge hardware.
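As a rough illustration of where the storage saving of a binary network comes from (BiFSMN's specific architecture and training recipe are not reproduced here), the sketch below binarizes a weight matrix with per-channel scales in the usual XNOR-Net style, which is an assumption about the binarization scheme.

```python
import numpy as np

def binarize(w):
    """Per-output-channel sign binarization with a scaling factor."""
    alpha = np.abs(w).mean(axis=1, keepdims=True)  # one scale per output channel
    return np.sign(w), alpha

w = np.random.default_rng(2).standard_normal((128, 256))  # a dense layer
b, alpha = binarize(w)
w_approx = alpha * b   # 1-bit codes + per-channel scales instead of 32-bit floats
```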
arXiv Detail & Related papers (2022-02-14T05:16:53Z)
- FPGA-optimized Hardware acceleration for Spiking Neural Networks [69.49429223251178]
This work presents the development of a hardware accelerator for an SNN, with off-line training, applied to an image recognition task.
The design targets a Xilinx Artix-7 FPGA, using around 40% of the available hardware resources in total.
Compared to its full-precision software counterpart, it reduces classification time by three orders of magnitude with a small 4.5% impact on accuracy.
arXiv Detail & Related papers (2022-01-18T13:59:22Z)
- A Heterogeneous In-Memory Computing Cluster For Flexible End-to-End Inference of Real-World Deep Neural Networks [12.361842554233558]
Deployment of modern TinyML tasks on small battery-constrained IoT devices requires high computational energy efficiency.
Analog In-Memory Computing (IMC) using non-volatile memory (NVM) promises major efficiency improvements in deep neural network (DNN) inference.
We present a heterogeneous tightly-coupled architecture integrating 8 RISC-V cores, an in-memory computing accelerator (IMA), and digital accelerators.
arXiv Detail & Related papers (2022-01-04T11:12:01Z)
- Resistive Neural Hardware Accelerators [0.46198289193451136]
The shift towards ReRAM-based in-memory computing has great potential for the implementation of area- and power-efficient inference.
In this survey, we review the state-of-the-art ReRAM-based Deep Neural Networks (DNNs) many-core accelerators.
arXiv Detail & Related papers (2021-09-08T21:11:48Z)
- SmartDeal: Re-Modeling Deep Network Weights for Efficient Inference and Training [82.35376405568975]
Deep neural networks (DNNs) come with heavy parameterization, typically requiring external dynamic random-access memory (DRAM) for storage.
We present SmartDeal (SD), an algorithm framework to trade higher-cost memory storage/access for lower-cost computation.
We show that SD leads to 10.56x and 4.48x reduction in the storage and training energy, with negligible accuracy loss compared to state-of-the-art training baselines.
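The storage-for-computation trade here is closely related to the main paper's theme. As a loose, assumed illustration (not SmartDeal's exact decomposition), the sketch below stores a weight matrix as a small basis plus a sparse, power-of-two coefficient matrix and rebuilds the dense weights at inference time, so the stored form is small and the re-expansion uses cheap shift/add-style arithmetic.

```python
import numpy as np

def round_to_power_of_two(x):
    """Quantize nonzero entries to signed powers of two (multiplies become shifts)."""
    mag = np.abs(x)
    exp = np.round(np.log2(np.maximum(mag, 1e-12)))
    return np.where(mag > 0, np.sign(x) * 2.0 ** exp, 0.0)

rng = np.random.default_rng(3)
W = rng.standard_normal((64, 64))

# Store: a small dense basis B plus a sparse, shift-friendly coefficient matrix C.
U, S, Vt = np.linalg.svd(W, full_matrices=False)
B = Vt[:8, :]                        # 8 x 64 basis kept on chip
C = U[:, :8] * S[:8]
C[np.abs(C) < 0.1] = 0.0             # prune small coefficients (sparse storage)
C = round_to_power_of_two(C)         # power-of-two coefficients

# Re-expand at inference time: cheap shifts/adds instead of fetching dense W.
W_rebuilt = C @ B
```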
arXiv Detail & Related papers (2021-01-04T18:54:07Z)
- EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference [82.1584439276834]
Transformer-based language models such as BERT provide significant accuracy improvement for a multitude of natural language processing (NLP) tasks.
We present EdgeBERT, an in-depth algorithm-hardware co-design for latency-aware energy optimization of multi-task NLP inference.
arXiv Detail & Related papers (2020-11-28T19:21:47Z)
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in the design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to regain and guarantee high hardware efficiency.
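To make "fine-grained patterns inside coarse-grained structures" concrete, the sketch below prunes every 3x3 kernel to one of four hypothetical 4-entry patterns, chosen to preserve the most weight magnitude; the pattern library and selection rule are assumptions, not PatDNN's exact ones.

```python
import numpy as np

# A hypothetical library of 4-entry patterns over a 3x3 kernel (1 = weight kept).
PATTERNS = np.array([
    [[1, 1, 0], [1, 1, 0], [0, 0, 0]],
    [[0, 1, 1], [0, 1, 1], [0, 0, 0]],
    [[0, 0, 0], [1, 1, 0], [1, 1, 0]],
    [[0, 0, 0], [0, 1, 1], [0, 1, 1]],
], dtype=float)

def apply_pattern_pruning(kernels):
    """Keep, for each 3x3 kernel, the library pattern preserving the most magnitude."""
    pruned = np.empty_like(kernels)
    for idx in np.ndindex(kernels.shape[:-2]):  # iterate over (out_ch, in_ch)
        k = kernels[idx]
        scores = [(np.abs(k) * p).sum() for p in PATTERNS]
        pruned[idx] = k * PATTERNS[int(np.argmax(scores))]
    return pruned

kernels = np.random.default_rng(4).standard_normal((16, 8, 3, 3))  # conv weights
pruned = apply_pattern_pruning(kernels)
# Every kernel now has the same nonzero count and one of only four shapes,
# which is the regularity a compiler can exploit for near-dense execution.
```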
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.