HiMA: A Fast and Scalable History-based Memory Access Engine for
Differentiable Neural Computer
- URL: http://arxiv.org/abs/2202.07275v1
- Date: Tue, 15 Feb 2022 09:35:14 GMT
- Authors: Yaoyu Tao, Zhengya Zhang
- Abstract summary: We present HiMA, a tiled, history-based memory access engine with distributed memories in tiles.
HiMA incorporates a multi-mode network-on-chip (NoC) to reduce the communication latency and improve scalability.
By simulations, HiMA running DNC and DNC-D demonstrates 6.47x and 39.1x higher speed, 22.8x and 164.3x better area efficiency, and 6.1x and 61.2x better energy efficiency.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Memory-augmented neural networks (MANNs) provide better inference performance
in many tasks with the help of an external memory. The recently developed
differentiable neural computer (DNC) is a MANN that has been shown to
outperform prior models at representing complicated data structures and
learning long-term dependencies. DNC's higher performance derives from new history-based
attention mechanisms in addition to the previously used content-based attention
mechanisms. History-based mechanisms require a variety of new compute
primitives and state memories, which are not supported by existing neural
network (NN) or MANN accelerators. We present HiMA, a tiled, history-based
memory access engine with distributed memories in tiles. HiMA incorporates a
multi-mode network-on-chip (NoC) to reduce the communication latency and
improve scalability. An optimal submatrix-wise memory partition strategy is
applied to reduce the amount of NoC traffic, and a two-stage usage sort method
leverages distributed tiles to improve computation speed. To make HiMA
fundamentally scalable, we create a distributed version of DNC called DNC-D to
allow almost all memory operations to be applied to local memories with
trainable weighted summation to produce the global memory output. Two
approximation techniques, usage skimming and softmax approximation, are
proposed to further enhance hardware efficiency. HiMA prototypes are created in
RTL and synthesized in a 40nm technology. In simulations, HiMA running DNC and
DNC-D demonstrates 6.47x and 39.1x higher speed, 22.8x and 164.3x better area
efficiency, and 6.1x and 61.2x better energy efficiency over the
state-of-the-art MANN accelerator. Compared to an Nvidia 3080Ti GPU, HiMA
demonstrates speedups of up to 437x and 2,646x when running DNC and DNC-D,
respectively.
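For readers unfamiliar with the attention mechanisms named above: content-based addressing scores each memory row against a key, while the history-based machinery (the DNC's temporal link matrix) retrieves rows in the order they were written. The NumPy sketch below follows the published DNC read equations (Graves et al., 2016); the variable names are ours, and it models none of HiMA's hardware.

```python
import numpy as np

def content_weighting(M, key, beta):
    """Content-based attention: cosine similarity between the key and
    each memory row, sharpened by strength beta, softmax-normalized."""
    sim = (M @ key) / (np.linalg.norm(M, axis=1) * np.linalg.norm(key) + 1e-8)
    e = np.exp(beta * sim - np.max(beta * sim))  # numerically stable softmax
    return e / e.sum()

def history_weightings(L, w_prev):
    """History-based attention: the temporal link matrix L records write
    order (L[i, j] ~ "row i was written right after row j"), so it maps
    the previous read weighting to its forward/backward neighbors."""
    return L @ w_prev, L.T @ w_prev  # forward, backward

def read_weighting(M, key, beta, L, w_prev, pi):
    """A DNC read head blends the three modes with a learned gate pi
    (three components summing to 1): backward, content, forward."""
    c = content_weighting(M, key, beta)
    f, b = history_weightings(L, w_prev)
    return pi[0] * b + pi[1] * c + pi[2] * f
```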
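The abstract describes DNC-D only at a high level, so the following is a minimal sketch of the idea as we read it: each tile owns a local memory shard and does all addressing locally, and a trainable weight vector combines the per-tile read vectors into the global output. The tile count, the shapes, and the name `alpha` are our assumptions, not the paper's.

```python
import numpy as np

class DNCDRead:
    """Hypothetical sketch of DNC-D's distributed read, inferred from the
    abstract; the paper's actual formulation may differ."""

    def __init__(self, num_tiles, rows_per_tile, width):
        # Each tile owns a local memory shard; addressing stays local.
        self.shards = [np.zeros((rows_per_tile, width)) for _ in range(num_tiles)]
        # Trainable mixing weights (uniformly initialized here).
        self.alpha = np.ones(num_tiles) / num_tiles

    def read(self, local_weights):
        """local_weights[t] is tile t's read weighting over its own rows.
        The global read vector is a weighted sum of local read vectors,
        so no memory row ever has to leave its tile."""
        local_reads = [w @ M for w, M in zip(local_weights, self.shards)]
        return sum(a * r for a, r in zip(self.alpha, local_reads))
```

Under this reading, the NoC only ever carries width-sized read vectors rather than memory rows, which is consistent with the scalability claim in the abstract.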
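The two-stage usage sort and usage skimming are likewise named without detail. A plausible reading, sketched below under our own assumptions (the threshold value and function names are ours): each tile sorts its own usage values in parallel (stage one), the sorted runs are merged globally (stage two), and "skimming" drops nearly-full locations, which cannot win allocation, before sorting.

```python
import heapq

def two_stage_usage_sort(usage_tiles, k, skim_threshold=0.9):
    """Hypothetical two-stage sort: per-tile local sorts, then a global
    k-way merge. Only the k least-used locations are kept, since DNC
    allocation writes to the least-used rows first."""
    runs = []
    for tile_id, u in enumerate(usage_tiles):
        # Stage 1 (per tile, parallel in hardware): skim near-full rows,
        # then sort the survivors in ascending usage order.
        kept = [(val, tile_id, row) for row, val in enumerate(u)
                if val < skim_threshold]
        runs.append(sorted(kept))
    # Stage 2 (global): merge the already-sorted per-tile runs.
    return list(heapq.merge(*runs))[:k]

# Example: two tiles, four rows each; find the 3 least-used locations.
usage = [[0.2, 0.95, 0.1, 0.5],
         [0.05, 0.8, 0.99, 0.3]]
print(two_stage_usage_sort(usage, k=3))
# [(0.05, 1, 0), (0.1, 0, 2), (0.2, 0, 0)]  -> (usage, tile, row)
```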
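Finally, the softmax approximation: the abstract gives no formula, so below is a generic hardware-friendly variant (not necessarily HiMA's) often used in accelerators, where replacing e^x with 2^x turns the exponential into a shift-plus-lookup in fixed-point logic.

```python
import numpy as np

def softmax_pow2(x):
    """Base-2 softmax approximation: 2^z is cheap in hardware (the integer
    part of z is a binary shift; the fraction needs only a small LUT).
    Subtracting the max keeps the computation numerically stable."""
    z = x - np.max(x)
    p = np.exp2(z)
    return p / p.sum()
```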
Related papers
- Dynamic neural network with memristive CIM and CAM for 2D and 3D vision
We propose a semantic memory-based dynamic neural network (DNN) using memristors.
The network associates incoming data with the past experience stored as semantic vectors.
We validate our co-designs, using a 40nm memristor macro, on ResNet and PointNet++ for classifying images and 3D points from the MNIST and ModelNet datasets.
arXiv Detail & Related papers (2024-07-12T04:55:57Z)
- OPIMA: Optical Processing-In-Memory for Convolutional Neural Network Acceleration
Conventional processing-in-memory (PIM) designs struggle to achieve high throughput and energy efficiency due to internal data-movement bottlenecks.
We introduce OPIMA, an optical processing-in-memory machine learning accelerator.
We show that OPIMA can achieve 2.98x higher throughput and 137x better energy efficiency than the best-known prior work.
arXiv Detail & Related papers (2024-07-11T06:12:04Z)
- Efficient and accurate neural field reconstruction using resistive memory
Traditional signal reconstruction methods on digital computers face both software and hardware challenges.
We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs.
This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z)
- Resistive Memory-based Neural Differential Equation Solver for Score-based Diffusion Model
Current AI-generated content (AIGC) methods, such as score-based diffusion, still fall short in speed and efficiency.
We propose a time-continuous and analog in-memory neural differential equation solver for score-based diffusion.
We experimentally validate our solution with 180 nm resistive memory in-memory computing macros.
arXiv Detail & Related papers (2024-04-08T16:34:35Z)
- Pruning random resistive memory for optimizing analogue AI
AI models present unprecedented challenges to energy consumption and environmental sustainability.
One promising solution is to revisit analogue computing, a technique that predates digital computing.
Here, we report a universal solution, software-hardware co-design using structural plasticity-inspired edge pruning.
arXiv Detail & Related papers (2023-11-13T08:59:01Z)
- DAISM: Digital Approximate In-SRAM Multiplier-based Accelerator for DNN Training and Inference
PIM solutions rely either on novel memory technologies that have yet to mature or on bit-serial computations that incur significant performance overhead and scalability issues.
Our work proposes an in-SRAM digital multiplier that uses conventional memory to perform bit-parallel computations by activating multiple wordlines.
We then introduce DAISM, an architecture leveraging this multiplier, which achieves up to two orders of magnitude higher area efficiency compared to the SOTA counterparts, with competitive energy efficiency.
arXiv Detail & Related papers (2023-05-12T10:58:21Z)
- Boosting Mobile CNN Inference through Semantic Memory
We develop a semantic memory design, SMTM, to improve on-device CNN inference.
SMTM employs a hierarchical memory architecture to leverage the long-tail distribution of objects of interest.
It can significantly speed up model inference over the standard approach (up to 2X) and prior cache designs (up to 1.5X), with acceptable accuracy loss.
arXiv Detail & Related papers (2021-12-05T18:18:31Z)
- MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning
We find that the memory bottleneck is due to the imbalanced memory distribution in convolutional neural network (CNN) designs.
We propose a generic patch-by-patch inference scheduling, which significantly cuts down the peak memory.
We automate the process with neural architecture search to jointly optimize the neural architecture and inference scheduling, leading to MCUNetV2.
arXiv Detail & Related papers (2021-10-28T17:58:45Z)
- Memory-Augmented Deep Unfolding Network for Compressive Sensing
The Memory-Augmented Deep Unfolding Network (MADUN) maps a truncated optimization method into a deep neural network.
We show that our MADUN outperforms existing state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2021-10-19T07:03:12Z)
- Efficiency-driven Hardware Optimization for Adversarially Robust Neural Networks
We focus on addressing adversarial robustness for deep neural networks (DNNs) through efficiency-driven hardware optimizations.
One such approach is approximate digital CMOS memories with hybrid 6T-8T cells, which enable supply-voltage (Vdd) scaling for low-power operation.
Another memory optimization approach involves memristive crossbars that perform matrix-vector multiplications (MVMs) efficiently with low energy and area requirements.
arXiv Detail & Related papers (2021-05-09T19:26:25Z)
- Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces.
We train and validate our approach directly on the Intel NNP-I chip for inference.
We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z)