TL-nvSRAM-CIM: Ultra-High-Density Three-Level ReRAM-Assisted
Computing-in-nvSRAM with DC-Power Free Restore and Ternary MAC Operations
- URL: http://arxiv.org/abs/2307.02717v2
- Date: Thu, 11 Jan 2024 07:18:52 GMT
- Title: TL-nvSRAM-CIM: Ultra-High-Density Three-Level ReRAM-Assisted
Computing-in-nvSRAM with DC-Power Free Restore and Ternary MAC Operations
- Authors: Dengfeng Wang, Liukai Xu, Songyuan Liu, Zhi Li, Yiming Chen, Weifeng
He, Xueqing Li and Yanan Sun
- Abstract summary: This work proposes an ultra-high-density three-level ReRAM-assisted computing scheme for large NN models.
The proposed TL-nvSRAM-CIM achieves 7.8x higher storage density compared with state-of-the-art works.
- Score: 8.669532093397065
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accommodating all the weights on-chip for large-scale NNs remains a great
challenge for SRAM based computing-in-memory (SRAM-CIM) with limited on-chip
capacity. Previous non-volatile SRAM-CIM (nvSRAM-CIM) addresses this issue by
integrating high-density single-level ReRAMs (SL-ReRAMs) on top of high-efficiency
SRAM-CIM for weight storage to eliminate off-chip memory access. However,
previous SL-nvSRAM-CIM suffers from poor scalability as the number of SL-ReRAMs
increases, as well as limited computing efficiency. To overcome these challenges, this
work proposes an ultra-high-density three-level ReRAM-assisted
computing-in-nonvolatile-SRAM (TL-nvSRAM-CIM) scheme for large NN models. The
clustered n-selector-n-ReRAM (cluster-nSnR) structure is employed for reliable
weight restore while eliminating DC power. Furthermore, a ternary SRAM-CIM
mechanism with a differential computing scheme is proposed for energy-efficient
ternary MAC operations while preserving high NN accuracy. The proposed
TL-nvSRAM-CIM achieves 7.8x higher storage density compared with state-of-the-art
works. Moreover, TL-nvSRAM-CIM shows up to 2.9x and 1.9x higher energy efficiency
than the baseline SRAM-CIM and ReRAM-CIM designs, respectively.
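As a purely functional illustration of the ternary MAC with differential computing described in the abstract (not the paper's circuit-level scheme), the sketch below splits each ternary weight into a positive and a negative binary component, accumulates the two partial sums separately, and senses their difference. The NumPy modeling and all names are assumptions for illustration only.

```python
import numpy as np

def ternary_mac_differential(inputs, weights):
    """Functional model of a ternary MAC with differential read-out.

    Each ternary weight in {-1, 0, +1} is mapped to a pair of binary
    devices (w_pos, w_neg); the column output is the difference of the
    two partial accumulations, mimicking a differential bit-line pair.
    """
    inputs = np.asarray(inputs, dtype=float)
    weights = np.asarray(weights)
    assert set(np.unique(weights)).issubset({-1, 0, 1}), "weights must be ternary"

    w_pos = (weights == 1).astype(inputs.dtype)   # '+1' devices
    w_neg = (weights == -1).astype(inputs.dtype)  # '-1' devices
    acc_pos = inputs @ w_pos                      # positive bit-line sum
    acc_neg = inputs @ w_neg                      # negative bit-line sum
    return acc_pos - acc_neg                      # differential sensing

# Example: one 8-input MAC column; result equals np.dot(x, w)
x = np.array([1, 0, 1, 1, 0, 1, 0, 1], dtype=float)
w = np.array([1, -1, 0, 1, 0, -1, 1, 0])
print(ternary_mac_differential(x, w))
```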
Related papers
- LiVOS: Light Video Object Segmentation with Gated Linear Matching [116.58237547253935]
LiVOS is a lightweight memory network that employs linear matching via linear attention.
For longer and higher-resolution videos, it matches STM-based methods while using 53% less GPU memory, and it supports 4096p inference on a 32GB consumer-grade GPU.
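As background on the linear matching idea, the following hedged sketch shows generic linear (kernelized) attention, which replaces the O(N^2) softmax affinity matrix with an O(d^2) summary of the memory. LiVOS's gated linear matching adds gating on top of this, which is omitted here, and all tensor names and the elu+1 feature map are illustrative assumptions.

```python
import torch

def linear_attention_matching(q, k, v, eps=1e-6):
    """Linear (kernelized) attention: memory cost scales with feature size,
    not with the N x N affinity matrix of softmax matching.

    q: (Nq, d) queries, k: (Nk, d) memory keys, v: (Nk, dv) memory values.
    """
    phi_q = torch.nn.functional.elu(q) + 1.0
    phi_k = torch.nn.functional.elu(k) + 1.0
    kv = phi_k.transpose(0, 1) @ v                    # (d, dv): compressed memory
    z = phi_k.sum(dim=0)                              # (d,): normalizer
    out = (phi_q @ kv) / ((phi_q @ z + eps).unsqueeze(-1))
    return out                                        # (Nq, dv)

q = torch.randn(1024, 64); k = torch.randn(4096, 64); v = torch.randn(4096, 256)
print(linear_attention_matching(q, k, v).shape)       # torch.Size([1024, 256])
```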
arXiv Detail & Related papers (2024-11-05T05:36:17Z) - Expanding Sparse Tuning for Low Memory Usage [103.43560327427647]
We propose a method named SNELL (Sparse tuning with kerNELized LoRA) for sparse tuning with low memory usage.
To achieve low memory usage, SNELL decomposes the tunable matrix for sparsification into two learnable low-rank matrices.
A competition-based sparsification mechanism is further proposed to avoid the storage of tunable weight indexes.
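The sketch below illustrates, under stated assumptions, the general recipe suggested by this summary: build a dense update from two low-rank factors and sparsify it on the fly by magnitude competition, so no index tensor needs to be stored. SNELL's kernelization of the merged matrix is omitted, and all function and variable names are hypothetical.

```python
import torch

def low_rank_sparse_update(A, B, sparsity=0.9):
    """Hedged sketch of sparse tuning from two learnable low-rank factors.

    A: (d_out, r), B: (r, d_in). The dense update A @ B is sparsified by
    keeping only the largest-magnitude entries ("competition"), computed
    on the fly rather than stored as indexes.
    """
    delta = A @ B                                        # dense update
    k = int((1.0 - sparsity) * delta.numel())            # surviving weights
    threshold = delta.abs().flatten().kthvalue(delta.numel() - k).values
    mask = (delta.abs() > threshold).to(delta.dtype)
    return delta * mask

A = torch.randn(768, 8); B = torch.randn(8, 768)
sparse_update = low_rank_sparse_update(A, B, sparsity=0.95)
print((sparse_update != 0).float().mean())               # roughly 0.05
```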
arXiv Detail & Related papers (2024-11-04T04:58:20Z) - Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution [49.902047563260496]
We make the first attempt to integrate the Vision State Space Model (Mamba) into remote sensing image (RSI) super-resolution.
To achieve better SR reconstruction, building upon Mamba, we devise a Frequency-assisted Mamba framework, dubbed FMSR.
Our FMSR features a multi-level fusion architecture equipped with the Frequency Selection Module (FSM), Vision State Space Module (VSSM), and Hybrid Gate Module (HGM).
arXiv Detail & Related papers (2024-05-08T11:09:24Z) - Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges.
We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs.
This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z) - Multi-level, Forming Free, Bulk Switching Trilayer RRAM for Neuromorphic
Computing at the Edge [0.0]
We develop a forming-free and bulk switching RRAM technology based on a trilayer metal-oxide stack.
We develop a neuromorphic compute-in-memory platform based on trilayer bulk RRAM crossbars.
Our work paves the way for neuromorphic computing at the edge under strict size, weight, and power constraints.
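For context on compute-in-memory with multi-level RRAM crossbars, here is a minimal idealized model (an assumption, not taken from the paper): weights are quantized onto a few conductance levels, inputs become word-line voltages, and each bit-line current is the Kirchhoff sum of conductance-voltage products. The conductance range and level count are placeholders, and device non-idealities are ignored.

```python
import numpy as np

def crossbar_mvm(weights, v_in, g_min=1e-6, g_max=1e-4, levels=8):
    """Idealized analog RRAM-crossbar matrix-vector multiply.

    weights: (n_rows, n_cols) values mapped to multi-level conductances,
    v_in: (n_rows,) word-line read voltages. Returns bit-line currents
    I_j = sum_i G_ij * V_i (Ohm's law + Kirchhoff's current law).
    """
    w = np.asarray(weights, dtype=float)
    w_norm = (w - w.min()) / (w.max() - w.min() + 1e-12)     # map to [0, 1]
    g = np.round(w_norm * (levels - 1)) / (levels - 1)       # quantize to levels
    g = g_min + g * (g_max - g_min)                          # conductances (S)
    return np.asarray(v_in, dtype=float) @ g                 # currents (A)

W = np.random.randn(128, 64)          # 128 inputs x 64 columns
V = np.random.rand(128) * 0.2         # read voltages up to 0.2 V
print(crossbar_mvm(W, V).shape)       # (64,)
```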
arXiv Detail & Related papers (2023-10-20T22:37:46Z) - Evaluation of STT-MRAM as a Scratchpad for Training in ML Accelerators [9.877596714655096]
Training deep neural networks (DNNs) is an extremely memory-intensive process.
Spin-Transfer-Torque MRAM (STT-MRAM) offers several desirable properties for training accelerators.
We show that MRAM provides up to 15-22x improvement in system-level energy.
arXiv Detail & Related papers (2023-08-03T20:36:48Z) - NEON: Enabling Efficient Support for Nonlinear Operations in Resistive
RAM-based Neural Network Accelerators [12.045126404373868]
Resistive Random-Access Memory (RRAM) is well-suited to accelerate neural network (NN) workloads.
NEON is a novel compiler optimization to enable the end-to-end execution of the NN workload in RRAM.
arXiv Detail & Related papers (2022-11-10T17:57:35Z) - Efficient Deep Learning Using Non-Volatile Memory Technology [12.866655564742889]
We present DeepNVM++, a comprehensive framework to characterize, model, and analyze NVM-based caches in architectures for deep learning (DL) applications.
In the iso-capacity case, STT-MRAM and SOT-MRAM provide up to 3.8x and 4.7x energy-delay product (EDP) reduction and 2.4x and 2.8x area reduction compared to conventional caches.
DeepNVM++ is demonstrated on STT-/SOT-MRAM technologies and can be used for the characterization, modeling, and analysis of any NVM technology for last-level caches in DL applications.
arXiv Detail & Related papers (2022-06-27T19:27:57Z) - SmartDeal: Re-Modeling Deep Network Weights for Efficient Inference and
Training [82.35376405568975]
Deep neural networks (DNNs) come with heavy parameterization, leading to reliance on external dynamic random-access memory (DRAM) for storage.
We present SmartDeal (SD), an algorithm framework to trade higher-cost memory storage/access for lower-cost computation.
We show that SD leads to 10.56x and 4.48x reduction in the storage and training energy, with negligible accuracy loss compared to state-of-the-art training baselines.
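As a rough illustration of trading storage and memory access for computation (not SmartDeal's exact algorithm), the sketch below stores a small dense basis plus a sparse coefficient matrix and re-expands the full weight on the fly at compute time. The SVD-based factorization, rank, and sparsity level are assumptions for illustration only.

```python
import numpy as np

def factorize_for_storage(W, rank=8, sparsity=0.8):
    """Illustrative weight re-modeling: keep a small dense basis B plus a
    sparse coefficient matrix C, and recompute W ~= C @ B on the fly
    instead of fetching the full W from DRAM.
    """
    # Low-rank factorization via truncated SVD
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    B = Vt[:rank, :]                               # (rank, d_in) small dense basis
    C = U[:, :rank] * S[:rank]                     # (d_out, rank) coefficients
    # Sparsify coefficients: keep only the largest-magnitude entries
    k = int((1.0 - sparsity) * C.size)
    thresh = np.partition(np.abs(C).ravel(), C.size - k)[C.size - k]
    C = np.where(np.abs(C) >= thresh, C, 0.0)
    return C, B

W = np.random.randn(512, 512)
C, B = factorize_for_storage(W)
W_hat = C @ B                                       # re-expanded at compute time
print(C.shape, B.shape, np.count_nonzero(C) / C.size)
```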
arXiv Detail & Related papers (2021-01-04T18:54:07Z) - DeepNVM++: Cross-Layer Modeling and Optimization Framework of
Non-Volatile Memories for Deep Learning [11.228806840123084]
Non-volatile memory (NVM) technologies such as spin-transfer torque magnetic random access memory (STT-MRAM) and spin-orbit torque magnetic random access memory (SOT-MRAM) have significant advantages compared to conventional technologies.
In this work we present DeepNVM++, a framework to characterize, model, and analyze NVM-based caches in deep learning (DL) applications.
arXiv Detail & Related papers (2020-12-08T16:53:25Z) - PAMS: Quantized Super-Resolution via Parameterized Max Scale [84.55675222525608]
Deep convolutional neural networks (DCNNs) have shown dominant performance in the task of super-resolution (SR).
We propose a new quantization scheme termed PArameterized Max Scale (PAMS), which applies the trainable truncated parameter to explore the upper bound of the quantization range adaptively.
Experiments demonstrate that the proposed PAMS scheme can well compress and accelerate the existing SR models such as EDSR and RDN.
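A minimal sketch of quantization with a trainable truncated parameter, in the spirit of PAMS: activations are clipped to a learnable alpha and uniformly quantized within [-alpha, alpha], with a straight-through estimator for gradients. Initialization and training details of the actual PAMS scheme are omitted, and the class name is hypothetical.

```python
import torch

class LearnableMaxScaleQuant(torch.nn.Module):
    """Uniform quantizer whose clipping bound (the "max scale") is trained."""

    def __init__(self, bits=8, init_alpha=10.0):
        super().__init__()
        self.bits = bits
        self.alpha = torch.nn.Parameter(torch.tensor(init_alpha))

    def forward(self, x):
        qmax = 2 ** (self.bits - 1) - 1
        scale = self.alpha / qmax
        # Clip to the learnable range [-alpha, alpha]
        x_clipped = torch.maximum(torch.minimum(x, self.alpha), -self.alpha)
        x_q = torch.round(x_clipped / scale) * scale
        # Straight-through estimator: forward uses x_q, gradient flows through x_clipped
        return x_clipped + (x_q - x_clipped).detach()

quant = LearnableMaxScaleQuant(bits=4)
x = torch.randn(2, 3, 8, 8) * 5
print(quant(x).unique().numel())   # only a handful of distinct quantization levels
```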
arXiv Detail & Related papers (2020-11-09T06:16:05Z)