Memory-Immersed Collaborative Digitization for Area-Efficient
Compute-in-Memory Deep Learning
- URL: http://arxiv.org/abs/2307.03863v1
- Date: Fri, 7 Jul 2023 23:33:22 GMT
- Title: Memory-Immersed Collaborative Digitization for Area-Efficient
Compute-in-Memory Deep Learning
- Authors: Shamma Nasrin, Maeesha Binte Hashem, Nastaran Darabi, Benjamin
Parpillon, Farah Fahim, Wilfred Gomes, and Amit Ranjan Trivedi
- Abstract summary: This work discusses memory-immersed collaborative digitization among compute-in-memory (CiM) arrays to minimize the area overheads of a conventional analog-to-digital converter (ADC) for deep learning inference.
Under the digitization scheme, CiM arrays exploit their parasitic bit lines to form a within-memory capacitive digital-to-analog converter (DAC) that facilitates area-efficient successive approximation (SA) digitization.
- Score: 2.9812721676061127
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work discusses memory-immersed collaborative digitization among
compute-in-memory (CiM) arrays to minimize the area overheads of a conventional
analog-to-digital converter (ADC) for deep learning inference. The proposed
scheme thus allows significantly more CiM arrays to be accommodated within
limited-footprint designs, improving parallelism and minimizing external
memory accesses. Under the digitization scheme, CiM arrays exploit their parasitic bit
lines to form a within-memory capacitive digital-to-analog converter (DAC) that
facilitates area-efficient successive approximation (SA) digitization. CiM
arrays collaborate: while one array computes the scalar product of inputs and
weights, a proximal array digitizes the resulting analog-domain product-sums. We
discuss various networking configurations among CiM arrays where Flash, SA, and
their hybrid digitization steps can be efficiently implemented using the
proposed memory-immersed scheme. The results are demonstrated using a 65 nm
CMOS test chip. Compared to a 40 nm-node 5-bit SAR ADC, our 65 nm design
requires $\sim$25$\times$ less area and $\sim$1.4$\times$ less energy by
leveraging in-memory computing structures. Compared to a 40 nm-node 5-bit Flash
ADC, our design requires $\sim$51$\times$ less area and $\sim$13$\times$ less
energy.
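To make the three digitization modes concrete, the sketch below is a purely
behavioral model (ours, not the authors' circuit): the within-memory
capacitive DAC is idealized as a binary-weighted voltage divider, and the bit
widths, reference voltage, and function names are illustrative assumptions
rather than details from the paper.

    # Behavioral sketch of the three digitization modes the paper networks
    # across CiM arrays: bit-serial SA, single-step Flash, and their hybrid.
    # An ideal binary-weighted divider stands in for the capacitive DAC that
    # the parasitic bit lines of a proximal array would form on chip.

    N_BITS = 5   # matches the 5-bit ADCs compared in the paper
    V_REF = 1.0  # full-scale reference voltage (assumed)

    def sa_digitize(v_in: float, n_bits: int = N_BITS) -> int:
        """Successive approximation: one comparison per bit, MSB first."""
        code = 0
        for bit in range(n_bits - 1, -1, -1):
            trial = code | (1 << bit)
            v_dac = V_REF * trial / (1 << n_bits)  # ideal cap-DAC settling
            if v_in >= v_dac:
                code = trial  # keep the trial bit
        return code

    def flash_digitize(v_in: float, n_bits: int = N_BITS) -> int:
        """Flash: compare against all 2^n - 1 thresholds in a single step."""
        thresholds = [V_REF * k / (1 << n_bits) for k in range(1, 1 << n_bits)]
        return sum(v_in >= t for t in thresholds)  # thermometer -> binary

    def hybrid_digitize(v_in: float, flash_bits: int = 2) -> int:
        """Hybrid: Flash resolves the MSBs, SA refines the remaining LSBs."""
        msb = flash_digitize(v_in, flash_bits)
        residue = v_in - V_REF * msb / (1 << flash_bits)
        lsb = sa_digitize(residue * (1 << flash_bits), N_BITS - flash_bits)
        return (msb << (N_BITS - flash_bits)) | lsb

    if __name__ == "__main__":
        v = 0.618 * V_REF  # an analog product-sum to digitize
        print(sa_digitize(v), flash_digitize(v), hybrid_digitize(v))  # 19 19 19

All three modes produce the same 5-bit code; they differ in how the
comparisons are scheduled, which is the latency-versus-comparator-count
trade-off the paper's networking configurations exploit.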
Related papers
- Dynamic neural network with memristive CIM and CAM for 2D and 3D vision [57.6208980140268]
We propose a semantic memory-based dynamic neural network (DNN) using memristors.
The network associates incoming data with the past experience stored as semantic vectors.
We validate our co-designs, using a 40nm memristor macro, on ResNet and PointNet++ for classifying images and 3D points from the MNIST and ModelNet datasets.
arXiv Detail & Related papers (2024-07-12T04:55:57Z)
- Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges.
We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs.
This work advances AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z)
- ModSRAM: Algorithm-Hardware Co-Design for Large Number Modular Multiplication in SRAM [7.949839381468341]
Elliptic curve cryptography (ECC) is widely used in security applications such as public-key cryptography (PKC) and zero-knowledge proofs (ZKP).
arXiv Detail & Related papers (2024-02-21T22:26:44Z)
- DDC-PIM: Efficient Algorithm/Architecture Co-design for Doubling Data Capacity of SRAM-based Processing-In-Memory [6.367916611208411]
We propose DDC-PIM, an efficient algorithm/architecture co-design methodology that effectively doubles the equivalent data capacity.
DDC-PIM yields about $2.84\times$ speedup on MobileNetV2 and $2.69\times$ on EfficientNet-B0 with negligible accuracy loss.
Compared with state-of-the-art macros, DDC-PIM achieves up to $8.41\times$ and $2.75\times$ improvement in weight density and area efficiency, respectively.
arXiv Detail & Related papers (2023-10-31T12:49:54Z)
- MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory [76.02294791513552]
We propose a hardware-algorithm co-optimization method called MCUFormer to deploy vision transformers on microcontrollers with extremely limited memory.
Experimental results demonstrate that our MCUFormer achieves 73.62% top-1 accuracy on ImageNet for image classification with 320KB memory.
arXiv Detail & Related papers (2023-10-25T18:00:26Z)
- Containing Analog Data Deluge at Edge through Frequency-Domain Compression in Collaborative Compute-in-Memory Networks [0.0]
This paper proposes a novel solution to improve area efficiency in deep learning inference tasks.
By processing analog data more efficiently, it is possible to selectively retain valuable data from sensors and alleviate the challenges posed by the analog data deluge.
arXiv Detail & Related papers (2023-09-20T03:52:04Z)
- A 65nm 8b-Activation 8b-Weight SRAM-Based Charge-Domain Computing-in-Memory Macro Using A Fully-Parallel Analog Adder Network and A Single-ADC Interface [16.228299091691873]
Computing-in-memory (CiM) is a promising mitigation approach that enables multiply-accumulate operations within the memory.
This work achieves 51.2 GOPS throughput and 10.3 TOPS/W energy efficiency, while showing 88.6% accuracy on the CIFAR-10 dataset.
arXiv Detail & Related papers (2022-11-23T07:52:10Z)
- Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with a communication cost of $O(k \log(d))$ at each iteration (a minimal sketch of the gradient-sketching idea appears after this list).
arXiv Detail & Related papers (2022-10-14T01:42:05Z)
- AnalogNets: ML-HW Co-Design of Noise-robust TinyML Models and Always-On Analog Compute-in-Memory Accelerator [50.31646817567764]
This work describes TinyML models for the popular always-on applications of keyword spotting (KWS) and visual wake words (VWW).
We detail a comprehensive training methodology to retain accuracy in the face of analog non-idealities.
We also describe AON-CiM, a programmable, minimal-area phase-change memory (PCM) analog CiM accelerator.
arXiv Detail & Related papers (2021-11-10T10:24:46Z)
- Neural Network Compression for Noisy Storage Devices [71.4102472611862]
Conventionally, model compression and physical storage are decoupled.
This approach forces the storage to treat each bit of the compressed model equally, and to dedicate the same amount of resources to each bit.
We propose a radically different approach that: (i) employs analog memories to maximize the capacity of each memory cell, and (ii) jointly optimizes model compression and physical storage to maximize memory utility.
arXiv Detail & Related papers (2021-02-15T18:19:07Z)
- IMAC: In-memory multi-bit Multiplication and ACcumulation in 6T SRAM Array [5.29958909018578]
In-memory computing aims at embedding some aspects of computations inside the memory array.
We present a novel in-memory multiplication followed by accumulation operation capable of performing parallel dot products within a 6T SRAM array.
The proposed system achieves 6.24x lower energy consumption and 9.42x lower delay.
arXiv Detail & Related papers (2020-03-27T17:43:19Z)
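The following minimal sketch (ours, referenced from the SketchedAMSGrad entry
above) illustrates the general idea behind sketching-based communication
reduction: each worker compresses a length-d gradient into k numbers with a
count sketch before sending it, and the receiver recovers an unbiased
estimate. All dimensions, seeds, and function names are illustrative
assumptions, not details from that paper's algorithm.

    import numpy as np

    # Shared hash functions: every worker and the server use the same
    # bucket and sign assignments so sketches can be aggregated and decoded.
    rng = np.random.default_rng(seed=0)
    d, k = 10_000, 256                      # gradient dim, sketch size (assumed)
    bucket = rng.integers(0, k, size=d)     # coordinate -> bucket hash
    sign = rng.choice([-1.0, 1.0], size=d)  # coordinate -> sign hash

    def compress(grad):
        """Count-sketch a length-d gradient into k buckets (what gets sent,
        cutting per-iteration communication from O(d) to O(k))."""
        sk = np.zeros(k)
        np.add.at(sk, bucket, sign * grad)  # signed scatter-add per bucket
        return sk

    def decompress(sk):
        """Recover an unbiased (but noisy) per-coordinate estimate."""
        return sign * sk[bucket]

    grad = rng.standard_normal(d)
    estimate = decompress(compress(grad))
    print(float(np.corrcoef(grad, estimate)[0, 1]))  # positive, well below 1

An Adam-type optimizer would then apply its update to the recovered gradient
estimate on the receiving side.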
This list is automatically generated from the titles and abstracts of the papers in this site.