Memory-Immersed Collaborative Digitization for Area-Efficient
Compute-in-Memory Deep Learning
- URL: http://arxiv.org/abs/2307.03863v1
- Date: Fri, 7 Jul 2023 23:33:22 GMT
- Title: Memory-Immersed Collaborative Digitization for Area-Efficient
Compute-in-Memory Deep Learning
- Authors: Shamma Nasrin, Maeesha Binte Hashem, Nastaran Darabi, Benjamin
Parpillon, Farah Fahim, Wilfred Gomes, and Amit Ranjan Trivedi
- Abstract summary: This work discusses memory-immersed collaborative digitization among compute-in-memory (CiM) arrays to minimize the area overheads of a conventional analog-to-digital converter (ADC) for deep learning inference.
Under the digitization scheme, CiM arrays exploit their parasitic bit lines to form a within-memory capacitive digital-to-analog converter (DAC) that facilitates area-efficient successive approximation (SA) digitization.
- Score: 2.9812721676061127
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work discusses memory-immersed collaborative digitization among
compute-in-memory (CiM) arrays to minimize the area overheads of a conventional
analog-to-digital converter (ADC) for deep learning inference. The proposed
scheme thus allows significantly more CiM arrays to be accommodated within
limited-footprint designs, improving parallelism and minimizing external
memory accesses. Under the digitization scheme, CiM arrays exploit their parasitic bit
lines to form a within-memory capacitive digital-to-analog converter (DAC) that
facilitates area-efficient successive approximation (SA) digitization. CiM
arrays collaborate: while one array computes the scalar product of inputs and
weights, a proximal array digitizes the resulting analog-domain product-sums. We
discuss various networking configurations among CiM arrays where Flash, SA, and
their hybrid digitization steps can be efficiently implemented using the
proposed memory-immersed scheme. The results are demonstrated using a 65 nm
CMOS test chip. Compared to a 40 nm-node 5-bit SAR ADC, our 65 nm design
requires $\sim$25$\times$ less area and $\sim$1.4$\times$ less energy by
leveraging in-memory computing structures. Compared to a 40 nm-node 5-bit Flash
ADC, our design requires $\sim$51$\times$ less area and $\sim$13$\times$ less
energy.
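To make the three digitization modes concrete, the sketch below is a purely
behavioral model (ours, not the authors' circuit): the within-memory
capacitive DAC is idealized as a binary-weighted voltage divider, and the bit
widths, reference voltage, and function names are illustrative assumptions
rather than details from the paper.

    # Behavioral sketch of the three digitization modes the paper networks
    # across CiM arrays: bit-serial SA, single-step Flash, and their hybrid.
    # An ideal binary-weighted divider stands in for the capacitive DAC that
    # the parasitic bit lines of a proximal array would form on chip.

    N_BITS = 5   # matches the 5-bit ADCs compared in the paper
    V_REF = 1.0  # full-scale reference voltage (assumed)

    def sa_digitize(v_in: float, n_bits: int = N_BITS) -> int:
        """Successive approximation: one comparison per bit, MSB first."""
        code = 0
        for bit in range(n_bits - 1, -1, -1):
            trial = code | (1 << bit)
            v_dac = V_REF * trial / (1 << n_bits)  # ideal cap-DAC settling
            if v_in >= v_dac:
                code = trial  # keep the trial bit
        return code

    def flash_digitize(v_in: float, n_bits: int = N_BITS) -> int:
        """Flash: compare against all 2^n - 1 thresholds in a single step."""
        thresholds = [V_REF * k / (1 << n_bits) for k in range(1, 1 << n_bits)]
        return sum(v_in >= t for t in thresholds)  # thermometer -> binary

    def hybrid_digitize(v_in: float, flash_bits: int = 2) -> int:
        """Hybrid: Flash resolves the MSBs, SA refines the remaining LSBs."""
        msb = flash_digitize(v_in, flash_bits)
        residue = v_in - V_REF * msb / (1 << flash_bits)
        lsb = sa_digitize(residue * (1 << flash_bits), N_BITS - flash_bits)
        return (msb << (N_BITS - flash_bits)) | lsb

    if __name__ == "__main__":
        v = 0.618 * V_REF  # an analog product-sum to digitize
        print(sa_digitize(v), flash_digitize(v), hybrid_digitize(v))  # 19 19 19

All three modes produce the same 5-bit code; they differ in how the
comparisons are scheduled, which is the latency-versus-comparator-count
trade-off the paper's networking configurations exploit.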
Related papers
- Dynamic neural network with memristive CIM and CAM for 2D and 3D vision [57.6208980140268]
We propose a semantic memory-based dynamic neural network (DNN) using memristors.
The network associates incoming data with the past experience stored as semantic vectors.
We validate our co-designs, using a 40nm memristor macro, on ResNet and PointNet++ for classifying images and 3D points from the MNIST and ModelNet datasets.
arXiv Detail & Related papers (2024-07-12T04:55:57Z)
- Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges.
We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs.
This work advances AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z)
- ModSRAM: Algorithm-Hardware Co-Design for Large Number Modular Multiplication in SRAM [7.949839381468341]
Elliptic curve cryptography (ECC) is widely used in security applications such as public-key cryptography (PKC) and zero-knowledge proofs (ZKP).
arXiv Detail & Related papers (2024-02-21T22:26:44Z)
- DDC-PIM: Efficient Algorithm/Architecture Co-design for Doubling Data Capacity of SRAM-based Processing-In-Memory [6.367916611208411]
We propose DDC-PIM, an efficient algorithm/architecture co-design methodology that effectively doubles the equivalent data capacity.
DDC-PIM yields about $2.84\times$ speedup on MobileNetV2 and $2.69\times$ on EfficientNet-B0 with negligible accuracy loss.
Compared with state-of-the-art macros, DDC-PIM achieves up to $8.41\times$ and $2.75\times$ improvement in weight density and area efficiency, respectively.
arXiv Detail & Related papers (2023-10-31T12:49:54Z)
- MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory [76.02294791513552]
We propose a hardware-algorithm co-optimization method called MCUFormer to deploy vision transformers on microcontrollers with extremely limited memory.
Experimental results demonstrate that our MCUFormer achieves 73.62% top-1 accuracy on ImageNet for image classification with 320KB memory.
arXiv Detail & Related papers (2023-10-25T18:00:26Z)
- Containing Analog Data Deluge at Edge through Frequency-Domain Compression in Collaborative Compute-in-Memory Networks [0.0]
This paper proposes a novel solution to improve area efficiency in deep learning inference tasks.
By processing analog data more efficiently, it is possible to selectively retain valuable data from sensors and alleviate the challenges posed by the analog data deluge.
arXiv Detail & Related papers (2023-09-20T03:52:04Z)
- A 65nm 8b-Activation 8b-Weight SRAM-Based Charge-Domain Computing-in-Memory Macro Using A Fully-Parallel Analog Adder Network and A Single-ADC Interface [16.228299091691873]
Computing-in-memory (CiM) is a promising mitigation approach that enables multiply-accumulate operations within the memory.
This work achieves 51.2 GOPS throughput and 10.3 TOPS/W energy efficiency, while showing 88.6% accuracy on the CIFAR-10 dataset.
arXiv Detail & Related papers (2022-11-23T07:52:10Z)
- Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with a communication cost of $O(k \log(d))$ at each iteration (a minimal sketch of the gradient-sketching idea appears after this list).
arXiv Detail & Related papers (2022-10-14T01:42:05Z)
- AnalogNets: ML-HW Co-Design of Noise-robust TinyML Models and Always-On Analog Compute-in-Memory Accelerator [50.31646817567764]
This work describes TinyML models for the popular always-on applications of keyword spotting (KWS) and visual wake words (VWW).
We detail a comprehensive training methodology to retain accuracy in the face of analog non-idealities.
We also describe AON-CiM, a programmable, minimal-area phase-change memory (PCM) analog CiM accelerator.
arXiv Detail & Related papers (2021-11-10T10:24:46Z)
- Neural Network Compression for Noisy Storage Devices [71.4102472611862]
Conventionally, model compression and physical storage are decoupled.
This approach forces the storage to treat each bit of the compressed model equally, and to dedicate the same amount of resources to each bit.
We propose a radically different approach that: (i) employs analog memories to maximize the capacity of each memory cell, and (ii) jointly optimizes model compression and physical storage to maximize memory utility.
arXiv Detail & Related papers (2021-02-15T18:19:07Z)
- IMAC: In-memory multi-bit Multiplication and ACcumulation in 6T SRAM Array [5.29958909018578]
In-memory computing aims at embedding some aspects of computations inside the memory array.
We present a novel in-memory multiplication followed by accumulation operation capable of performing parallel dot products within a 6T SRAM array.
The proposed system achieves 6.24x lower energy consumption and 9.42x lower delay.
arXiv Detail & Related papers (2020-03-27T17:43:19Z)
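The following minimal sketch (ours, referenced from the SketchedAMSGrad entry
above) illustrates the general idea behind sketching-based communication
reduction: each worker compresses a length-d gradient into k numbers with a
count sketch before sending it, and the receiver recovers an unbiased
estimate. All dimensions, seeds, and function names are illustrative
assumptions, not details from that paper's algorithm.

    import numpy as np

    # Shared hash functions: every worker and the server use the same
    # bucket and sign assignments so sketches can be aggregated and decoded.
    rng = np.random.default_rng(seed=0)
    d, k = 10_000, 256                      # gradient dim, sketch size (assumed)
    bucket = rng.integers(0, k, size=d)     # coordinate -> bucket hash
    sign = rng.choice([-1.0, 1.0], size=d)  # coordinate -> sign hash

    def compress(grad):
        """Count-sketch a length-d gradient into k buckets (what gets sent,
        cutting per-iteration communication from O(d) to O(k))."""
        sk = np.zeros(k)
        np.add.at(sk, bucket, sign * grad)  # signed scatter-add per bucket
        return sk

    def decompress(sk):
        """Recover an unbiased (but noisy) per-coordinate estimate."""
        return sign * sk[bucket]

    grad = rng.standard_normal(d)
    estimate = decompress(compress(grad))
    print(float(np.corrcoef(grad, estimate)[0, 1]))  # positive, well below 1

An Adam-type optimizer would then apply its update to the recovered gradient
estimate on the receiving side.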
This list is automatically generated from the titles and abstracts of the papers in this site.