CAP-RAM: A Charge-Domain In-Memory Computing 6T-SRAM for Accurate and
Precision-Programmable CNN Inference
- URL: http://arxiv.org/abs/2107.02388v1
- Date: Tue, 6 Jul 2021 04:59:16 GMT
- Title: CAP-RAM: A Charge-Domain In-Memory Computing 6T-SRAM for Accurate and
Precision-Programmable CNN Inference
- Authors: Zhiyu Chen, Zhanghao Yu, Qing Jin, Yan He, Jingyu Wang, Sheng Lin, Dai
Li, Yanzhi Wang, Kaiyuan Yang
- Abstract summary: CAP-RAM is a compact, accurate, and bitwidth-programmable in-memory computing (IMC) static random-access memory (SRAM) macro.
It is presented for energy-efficient convolutional neural network (CNN) inference.
A 65-nm prototype validates the excellent linearity and computing accuracy of CAP-RAM.
- Score: 27.376343943107788
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A compact, accurate, and bitwidth-programmable in-memory computing (IMC)
static random-access memory (SRAM) macro, named CAP-RAM, is presented for
energy-efficient convolutional neural network (CNN) inference. It leverages a
novel charge-domain multiply-and-accumulate (MAC) mechanism and circuitry to
achieve superior linearity under process variations compared to conventional
IMC designs. The adopted semi-parallel architecture efficiently stores filters
from multiple CNN layers by sharing eight standard 6T SRAM cells with one
charge-domain MAC circuit. Moreover, up to six weight bit-width levels with two
encoding schemes and eight input-activation levels are supported. A 7-bit
charge-injection SAR (ciSAR) analog-to-digital converter (ADC) that eliminates
sample-and-hold (S&H) circuits and input/reference buffers further improves the
overall energy efficiency and throughput. A 65-nm prototype validates the
excellent linearity and computing accuracy of CAP-RAM. A single 512x128 macro
stores a complete pruned and quantized CNN model to achieve 98.8% inference
accuracy on the MNIST data set and 89.0% on the CIFAR-10 data set, with a
573.4-giga operations per second (GOPS) peak throughput and a 49.4-tera
operations per second (TOPS)/W energy efficiency.
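As a rough aid to readers unfamiliar with bit-serial MAC schemes, the sketch below is a minimal behavioral model, in Python/NumPy, of how multi-bit weights can be handled as binary bit-planes that are scaled by powers of two and accumulated. It illustrates only the generic arithmetic, not the CAP-RAM circuit itself: the function name, signature, and operand widths are illustrative assumptions, and analog effects such as charge sharing, ADC quantization, and the ciSAR readout are ignored.

```python
import numpy as np

def bit_serial_mac(activations: np.ndarray, weights: np.ndarray, w_bits: int = 4) -> np.ndarray:
    """Compute activations @ weights by accumulating one weight bit-plane at a time.

    activations: (N, K) array of unsigned inputs.
    weights:     (K, M) array of unsigned integers, each < 2**w_bits.
    Returns the (N, M) MAC result, identical to a direct matrix product.
    """
    assert np.all(weights < (1 << w_bits)), "weights must fit in w_bits"
    acc = np.zeros((activations.shape[0], weights.shape[1]), dtype=np.int64)
    for b in range(w_bits):
        # Extract the b-th bit plane of every weight (what one pass over
        # binary storage cells would contribute in an IMC array).
        bit_plane = (weights >> b) & 1
        # Each plane contributes a plain dot product, scaled by 2**b
        # during digital accumulation after an (idealized) readout.
        acc += (activations @ bit_plane) << b
    return acc

# Quick check against a direct integer matrix product.
rng = np.random.default_rng(0)
x = rng.integers(0, 8, size=(2, 16))    # e.g. 3-bit activations (assumed)
w = rng.integers(0, 16, size=(16, 4))   # 4-bit weights (assumed)
assert np.array_equal(bit_serial_mac(x, w, w_bits=4), x @ w)
```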
Related papers
- Pruning random resistive memory for optimizing analogue AI [54.21621702814583]
AI models present unprecedented challenges in energy consumption and environmental sustainability.
One promising solution is to revisit analogue computing, a technique that predates digital computing.
Here, we report a universal solution, software-hardware co-design using structural plasticity-inspired edge pruning.
arXiv Detail & Related papers (2023-11-13T08:59:01Z) - Containing Analog Data Deluge at Edge through Frequency-Domain
Compression in Collaborative Compute-in-Memory Networks [0.0]
This paper proposes a novel solution to improve area efficiency in deep learning inference tasks.
By processing analog data more efficiently, it is possible to selectively retain valuable data from sensors and alleviate the challenges posed by the analog data deluge.
arXiv Detail & Related papers (2023-09-20T03:52:04Z) - A 137.5 TOPS/W SRAM Compute-in-Memory Macro with 9-b Memory
Cell-Embedded ADCs and Signal Margin Enhancement Techniques for AI Edge
Applications [20.74979295607707]
The CIM macro can perform 4x4-bit MAC operations and yield 9-bit signed outputs.
Discharge branches of the cells are utilized to apply time-modulated MAC and 9-bit ADC readout operations.
arXiv Detail & Related papers (2023-07-12T06:20:19Z) - DeepGEMM: Accelerated Ultra Low-Precision Inference on CPU Architectures
using Lookup Tables [49.965024476651706]
DeepGEMM is a lookup-table-based approach for executing ultra-low-precision convolutional neural networks on SIMD hardware.
Our implementation outperforms corresponding 8-bit integer kernels by up to 1.74x on x86 platforms.
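As a rough illustration of the lookup-table idea (a sketch under assumptions, not DeepGEMM's actual SIMD kernels; the table layout, 2-bit operand width, and function names are hypothetical), a low-precision dot product can be served entirely from a precomputed product table:

```python
import numpy as np

# Hypothetical sketch: precompute all 2-bit x 2-bit products once, then
# replace every multiply in a dot product with a table lookup.
W_BITS, A_BITS = 2, 2
PRODUCT_LUT = np.array([[w * a for a in range(1 << A_BITS)]
                        for w in range(1 << W_BITS)], dtype=np.int32)

def lut_dot(weights: np.ndarray, activations: np.ndarray) -> int:
    """Dot product of 2-bit unsigned vectors using table lookups only."""
    # Fancy indexing stands in for the SIMD shuffle/lookup instructions a
    # real kernel would use; no multiplications are executed here.
    return int(PRODUCT_LUT[weights, activations].sum())

w = np.array([0, 1, 2, 3, 3], dtype=np.int64)
a = np.array([3, 2, 1, 0, 3], dtype=np.int64)
assert lut_dot(w, a) == int(np.dot(w, a))  # 0 + 2 + 2 + 0 + 9 = 13
```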
arXiv Detail & Related papers (2023-04-18T15:13:10Z) - Quantized Neural Networks for Low-Precision Accumulation with Guaranteed
Overflow Avoidance [68.8204255655161]
We introduce a quantization-aware training algorithm that guarantees avoiding numerical overflow when reducing the precision of accumulators during inference.
We evaluate our algorithm across multiple quantized models that we train for different tasks, showing that our approach can reduce the precision of accumulators while maintaining model accuracy with respect to a floating-point baseline.
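A worked example of the underlying sizing argument may help: the generic worst-case bound below computes the smallest signed accumulator that can never overflow for given operand widths and dot-product length. This is only the standard bound that motivates such work, not the paper's accumulator-aware training constraint; the function name and the example layer sizes are illustrative assumptions.

```python
def min_accumulator_bits(dot_length: int, a_bits: int, w_bits: int) -> int:
    """Smallest signed accumulator width that can never overflow a dot
    product of `dot_length` terms with unsigned `a_bits`-bit activations
    and signed `w_bits`-bit weights (generic worst-case bound)."""
    max_act = (1 << a_bits) - 1        # largest unsigned activation value
    max_neg_w = 1 << (w_bits - 1)      # magnitude of the most negative weight
    worst_case_mag = dot_length * max_act * max_neg_w
    # A signed P-bit two's-complement accumulator covers [-2**(P-1), 2**(P-1)-1];
    # pick the smallest P whose negative bound reaches the worst case.
    p = 1
    while (1 << (p - 1)) < worst_case_mag:
        p += 1
    return p

# Example: a 3x3 convolution over 64 input channels (576 terms) with 8-bit
# operands needs a 26-bit accumulator in the worst case; constraining the
# weights during training is what allows this width to shrink safely.
print(min_accumulator_bits(dot_length=576, a_bits=8, w_bits=8))  # -> 26
```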
arXiv Detail & Related papers (2023-01-31T02:46:57Z) - A Charge Domain P-8T SRAM Compute-In-Memory with Low-Cost DAC/ADC
Operation for 4-bit Input Processing [4.054285623919103]
This paper presents a low-cost PMOS-based 8T (P-8T) Compute-In-Memory (CIM) architecture.
It efficiently performs multiply-accumulate (MAC) operations between 4-bit input activations and 8-bit weights.
The 256x80 P-8T CIM macro, implemented in a 28-nm CMOS process, achieves accuracies of 91.46% and 66.67%.
arXiv Detail & Related papers (2022-11-29T08:15:27Z) - RAMP: A Flat Nanosecond Optical Network and MPI Operations for
Distributed Deep Learning Systems [68.8204255655161]
We introduce a near-exascale, full-bisection bandwidth, all-to-all, single-hop, all-optical network architecture with nanosecond reconfiguration called RAMP.
RAMP supports large-scale distributed and parallel computing systems (12.8 Tbps per node for up to 65,536 nodes).
arXiv Detail & Related papers (2022-11-28T11:24:51Z) - A 65nm 8b-Activation 8b-Weight SRAM-Based Charge-Domain Computing-in-Memory Macro Using A Fully-Parallel Analog Adder Network and A Single-ADC Interface [16.228299091691873]
Computing-in-memory (CiM) is a promising mitigation approach that enables multiply-accumulate operations within the memory.
This work achieves 51.2 GOPS throughput and 10.3 TOPS/W energy efficiency, while showing 88.6% accuracy on the CIFAR-10 dataset.
arXiv Detail & Related papers (2022-11-23T07:52:10Z) - AnalogNets: ML-HW Co-Design of Noise-robust TinyML Models and Always-On
Analog Compute-in-Memory Accelerator [50.31646817567764]
This work describes TinyML models for the popular always-on applications of keyword spotting (KWS) and visual wake words (VWW).
We detail a comprehensive training methodology, to retain accuracy in the face of analog non-idealities.
We also describe AON-CiM, a programmable, minimal-area phase-change memory (PCM) analog CiM accelerator.
arXiv Detail & Related papers (2021-11-10T10:24:46Z) - Leveraging Automated Mixed-Low-Precision Quantization for tiny edge
microcontrollers [76.30674794049293]
This paper presents an automated mixed-precision quantization flow based on the HAQ framework but tailored for the memory and computational characteristics of MCU devices.
Specifically, a reinforcement learning agent searches for the best uniform quantization level, among 2, 4, and 8 bits, for each weight and activation tensor.
Given an MCU-class memory bound of 2 MB for weight-only quantization, the compressed models produced by the mixed-precision engine are as accurate as state-of-the-art solutions.
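As a hypothetical sketch of the memory constraint such a search must respect (not the paper's HAQ-based reinforcement-learning flow; the layer sizes, names, and packing heuristic below are assumptions), the weight-storage footprint of a per-tensor bit-width assignment can be checked against the 2 MB budget as follows:

```python
# Hypothetical sketch: given per-tensor element counts and candidate
# bit-widths {2, 4, 8}, check whether a weight bit-width assignment fits
# an MCU flash budget. This only illustrates the budget arithmetic the
# search has to respect, not the RL agent or its reward.

BUDGET_BYTES = 2 * 1024 * 1024          # 2 MB weight-only budget
CANDIDATE_BITS = (2, 4, 8)

def weights_footprint_bytes(num_elements: dict[str, int],
                            bits_per_tensor: dict[str, int]) -> int:
    """Total packed weight storage for a per-tensor bit-width assignment."""
    return sum((num_elements[name] * bits_per_tensor[name] + 7) // 8
               for name in num_elements)

# Toy model: three conv layers with assumed weight counts.
layers = {"conv1": 432, "conv2": 73_728, "conv3": 1_179_648}
assignment = {"conv1": 8, "conv2": 4, "conv3": 2}
footprint = weights_footprint_bytes(layers, assignment)
print(footprint, footprint <= BUDGET_BYTES)   # 332208 True: fits in 2 MB
```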
arXiv Detail & Related papers (2020-08-12T06:09:58Z) - Q-EEGNet: an Energy-Efficient 8-bit Quantized Parallel EEGNet
Implementation for Edge Motor-Imagery Brain--Machine Interfaces [16.381467082472515]
Motor-Imagery Brain-Machine Interfaces (MI-BMIs) promise direct and accessible communication between human brains and machines.
Deep learning models have emerged for classifying EEG signals.
However, these models often exceed the limitations of edge devices due to their memory and computational requirements.
arXiv Detail & Related papers (2020-04-24T12:29:03Z)