A Charge Domain P-8T SRAM Compute-In-Memory with Low-Cost DAC/ADC
Operation for 4-bit Input Processing
- URL: http://arxiv.org/abs/2211.16008v1
- Date: Tue, 29 Nov 2022 08:15:27 GMT
- Title: A Charge Domain P-8T SRAM Compute-In-Memory with Low-Cost DAC/ADC
Operation for 4-bit Input Processing
- Authors: Joonhyung Kim, Kyeongho Lee and Jongsun Park
- Abstract summary: This paper presents a low-cost PMOS-based 8T (P-8T) SRAM Compute-In-Memory (CIM) architecture.
It efficiently performs the multiply-accumulate (MAC) operations between 4-bit input activations and 8-bit weights.
The 256x80 P-8T CIM macro implemented in a 28nm CMOS process achieves accuracies of 91.46% and 66.67% on CIFAR-10 and CIFAR-100, respectively.
- Score: 4.054285623919103
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a low-cost PMOS-based 8T (P-8T) SRAM Compute-In-Memory
(CIM) architecture that efficiently performs the multiply-accumulate (MAC)
operations between 4-bit input activations and 8-bit weights. First, a bit-line
(BL) charge-sharing technique is employed to design the low-cost and reliable
digital-to-analog conversion of the 4-bit input activations in the proposed SRAM
CIM, where the charge-domain analog computing provides variation-tolerant and
linear MAC outputs. The 16 local arrays are also effectively exploited to
implement the analog multiplication unit (AMU) that simultaneously produces 16
multiplication results between 4-bit input activations and 1-bit weights. To
reduce the hardware cost of the analog-to-digital converter (ADC) without
sacrificing DNN accuracy, hardware-aware system simulations are performed to
decide the ADC bit-resolutions and the number of activated rows in the proposed
CIM macro. In addition, for the ADC operation, AMU-based reference columns
are utilized to generate the ADC reference voltages, with which a low-cost 4-bit
coarse-fine flash ADC has been designed. The 256x80 P-8T SRAM CIM macro
implemented in a 28nm CMOS process achieves accuracies of 91.46% and 66.67% on
the CIFAR-10 and CIFAR-100 datasets, respectively, with an energy efficiency of
50.07 TOPS/W.
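Below is a minimal behavioral sketch of the MAC flow described in the abstract, not the authors' implementation. It assumes unsigned 4-bit activation codes, signed 8-bit two's-complement weights stored one bit-plane per column (as in the AMU), an ideal 4-bit ADC on each column sum, and digital shift-and-add recombination; names such as cim_mac and adc_quantize, and the choice of 16 activated rows, are illustrative assumptions.

```python
# Behavioral model only: shows how per-column analog sums, a low-resolution ADC,
# and bit-plane shift-and-add combine into a 4b x 8b MAC result.
import numpy as np

ROWS = 16        # simultaneously activated rows (assumed)
ADC_BITS = 4     # ADC resolution chosen via hardware-aware simulation (per the abstract)

def adc_quantize(analog_sum: float, full_scale: float, bits: int = ADC_BITS) -> float:
    """Ideal ADC: quantize a column sum in [0, full_scale] and return the
    reconstructed value, modeling the information loss of a low-cost flash ADC."""
    levels = 2 ** bits - 1
    code = np.clip(np.round(analog_sum / full_scale * levels), 0, levels)
    return code * full_scale / levels

def cim_mac(acts_4b: np.ndarray, weights_8b: np.ndarray) -> float:
    """Approximate dot product of 4-bit activation codes and signed 8-bit weights."""
    full_scale = 15 * ROWS                       # largest possible analog column sum
    result = 0.0
    for b in range(8):                           # one 1-bit weight column per bit-plane
        bit_col = (weights_8b.astype(np.int64) >> b) & 1
        col_sum = float(np.dot(acts_4b, bit_col))        # analog MAC on one column
        quantized = adc_quantize(col_sum, full_scale)    # per-column ADC readout
        sign = -1.0 if b == 7 else 1.0                   # MSB is negative in two's complement
        result += sign * quantized * (1 << b)
    return result

# Quick comparison against the exact integer dot product:
rng = np.random.default_rng(0)
a = rng.integers(0, 16, ROWS)        # 4-bit activations
w = rng.integers(-128, 128, ROWS)    # signed 8-bit weights
print(cim_mac(a, w), int(np.dot(a, w)))
```

The low-resolution ADC introduces a bounded quantization error on each column sum, which is why the paper sweeps the ADC bit-resolution and the number of activated rows in hardware-aware system simulations before fixing them.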
Related papers
- BitNet a4.8: 4-bit Activations for 1-bit LLMs [95.73339037243105]
We introduce BitNet a4.8, enabling 4-bit activations for 1-bit Large Language Models.
We demonstrate that BitNet a4.8 achieves performance comparable to BitNet b1.58 with equivalent training costs.
arXiv Detail & Related papers (2024-11-07T18:41:50Z)
- Progressive Mixed-Precision Decoding for Efficient LLM Inference [49.05448842542558]
We introduce Progressive Mixed-Precision Decoding (PMPD) to address the memory-boundedness of decoding.
PMPD achieves a 1.4x-12.2x speedup in matrix-vector multiplications over fp16 models.
Our approach delivers a throughput gain of 3.8x-8.0x over fp16 models and up to 1.54x over uniform quantization approaches.
arXiv Detail & Related papers (2024-10-17T11:46:33Z)
- A Pipelined Memristive Neural Network Analog-to-Digital Converter [0.24578723416255754]
This paper proposes a scalable and modular neural network ADC architecture based on a pipeline of four-bit converters.
An 8-bit pipelined ADC achieves 0.18 LSB INL, 0.20 LSB DNL, 7.6 ENOB, and 0.97 fJ/conv FOM.
arXiv Detail & Related papers (2024-06-04T10:51:12Z)
- A 137.5 TOPS/W SRAM Compute-in-Memory Macro with 9-b Memory Cell-Embedded ADCs and Signal Margin Enhancement Techniques for AI Edge Applications [20.74979295607707]
The CIM macro can perform 4x4-bit MAC operations and yield a 9-bit signed output.
Discharge branches of the cells are utilized to apply time-modulated MAC and 9-bit ADC readout operations.
arXiv Detail & Related papers (2023-07-12T06:20:19Z)
- DeepGEMM: Accelerated Ultra Low-Precision Inference on CPU Architectures using Lookup Tables [49.965024476651706]
DeepGEMM is a lookup table based approach for the execution of ultra low-precision convolutional neural networks on SIMD hardware.
Our implementation outperforms corresponding 8-bit integer kernels by up to 1.74x on x86 platforms (a minimal lookup-table sketch appears after this list).
arXiv Detail & Related papers (2023-04-18T15:13:10Z)
- A 65nm 8b-Activation 8b-Weight SRAM-Based Charge-Domain Computing-in-Memory Macro Using A Fully-Parallel Analog Adder Network and A Single-ADC Interface [16.228299091691873]
Computing-in-memory (CiM) is a promising approach for mitigating the data-movement bottleneck by enabling multiply-accumulate operations within the memory.
This work achieves 51.2GOPS throughput and 10.3TOPS/W energy efficiency, while showing 88.6% accuracy in the CIFAR-10 dataset.
arXiv Detail & Related papers (2022-11-23T07:52:10Z)
- AnalogNets: ML-HW Co-Design of Noise-robust TinyML Models and Always-On Analog Compute-in-Memory Accelerator [50.31646817567764]
This work describes TinyML models for the popular always-on applications of keyword spotting (KWS) and visual wake words (VWW).
We detail a comprehensive training methodology, to retain accuracy in the face of analog non-idealities.
We also describe AON-CiM, a programmable, minimal-area phase-change memory (PCM) analog CiM accelerator.
arXiv Detail & Related papers (2021-11-10T10:24:46Z)
- CAP-RAM: A Charge-Domain In-Memory Computing 6T-SRAM for Accurate and Precision-Programmable CNN Inference [27.376343943107788]
CAP-RAM is a compact, accurate, and bitwidth-programmable in-memory computing (IMC) static random-access memory (SRAM) macro.
It is presented for energy-efficient convolutional neural network (CNN) inference.
A 65-nm prototype validates the excellent linearity and computing accuracy of CAP-RAM.
arXiv Detail & Related papers (2021-07-06T04:59:16Z)
- HAWQV3: Dyadic Neural Network Quantization [73.11579145354801]
Current low-precision quantization algorithms often have the hidden cost of conversion back and forth from floating point to quantized integer values.
We present HAWQV3, a novel mixed-precision integer-only quantization framework (a short dyadic-rescaling sketch appears after this list).
arXiv Detail & Related papers (2020-11-20T23:51:43Z)
- Massive MIMO As an Extreme Learning Machine [83.12538841141892]
A massive multiple-input multiple-output (MIMO) system with low-resolution analog-to-digital converters (ADCs) forms a natural extreme learning machine (ELM).
By adding random biases to the received signals and optimizing the ELM output weights, the system can effectively tackle hardware impairments.
arXiv Detail & Related papers (2020-07-01T04:15:20Z)
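As referenced in the DeepGEMM entry above, the core lookup-table idea can be sketched as follows. This is a hedged illustration of the general technique for ultra low-precision operands (here assumed to be 2-bit weight and activation codes), not the DeepGEMM kernels themselves; the names PRODUCT_LUT and lut_dot are illustrative.

```python
# Lookup-table inner loop: with 2-bit operands, all possible products fit in a
# tiny table, so the GEMM inner loop becomes lookups and additions instead of
# multiplications (real SIMD kernels would keep the table in registers).
import numpy as np

W_BITS, A_BITS = 2, 2
W_LEVELS, A_LEVELS = 1 << W_BITS, 1 << A_BITS

# Precompute the 4x4 product table once.
PRODUCT_LUT = np.array([[w * a for a in range(A_LEVELS)] for w in range(W_LEVELS)],
                       dtype=np.int32)

def lut_dot(w_codes: np.ndarray, a_codes: np.ndarray) -> int:
    """Dot product of 2-bit weight codes and 2-bit activation codes via lookups."""
    return int(PRODUCT_LUT[w_codes, a_codes].sum())

rng = np.random.default_rng(1)
w = rng.integers(0, W_LEVELS, 64)
a = rng.integers(0, A_LEVELS, 64)
assert lut_dot(w, a) == int(np.dot(w, a))   # matches the multiply-based result
```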
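Likewise, for the HAWQV3 entry above, a short sketch of the dyadic-rescaling idea behind integer-only quantization: the floating-point requantization scale is approximated as b / 2^c with integer b, so inference needs only integer multiplies and shifts. The function names and the 16-bit shift are illustrative assumptions, not HAWQV3's actual code.

```python
# Dyadic requantization sketch: rescale an int32 accumulator to int8 without
# any floating-point arithmetic at inference time.
def dyadic_approx(scale: float, shift_bits: int = 16):
    """Approximate a positive real scale as b / 2**shift_bits (b is an integer)."""
    return round(scale * (1 << shift_bits)), shift_bits

def requantize(acc_int32: int, scale: float) -> int:
    """Apply the scale via integer multiply-and-shift, then clamp to the int8 range."""
    b, c = dyadic_approx(scale)
    out = (acc_int32 * b) >> c
    return max(-128, min(127, out))

# Example: accumulator 9137 with combined scale 0.0123 -> close to round(9137 * 0.0123) = 112
print(requantize(9137, 0.0123))
```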