A Charge Domain P-8T SRAM Compute-In-Memory with Low-Cost DAC/ADC
Operation for 4-bit Input Processing
- URL: http://arxiv.org/abs/2211.16008v1
- Date: Tue, 29 Nov 2022 08:15:27 GMT
- Title: A Charge Domain P-8T SRAM Compute-In-Memory with Low-Cost DAC/ADC
Operation for 4-bit Input Processing
- Authors: Joonhyung Kim, Kyeongho Lee and Jongsun Park
- Abstract summary: This paper presents a low-cost PMOS-based 8T (P-8T) SRAM Compute-In-Memory (CIM) architecture.
It efficiently performs the multiply-accumulate (MAC) operations between 4-bit input activations and 8-bit weights.
The 256X80 P-8T CIM macro implemented in a 28nm CMOS process achieves accuracies of 91.46% on CIFAR-10 and 66.67% on CIFAR-100.
- Score: 4.054285623919103
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a low-cost PMOS-based 8T (P-8T) SRAM Compute-In-Memory
(CIM) architecture that efficiently performs the multiply-accumulate (MAC)
operations between 4-bit input activations and 8-bit weights. First, a bit-line
(BL) charge-sharing technique is employed to design the low-cost and reliable
digital-to-analog conversion of 4-bit input activations in the proposed SRAM
CIM, where the charge-domain analog computing provides variation-tolerant and
linear MAC outputs. The 16 local arrays are also effectively exploited to
implement the analog multiplication unit (AMU) that simultaneously produces 16
multiplication results between 4-bit input activations and 1-bit weights. To
reduce the hardware cost of the analog-to-digital converter (ADC) without
sacrificing DNN accuracy, hardware-aware system simulations are performed to
decide the ADC bit-resolutions and the number of activated rows in the proposed
CIM macro. In addition, for the ADC operation, the AMU-based reference columns
are utilized for generating the ADC reference voltages, with which a low-cost 4-bit
coarse-fine flash ADC has been designed. The 256X80 P-8T SRAM CIM macro
implemented in a 28nm CMOS process achieves accuracies of 91.46% and 66.67% on the
CIFAR-10 and CIFAR-100 datasets, respectively, with an energy efficiency of
50.07 TOPS/W.
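The dataflow the abstract describes (charge-sharing conversion of 4-bit inputs, AMU multiplication against 1-bit weight slices, low-resolution ADC readout, digital shift-and-add recombination) can be sketched as an idealized behavioral model. The function below is only an illustration under simplifying assumptions (unsigned weights, ideal linear charge sharing, a uniform ADC over the full analog range); it is not the paper's circuit.

```python
import numpy as np

def cim_mac(acts, weights, adc_bits=4):
    """Idealized behavioral model of the charge-domain MAC: 4-bit input
    activations times 8-bit weights, computed one weight bit-plane at a
    time (as the AMU does with 1-bit weights) and read out through a
    low-resolution ADC. Assumes unsigned weights and ideal linearity."""
    assert acts.min() >= 0 and acts.max() < 16          # 4-bit inputs
    full_scale = 15 * len(acts)                         # largest analog sum
    levels = 2 ** adc_bits - 1
    acc = 0.0
    for b in range(8):                                  # one pass per weight bit
        bit_plane = (weights >> b) & 1                  # 1-bit weights for the AMU
        analog = float(np.dot(acts, bit_plane))         # charge-sharing accumulation
        code = np.round(analog / full_scale * levels)   # ADC quantization
        acc += (code / levels * full_scale) * (1 << b)  # digital shift-and-add
    return acc
```

Raising `adc_bits` makes the result converge to the exact integer dot product, which is the sense in which the ADC resolution (chosen here via the paper's hardware-aware simulations) trades accuracy against hardware cost.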
Related papers
- A Pipelined Memristive Neural Network Analog-to-Digital Converter [0.24578723416255754]
This paper proposes a scalable and modular neural network ADC architecture based on a pipeline of four-bit converters.
An 8-bit pipelined ADC achieves 0.18 LSB INL, 0.20 LSB DNL, 7.6 ENOB, and 0.97 fJ/conv FOM.
arXiv Detail & Related papers (2024-06-04T10:51:12Z)
- A 137.5 TOPS/W SRAM Compute-in-Memory Macro with 9-b Memory Cell-Embedded ADCs and Signal Margin Enhancement Techniques for AI Edge Applications [20.74979295607707]
CIM macro can perform 4x4-bit MAC operations and yield 9-bit signed output.
Innocent discharge branches of cells are utilized to apply time-modulated MAC and 9-bit ADC readout operations.
arXiv Detail & Related papers (2023-07-12T06:20:19Z)
- DeepGEMM: Accelerated Ultra Low-Precision Inference on CPU Architectures using Lookup Tables [49.965024476651706]
DeepGEMM is a lookup table based approach for the execution of ultra low-precision convolutional neural networks on SIMD hardware.
Our implementation outperforms corresponding 8-bit integer kernels by up to 1.74x on x86 platforms.
arXiv Detail & Related papers (2023-04-18T15:13:10Z)
- Quantized Neural Networks for Low-Precision Accumulation with Guaranteed Overflow Avoidance [68.8204255655161]
We introduce a quantization-aware training algorithm that guarantees avoiding numerical overflow when reducing the precision of accumulators during inference.
We evaluate our algorithm across multiple quantized models that we train for different tasks, showing that our approach can reduce the precision of accumulators while maintaining model accuracy with respect to a floating-point baseline.
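The overflow-avoidance idea can be grounded with simple arithmetic: for unsigned activations and weights, the worst-case sum of n products bounds the accumulator width that is needed. The helper below is this generic worst-case bound, not the paper's training-time algorithm (which constrains the weights so that narrower accumulators become provably safe).

```python
import math

def min_accumulator_bits(n, act_bits, wgt_bits):
    """Smallest accumulator width that provably cannot overflow when
    summing n products of unsigned act_bits x wgt_bits integers.
    This is the naive worst-case bound; quantization-aware training
    can guarantee safety for accumulators narrower than this."""
    max_sum = n * (2 ** act_bits - 1) * (2 ** wgt_bits - 1)
    return math.ceil(math.log2(max_sum + 1))
```

For example, a 256-term dot product of 8-bit activations and 8-bit weights needs a 24-bit accumulator in the worst case; shrinking below that without an overflow guarantee is exactly the risk the paper's method removes.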
arXiv Detail & Related papers (2023-01-31T02:46:57Z)
- A 65nm 8b-Activation 8b-Weight SRAM-Based Charge-Domain Computing-in-Memory Macro Using A Fully-Parallel Analog Adder Network and A Single-ADC Interface [16.228299091691873]
Computing-in-memory (CiM) is a promising mitigation approach by enabling multiply-accumulate operations within the memory.
This work achieves 51.2GOPS throughput and 10.3TOPS/W energy efficiency, while showing 88.6% accuracy in the CIFAR-10 dataset.
arXiv Detail & Related papers (2022-11-23T07:52:10Z)
- FP8 Formats for Deep Learning [49.54015320992368]
We propose an 8-bit floating point (FP8) binary interchange format consisting of two encodings.
E4M3's dynamic range is extended by not representing infinities and having only one mantissa bit-pattern for NaNs.
We demonstrate the efficacy of the FP8 format on a variety of image and language tasks, effectively matching the result quality achieved by 16-bit training sessions.
arXiv Detail & Related papers (2022-09-12T17:39:55Z)
- AnalogNets: ML-HW Co-Design of Noise-robust TinyML Models and Always-On Analog Compute-in-Memory Accelerator [50.31646817567764]
This work describes TinyML models for the popular always-on applications of keyword spotting (KWS) and visual wake words (VWW).
We detail a comprehensive training methodology, to retain accuracy in the face of analog non-idealities.
We also describe AON-CiM, a programmable, minimal-area phase-change memory (PCM) analog CiM accelerator.
arXiv Detail & Related papers (2021-11-10T10:24:46Z)
- CAP-RAM: A Charge-Domain In-Memory Computing 6T-SRAM for Accurate and Precision-Programmable CNN Inference [27.376343943107788]
CAP-RAM is a compact, accurate, and bitwidth-programmable in-memory computing (IMC) static random-access memory (SRAM) macro.
It is presented for energy-efficient convolutional neural network (CNN) inference.
A 65-nm prototype validates the excellent linearity and computing accuracy of CAP-RAM.
arXiv Detail & Related papers (2021-07-06T04:59:16Z)
- HAWQV3: Dyadic Neural Network Quantization [73.11579145354801]
Current low-precision quantization algorithms often have the hidden cost of conversion back and forth from floating point to quantized integer values.
We present HAWQV3, a novel mixed-precision integer-only quantization framework.
arXiv Detail & Related papers (2020-11-20T23:51:43Z)
- Massive MIMO As an Extreme Learning Machine [83.12538841141892]
A massive multiple-input multiple-output (MIMO) system with low-resolution analog-to-digital converters (ADCs) forms a natural extreme learning machine (ELM).
By adding random biases to the received signals and optimizing the ELM output weights, the system can effectively tackle hardware impairments.
arXiv Detail & Related papers (2020-07-01T04:15:20Z)
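The ELM recipe the last summary alludes to (a fixed random hidden layer, with only the output weights solved in closed form) is simple enough to sketch generically. The code below is a plain ELM regressor, not the paper's MIMO receiver; the hidden-layer size and tanh activation are illustrative choices.

```python
import numpy as np

def elm_fit(X, Y, hidden=64, seed=0):
    """Minimal extreme learning machine: input weights and biases are
    random and fixed; only the output weights are found, by a
    closed-form least-squares solve."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], hidden))
    b = rng.standard_normal(hidden)
    H = np.tanh(X @ W + b)                          # fixed random features
    beta, *_ = np.linalg.lstsq(H, Y, rcond=None)    # output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```

Because only a linear solve is trained, an ELM tolerates hardware distortion folded into the random features, which is the intuition behind treating a low-resolution-ADC receiver as an ELM.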
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.