A Charge Domain P-8T SRAM Compute-In-Memory with Low-Cost DAC/ADC
Operation for 4-bit Input Processing
- URL: http://arxiv.org/abs/2211.16008v1
- Date: Tue, 29 Nov 2022 08:15:27 GMT
- Title: A Charge Domain P-8T SRAM Compute-In-Memory with Low-Cost DAC/ADC
Operation for 4-bit Input Processing
- Authors: Joonhyung Kim, Kyeongho Lee and Jongsun Park
- Abstract summary: This paper presents a low-cost PMOS-based 8T (P-8T) SRAM Compute-In-Memory (CIM) architecture.
It efficiently performs the multiply-accumulate (MAC) operations between 4-bit input activations and 8-bit weights.
The 256X80 P-8T CIM macro implemented in a 28nm CMOS process achieves accuracies of 91.46% on CIFAR-10 and 66.67% on CIFAR-100.
- Score: 4.054285623919103
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a low-cost PMOS-based 8T (P-8T) SRAM Compute-In-Memory
(CIM) architecture that efficiently performs the multiply-accumulate (MAC)
operations between 4-bit input activations and 8-bit weights. First, a bit-line
(BL) charge-sharing technique is employed to design the low-cost and reliable
digital-to-analog conversion of 4-bit input activations in the proposed SRAM
CIM, where the charge-domain analog computing provides variation-tolerant and
linear MAC outputs. The 16 local arrays are also effectively exploited to
implement the analog multiplication unit (AMU) that simultaneously produces 16
multiplication results between 4-bit input activations and 1-bit weights. To
reduce the hardware cost of the analog-to-digital converter (ADC) without
sacrificing DNN accuracy, hardware-aware system simulations are performed to
decide the ADC bit-resolutions and the number of activated rows in the proposed
CIM macro. In addition, for the ADC operation, the AMU-based reference columns
are utilized for generating the ADC reference voltages, with which a low-cost 4-bit
coarse-fine flash ADC has been designed. The 256X80 P-8T SRAM CIM macro
implemented in a 28nm CMOS process achieves accuracies of 91.46% and 66.67% on the
CIFAR-10 and CIFAR-100 datasets, respectively, with an energy efficiency of
50.07 TOPS/W.
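The dataflow the abstract describes (charge-sharing conversion of 4-bit inputs, AMU multiplication against 1-bit weight slices, low-resolution ADC readout, digital shift-and-add recombination) can be sketched as an idealized behavioral model. The function below is only an illustration under simplifying assumptions (unsigned weights, ideal linear charge sharing, a uniform ADC over the full analog range); it is not the paper's circuit.

```python
import numpy as np

def cim_mac(acts, weights, adc_bits=4):
    """Idealized behavioral model of the charge-domain MAC: 4-bit input
    activations times 8-bit weights, computed one weight bit-plane at a
    time (as the AMU does with 1-bit weights) and read out through a
    low-resolution ADC. Assumes unsigned weights and ideal linearity."""
    assert acts.min() >= 0 and acts.max() < 16          # 4-bit inputs
    full_scale = 15 * len(acts)                         # largest analog sum
    levels = 2 ** adc_bits - 1
    acc = 0.0
    for b in range(8):                                  # one pass per weight bit
        bit_plane = (weights >> b) & 1                  # 1-bit weights for the AMU
        analog = float(np.dot(acts, bit_plane))         # charge-sharing accumulation
        code = np.round(analog / full_scale * levels)   # ADC quantization
        acc += (code / levels * full_scale) * (1 << b)  # digital shift-and-add
    return acc
```

Raising `adc_bits` makes the result converge to the exact integer dot product, which is the sense in which the ADC resolution (chosen here via the paper's hardware-aware simulations) trades accuracy against hardware cost.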
Related papers
- A Pipelined Memristive Neural Network Analog-to-Digital Converter [0.24578723416255754]
This paper proposes a scalable and modular neural network ADC architecture based on a pipeline of four-bit converters.
An 8-bit pipelined ADC achieves 0.18 LSB INL, 0.20 LSB DNL, 7.6 ENOB, and 0.97 fJ/conv FOM.
arXiv Detail & Related papers (2024-06-04T10:51:12Z)
- A 137.5 TOPS/W SRAM Compute-in-Memory Macro with 9-b Memory Cell-Embedded ADCs and Signal Margin Enhancement Techniques for AI Edge Applications [20.74979295607707]
CIM macro can perform 4x4-bit MAC operations and yield 9-bit signed output.
Innocent discharge branches of cells are utilized to apply time-modulated MAC and 9-bit ADC readout operations.
arXiv Detail & Related papers (2023-07-12T06:20:19Z)
- DeepGEMM: Accelerated Ultra Low-Precision Inference on CPU Architectures using Lookup Tables [49.965024476651706]
DeepGEMM is a lookup table based approach for the execution of ultra low-precision convolutional neural networks on SIMD hardware.
Our implementation outperforms corresponding 8-bit integer kernels by up to 1.74x on x86 platforms.
arXiv Detail & Related papers (2023-04-18T15:13:10Z)
- Quantized Neural Networks for Low-Precision Accumulation with Guaranteed Overflow Avoidance [68.8204255655161]
We introduce a quantization-aware training algorithm that guarantees avoiding numerical overflow when reducing the precision of accumulators during inference.
We evaluate our algorithm across multiple quantized models that we train for different tasks, showing that our approach can reduce the precision of accumulators while maintaining model accuracy with respect to a floating-point baseline.
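The overflow-avoidance idea can be grounded with simple arithmetic: for unsigned activations and weights, the worst-case sum of n products bounds the accumulator width that is needed. The helper below is this generic worst-case bound, not the paper's training-time algorithm (which constrains the weights so that narrower accumulators become provably safe).

```python
import math

def min_accumulator_bits(n, act_bits, wgt_bits):
    """Smallest accumulator width that provably cannot overflow when
    summing n products of unsigned act_bits x wgt_bits integers.
    This is the naive worst-case bound; quantization-aware training
    can guarantee safety for accumulators narrower than this."""
    max_sum = n * (2 ** act_bits - 1) * (2 ** wgt_bits - 1)
    return math.ceil(math.log2(max_sum + 1))
```

For example, a 256-term dot product of 8-bit activations and 8-bit weights needs a 24-bit accumulator in the worst case; shrinking below that without an overflow guarantee is exactly the risk the paper's method removes.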
arXiv Detail & Related papers (2023-01-31T02:46:57Z)
- A 65nm 8b-Activation 8b-Weight SRAM-Based Charge-Domain Computing-in-Memory Macro Using A Fully-Parallel Analog Adder Network and A Single-ADC Interface [16.228299091691873]
Computing-in-memory (CiM) is a promising mitigation approach by enabling multiply-accumulate operations within the memory.
This work achieves 51.2GOPS throughput and 10.3TOPS/W energy efficiency, while showing 88.6% accuracy in the CIFAR-10 dataset.
arXiv Detail & Related papers (2022-11-23T07:52:10Z)
- FP8 Formats for Deep Learning [49.54015320992368]
We propose an 8-bit floating point (FP8) binary interchange format consisting of two encodings.
E4M3's dynamic range is extended by not representing infinities and having only one mantissa bit-pattern for NaNs.
We demonstrate the efficacy of the FP8 format on a variety of image and language tasks, effectively matching the result quality achieved by 16-bit training sessions.
arXiv Detail & Related papers (2022-09-12T17:39:55Z)
- AnalogNets: ML-HW Co-Design of Noise-robust TinyML Models and Always-On Analog Compute-in-Memory Accelerator [50.31646817567764]
This work describes TinyML models for the popular always-on applications of keyword spotting (KWS) and visual wake words (VWW).
We detail a comprehensive training methodology, to retain accuracy in the face of analog non-idealities.
We also describe AON-CiM, a programmable, minimal-area phase-change memory (PCM) analog CiM accelerator.
arXiv Detail & Related papers (2021-11-10T10:24:46Z)
- CAP-RAM: A Charge-Domain In-Memory Computing 6T-SRAM for Accurate and Precision-Programmable CNN Inference [27.376343943107788]
CAP-RAM is a compact, accurate, and bitwidth-programmable in-memory computing (IMC) static random-access memory (SRAM) macro.
It is presented for energy-efficient convolutional neural network (CNN) inference.
A 65-nm prototype validates the excellent linearity and computing accuracy of CAP-RAM.
arXiv Detail & Related papers (2021-07-06T04:59:16Z)
- HAWQV3: Dyadic Neural Network Quantization [73.11579145354801]
Current low-precision quantization algorithms often have the hidden cost of conversion back and forth from floating point to quantized integer values.
We present HAWQV3, a novel mixed-precision integer-only quantization framework.
arXiv Detail & Related papers (2020-11-20T23:51:43Z)
- Massive MIMO As an Extreme Learning Machine [83.12538841141892]
A massive multiple-input multiple-output (MIMO) system with low-resolution analog-to-digital converters (ADCs) forms a natural extreme learning machine (ELM).
By adding random biases to the received signals and optimizing the ELM output weights, the system can effectively tackle hardware impairments.
arXiv Detail & Related papers (2020-07-01T04:15:20Z)
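The ELM recipe the last summary alludes to (a fixed random hidden layer, with only the output weights solved in closed form) is simple enough to sketch generically. The code below is a plain ELM regressor, not the paper's MIMO receiver; the hidden-layer size and tanh activation are illustrative choices.

```python
import numpy as np

def elm_fit(X, Y, hidden=64, seed=0):
    """Minimal extreme learning machine: input weights and biases are
    random and fixed; only the output weights are found, by a
    closed-form least-squares solve."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], hidden))
    b = rng.standard_normal(hidden)
    H = np.tanh(X @ W + b)                          # fixed random features
    beta, *_ = np.linalg.lstsq(H, Y, rcond=None)    # output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```

Because only a linear solve is trained, an ELM tolerates hardware distortion folded into the random features, which is the intuition behind treating a low-resolution-ADC receiver as an ELM.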
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.