A 137.5 TOPS/W SRAM Compute-in-Memory Macro with 9-b Memory
Cell-Embedded ADCs and Signal Margin Enhancement Techniques for AI Edge
Applications
- URL: http://arxiv.org/abs/2307.05944v3
- Date: Wed, 19 Jul 2023 08:58:58 GMT
- Title: A 137.5 TOPS/W SRAM Compute-in-Memory Macro with 9-b Memory
Cell-Embedded ADCs and Signal Margin Enhancement Techniques for AI Edge
Applications
- Authors: Xiaomeng Wang, Fengshi Tian, Xizi Chen, Jiakun Zheng, Xuejiao Liu,
Fengbin Tu, Jie Yang, Mohamad Sawan, Kwang-Ting Cheng, Chi-Ying Tsui
- Abstract summary: CIM macro can perform 4x4-bit MAC operations and yield 9-bit signed output.
Innocent discharge branches of cells are utilized to apply time-modulated MAC and 9-bit ADC readout operations.
- Score: 20.74979295607707
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a high-precision SRAM-based CIM macro that can
perform 4x4-bit MAC operations and yield 9-bit signed output. The inherent
discharge branches of SRAM cells are utilized to apply time-modulated MAC and
9-bit ADC readout operations on two bit-line capacitors. The same principle is
used for both MAC and A-to-D conversion ensuring high linearity and thus
supporting large number of analog MAC accumulations. The memory cell-embedded
ADC eliminates the use of separate ADCs and enhances energy and area
efficiency. Additionally, two signal margin enhancement techniques, namely the
MAC-folding and boosted-clipping schemes, are proposed to further improve the
CIM computation accuracy.
Related papers
- LiVOS: Light Video Object Segmentation with Gated Linear Matching [116.58237547253935]
LiVOS is a lightweight memory network that employs linear matching via linear attention.
For longer and higher-resolution videos, it matched STM-based methods with 53% less GPU memory and supports 4096p inference on a 32G consumer-grade GPU.
arXiv Detail & Related papers (2024-11-05T05:36:17Z) - Progressive Mixed-Precision Decoding for Efficient LLM Inference [49.05448842542558]
We introduce Progressive Mixed-Precision Decoding (PMPD) to address the memory-boundedness of decoding.
PMPD achieves 1.4$-$12.2$times$ speedup in matrix-vector multiplications over fp16 models.
Our approach delivers a throughput gain of 3.8$-$8.0$times$ over fp16 models and up to 1.54$times$ over uniform quantization approaches.
arXiv Detail & Related papers (2024-10-17T11:46:33Z) - Containing Analog Data Deluge at Edge through Frequency-Domain
Compression in Collaborative Compute-in-Memory Networks [0.0]
This paper proposes a novel solution to improve area efficiency in deep learning inference tasks.
By processing analog data more efficiently, it is possible to selectively retain valuable data from sensors and alleviate the challenges posed by the analog data deluge.
arXiv Detail & Related papers (2023-09-20T03:52:04Z) - A Charge Domain P-8T SRAM Compute-In-Memory with Low-Cost DAC/ADC
Operation for 4-bit Input Processing [4.054285623919103]
This paper presents a low cost PMOS-based 8T (P-8T) Compute-In-Memory (CIM) architecture.
It efficiently per-forms the multiply-accumulate (MAC) operations between 4-bit input activations and 8-bit weights.
The 256X80 P-8T CIM macro implementation using 28nm CMOS process shows the accuracies of 91.46% and 66.67%.
arXiv Detail & Related papers (2022-11-29T08:15:27Z) - A 65nm 8b-Activation 8b-Weight SRAM-Based Charge-Domain Computing-in-Memory Macro Using A Fully-Parallel Analog Adder Network and A Single-ADC Interface [16.228299091691873]
Computing-in-memory (CiM) is a promising mitigation approach by enabling multiply-accumulate operations within the memory.
This work achieves 51.2GOPS throughput and 10.3TOPS/W energy efficiency, while showing 88.6% accuracy in the CIFAR-10 dataset.
arXiv Detail & Related papers (2022-11-23T07:52:10Z) - Extending Compositional Attention Networks for Social Reasoning in
Videos [84.12658971655253]
We propose a novel deep architecture for the task of reasoning about social interactions in videos.
We leverage the multi-step reasoning capabilities of Compositional Attention Networks (MAC), and propose a multimodal extension (MAC-X)
arXiv Detail & Related papers (2022-10-03T19:03:01Z) - MAC-DO: An Efficient Output-Stationary GEMM Accelerator for CNNs Using
DRAM Technology [2.918940961856197]
This paper presents MAC-DO, an efficient and low-power DRAM-based in-situ accelerator.
It supports a multi-bit multiply-accumulate (MAC) operation within a single cycle.
A MAC-DO array efficiently can accelerate matrix multiplications based on output stationary mapping, supporting the majority of computations performed in deep neural networks (DNNs)
arXiv Detail & Related papers (2022-07-16T07:33:20Z) - AnalogNets: ML-HW Co-Design of Noise-robust TinyML Models and Always-On
Analog Compute-in-Memory Accelerator [50.31646817567764]
This work describes TinyML models for the popular always-on applications of keyword spotting (KWS) and visual wake words (VWW)
We detail a comprehensive training methodology, to retain accuracy in the face of analog non-idealities.
We also describe AON-CiM, a programmable, minimal-area phase-change memory (PCM) analog CiM accelerator.
arXiv Detail & Related papers (2021-11-10T10:24:46Z) - CAP-RAM: A Charge-Domain In-Memory Computing 6T-SRAM for Accurate and
Precision-Programmable CNN Inference [27.376343943107788]
CAP-RAM is a compact, accurate, and bitwidth-programmable in-memory computing (IMC) static random-access memory (SRAM) macro.
It is presented for energy-efficient convolutional neural network (CNN) inference.
A 65-nm prototype validates the excellent linearity and computing accuracy of CAP-RAM.
arXiv Detail & Related papers (2021-07-06T04:59:16Z) - EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware
Multi-Task NLP Inference [82.1584439276834]
Transformer-based language models such as BERT provide significant accuracy improvement for a multitude of natural language processing (NLP) tasks.
We present EdgeBERT, an in-depth algorithm- hardware co-design for latency-aware energy optimization for multi-task NLP.
arXiv Detail & Related papers (2020-11-28T19:21:47Z) - Massive MIMO As an Extreme Learning Machine [83.12538841141892]
A massive multiple-input multiple-output (MIMO) system with low-resolution analog-to-digital converters (ADCs) forms a natural extreme learning machine (ELM)
By adding random biases to the received signals and optimizing the ELM output weights, the system can effectively tackle hardware impairments.
arXiv Detail & Related papers (2020-07-01T04:15:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.