RACE-IT: A Reconfigurable Analog CAM-Crossbar Engine for In-Memory
Transformer Acceleration
- URL: http://arxiv.org/abs/2312.06532v1
- Date: Wed, 29 Nov 2023 22:45:39 GMT
- Title: RACE-IT: A Reconfigurable Analog CAM-Crossbar Engine for In-Memory
Transformer Acceleration
- Authors: Lei Zhao, Luca Buonanno, Ron M. Roth, Sergey Serebryakov, Archit
Gajjar, John Moon, Jim Ignowski, Giacomo Pedretti
- Abstract summary: Transformer models represent the cutting edge of Deep Neural Networks (DNNs).
Processing these models demands significant computational resources and results in a substantial memory footprint.
We introduce a novel Analog Content Addressable Memory (ACAM) structure capable of performing various non-MVM operations within Transformers.
- Score: 21.196696191478885
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer models represent the cutting edge of Deep Neural Networks (DNNs)
and excel in a wide range of machine learning tasks. However, processing these
models demands significant computational resources and results in a substantial
memory footprint. While In-memory Computing (IMC) offers promise for
accelerating Matrix-Vector Multiplications (MVMs) with high computational
parallelism and minimal data movement, employing it for implementing other
crucial operators within DNNs remains a formidable task. This challenge is
exacerbated by the extensive use of Softmax and data-dependent matrix
multiplications within the attention mechanism. Furthermore, existing IMC
designs encounter difficulties in fully harnessing the benefits of analog MVM
acceleration due to the area and energy-intensive nature of Analog-to-Digital
Converters (ADCs). To tackle these challenges, we introduce a novel Compute
Analog Content Addressable Memory (Compute-ACAM) structure capable of
performing various non-MVM operations within Transformers. Together with the
crossbar structure, our proposed RACE-IT accelerator enables efficient
execution of all operations within Transformer models in the analog domain.
Given the flexibility of our proposed Compute-ACAMs to perform arbitrary
operations, RACE-IT exhibits adaptability to diverse non-traditional and future
DNN architectures without necessitating hardware modifications. Leveraging the
capability of Compute-ACAMs to process analog input and produce digital output,
we also replace ADCs, thereby reducing the overall area and energy costs. Evaluated on
various Transformer models against state-of-the-art GPUs and existing IMC
accelerators, RACE-IT improves performance by 10.7x and 5.9x and reduces energy
by 1193x and 3.9x, respectively.
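To make the workload split described above concrete, the following NumPy sketch separates the attention operators into the two classes the abstract distinguishes: static-weight projections, which are plain MVMs and map naturally onto analog crossbars, and Softmax plus the data-dependent matrix multiplications, which are the non-MVM operators RACE-IT assigns to Compute-ACAMs. It is a plain floating-point illustration of the operator split only; it does not model the analog hardware, and the function names are ours.

```python
import numpy as np

def softmax(x, axis=-1):
    # Non-MVM operator: RACE-IT claims this class of op can run in
    # Compute-ACAM arrays; here it is ordinary floating point.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(x, Wq, Wk, Wv):
    # Static-weight MVMs: the natural fit for analog crossbars,
    # since Wq/Wk/Wv are fixed after training.
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    # Data-dependent matrix multiplications (Q @ K^T and scores @ V)
    # and Softmax are the operators the abstract singles out as hard
    # for conventional IMC designs.
    scores = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    return scores @ V

# Toy shapes only; real models use multiple heads and larger dimensions.
x = np.random.randn(8, 64)              # 8 tokens, model dimension 64
Wq, Wk, Wv = (np.random.randn(64, 64) for _ in range(3))
print(attention(x, Wq, Wk, Wv).shape)   # (8, 64)
```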
Related papers
- SpiDR: A Reconfigurable Digital Compute-in-Memory Spiking Neural Network Accelerator for Event-based Perception [8.968583287058959]
Spiking Neural Networks (SNNs) offer an efficient method for processing the asynchronous temporal data generated by Dynamic Vision Sensors (DVS).
Existing SNN accelerators suffer from limitations in adaptability to diverse neuron models, bit precisions and network sizes.
We propose SpiDR, a scalable and reconfigurable digital compute-in-memory (CIM) SNN accelerator with a set of key features.
arXiv Detail & Related papers (2024-11-05T06:59:02Z)
- Accelerating Error Correction Code Transformers [56.75773430667148]
We introduce a novel acceleration method for transformer-based decoders.
We achieve a 90% compression ratio and reduce arithmetic operation energy consumption by at least 224 times on modern hardware.
arXiv Detail & Related papers (2024-10-08T11:07:55Z)
- ARTEMIS: A Mixed Analog-Stochastic In-DRAM Accelerator for Transformer Neural Networks [2.9699290794642366]
ARTEMIS is a mixed analog-stochastic in-DRAM accelerator for transformer models.
Our analysis indicates that ARTEMIS exhibits at least 3.0x speedup, 1.8x lower energy, and 1.9x better energy efficiency compared to GPU, TPU, CPU, and state-of-the-art PIM transformer hardware accelerators.
arXiv Detail & Related papers (2024-07-17T15:08:14Z)
- Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges.
We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs.
This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z)
- Accelerator-driven Data Arrangement to Minimize Transformers Run-time on Multi-core Architectures [5.46396577345121]
The complexity of transformer models in artificial intelligence increases their computational cost, memory usage, and energy consumption.
We propose a novel memory arrangement strategy, governed by the hardware accelerator's kernel size, which effectively minimizes off-chip data access.
Our approach can achieve up to a 2.8x speed increase when executing inference with state-of-the-art transformers.
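As a rough illustration of what a kernel-size-governed memory arrangement can look like, the sketch below reorders a matrix into contiguous tiles whose side matches an assumed accelerator kernel size, so that each tile can be streamed from off-chip memory exactly once. The tile size and the row-major tile order are our assumptions; the paper's actual arrangement is specific to its accelerator and is not reproduced here.

```python
import numpy as np

def tile_for_kernel(matrix, kernel):
    # Rearrange a (rows x cols) matrix into contiguous kernel x kernel
    # tiles so one accelerator call consumes one tile per off-chip fetch.
    rows, cols = matrix.shape
    assert rows % kernel == 0 and cols % kernel == 0
    return (matrix
            .reshape(rows // kernel, kernel, cols // kernel, kernel)
            .transpose(0, 2, 1, 3)       # group each tile's elements together
            .reshape(-1, kernel, kernel))

A = np.arange(64, dtype=np.float32).reshape(8, 8)
for tile in tile_for_kernel(A, kernel=4):
    pass  # each 4x4 tile would be handed to the accelerator exactly once
```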
arXiv Detail & Related papers (2023-12-20T13:01:25Z)
- RWKV: Reinventing RNNs for the Transformer Era [54.716108899349614]
We propose a novel model architecture that combines the efficient parallelizable training of transformers with the efficient inference of RNNs.
We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers.
arXiv Detail & Related papers (2023-05-22T13:57:41Z)
- Full Stack Optimization of Transformer Inference: a Survey [58.55475772110702]
Transformer models achieve superior accuracy across a wide range of applications.
The amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate.
There has been an increased focus on making Transformer models more efficient.
arXiv Detail & Related papers (2023-02-27T18:18:13Z)
- Reliability-Aware Deployment of DNNs on In-Memory Analog Computing Architectures [0.0]
In-Memory Analog Computing (IMAC) circuits remove the need for signal converters by realizing both MVM and nonlinear vector (NLV) operations in the analog domain.
We introduce a practical approach to deploy large matrices in deep neural networks (DNNs) onto multiple smaller IMAC subarrays to alleviate the impacts of noise and parasitics.
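A minimal sketch of the partitioning idea, assuming a 64x64 subarray size and purely digital accumulation of partial sums (the paper's circuit-level noise and parasitics modeling is not reproduced):

```python
import numpy as np

def subarray_mvm(x, W, sub=64):
    # Split a large weight matrix into sub x sub blocks, compute each
    # block's MVM as if on a separate small IMAC subarray, and accumulate
    # the partial sums. Bounding the block size is what limits the impact
    # of noise and parasitics per subarray.
    n_out, n_in = W.shape
    y = np.zeros(n_out)
    for r in range(0, n_out, sub):
        for c in range(0, n_in, sub):
            y[r:r + sub] += W[r:r + sub, c:c + sub] @ x[c:c + sub]
    return y

W = np.random.randn(256, 256)
x = np.random.randn(256)
assert np.allclose(subarray_mvm(x, W), W @ x)  # partitioning is exact in software
```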
arXiv Detail & Related papers (2022-10-02T01:43:35Z)
- An Algorithm-Hardware Co-Optimized Framework for Accelerating N:M Sparse Transformers [11.811907838840712]
We propose an algorithm-hardware co-optimized framework to flexibly and efficiently accelerate Transformers by utilizing general N:M sparsity patterns.
We present a flexible and efficient hardware architecture, namely STA, to achieve significant speedup when deploying N:M sparse Transformers.
Experimental results show that, compared to other methods, N:M sparse Transformers generated using IDP achieve an average of 6.7% improvement in accuracy with high training efficiency.
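For readers unfamiliar with N:M sparsity, the sketch below enforces the common 2:4 case by simple magnitude pruning: in every group of four consecutive weights along a row, only the two largest-magnitude weights survive. It illustrates the pattern itself, not the paper's IDP training scheme or the STA hardware.

```python
import numpy as np

def prune_n_m(W, n=2, m=4):
    # Keep the n largest-magnitude weights in every group of m consecutive
    # weights along each row; zero out the rest.
    rows, cols = W.shape
    assert cols % m == 0
    groups = W.copy().reshape(rows, cols // m, m)
    drop = np.argsort(np.abs(groups), axis=-1)[..., : m - n]  # smallest m-n
    np.put_along_axis(groups, drop, 0.0, axis=-1)
    return groups.reshape(rows, cols)

W = np.random.randn(4, 8)
Ws = prune_n_m(W)
print((Ws.reshape(4, 2, 4) != 0).sum(axis=-1))  # every group of 4 keeps 2 nonzeros
```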
arXiv Detail & Related papers (2022-08-12T04:51:49Z)
- Rich CNN-Transformer Feature Aggregation Networks for Super-Resolution [50.10987776141901]
Recent vision transformers along with self-attention have achieved promising results on various computer vision tasks.
We introduce an effective hybrid architecture for super-resolution (SR) tasks, which leverages local features from CNNs and long-range dependencies captured by transformers.
Our proposed method achieves state-of-the-art SR results on numerous benchmark datasets.
arXiv Detail & Related papers (2022-03-15T06:52:25Z)
- AnalogNets: ML-HW Co-Design of Noise-robust TinyML Models and Always-On Analog Compute-in-Memory Accelerator [50.31646817567764]
This work describes TinyML models for the popular always-on applications of keyword spotting (KWS) and visual wake words (VWW).
We detail a comprehensive training methodology, to retain accuracy in the face of analog non-idealities.
We also describe AON-CiM, a programmable, minimal-area phase-change memory (PCM) analog CiM accelerator.
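A minimal sketch of the general noise-injection idea behind such noise-robust training, assuming i.i.d. Gaussian weight noise proportional to weight magnitude during the forward pass; the actual AnalogNets training methodology and PCM noise model are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_forward(x, W, sigma=0.05, training=True):
    # One linear layer with Gaussian weight noise injected at training
    # time, a common way to make models tolerant of analog non-idealities.
    # The noise model (i.i.d., relative sigma of 5%) is an assumption.
    if training:
        W = W + sigma * np.abs(W) * rng.standard_normal(W.shape)
    return np.maximum(x @ W, 0.0)  # ReLU activation

x = rng.standard_normal((16, 32))   # batch of 16, feature dim 32
W = rng.standard_normal((32, 10))
clean = noisy_forward(x, W, training=False)
noisy = noisy_forward(x, W, training=True)
print(float(np.abs(clean - noisy).mean()))
```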
arXiv Detail & Related papers (2021-11-10T10:24:46Z)