IMBUE: In-Memory Boolean-to-CUrrent Inference ArchitecturE for Tsetlin Machines
- URL: http://arxiv.org/abs/2305.12914v1
- Date: Mon, 22 May 2023 10:55:01 GMT
- Title: IMBUE: In-Memory Boolean-to-CUrrent Inference ArchitecturE for Tsetlin Machines
- Authors: Omar Ghazal, Simranjeet Singh, Tousif Rahman, Shengqi Yu, Yujin Zheng,
Domenico Balsamo, Sachin Patkar, Farhad Merchant, Fei Xia, Alex Yakovlev,
Rishad Shafik
- Abstract summary: In-memory computing for Machine Learning (ML) applications remedies the von Neumann bottlenecks by organizing computation to exploit parallelism and locality.
Non-volatile memory devices such as Resistive RAM (ReRAM) offer integrated switching and storage capabilities showing promising performance for ML applications.
This paper proposes an In-Memory Boolean-to-Current Inference Architecture (IMBUE) that uses ReRAM-transistor cells to eliminate the need for such conversions.
- Score: 5.6634493664726495
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In-memory computing for Machine Learning (ML) applications remedies the von
Neumann bottlenecks by organizing computation to exploit parallelism and
locality. Non-volatile memory devices such as Resistive RAM (ReRAM) offer
integrated switching and storage capabilities showing promising performance for
ML applications. However, ReRAM devices have design challenges, such as
non-linear digital-analog conversion and circuit overheads. This paper proposes
an In-Memory Boolean-to-Current Inference Architecture (IMBUE) that uses
ReRAM-transistor cells to eliminate the need for such conversions. IMBUE
processes Boolean feature inputs expressed as digital voltages and generates
parallel current paths based on resistive memory states. The proportional
column current is then translated back to the Boolean domain for further
digital processing. The IMBUE architecture is inspired by the Tsetlin Machine
(TM), an emerging ML algorithm based on intrinsically Boolean logic. The IMBUE
architecture demonstrates significant performance improvements over binarized
convolutional neural networks and digital TM in-memory implementations,
achieving up to a 12.99x and 5.28x increase, respectively.
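As a rough illustration of the Boolean-to-current-to-Boolean path described in the abstract, here is a minimal behavioral sketch in Python. The conductance values, threshold, and function names are illustrative assumptions, not details taken from the paper.
```python
# Behavioral sketch (not the authors' circuit): Boolean inputs gate
# parallel current paths through ReRAM cells, the summed column current
# is sensed, and the result is translated back to the Boolean domain.
import numpy as np

G_ON, G_OFF = 1.0, 0.01   # assumed normalized LRS/HRS conductances
V_HIGH = 1.0              # digital voltage encoding Boolean 1

def column_current(x_bool, cell_states):
    """Sum the currents of the parallel paths in one column."""
    v = np.where(x_bool, V_HIGH, 0.0)        # row voltage per Boolean input
    g = np.where(cell_states, G_ON, G_OFF)   # per-cell ReRAM conductance
    return float(np.sum(v * g))              # I_col = sum_i V_i * G_i

def to_boolean(current, threshold=0.5):
    """Sense-amplifier analogue: threshold the column current."""
    return int(current > threshold)

x = np.array([1, 0, 1, 1], dtype=bool)        # Boolean feature inputs
states = np.array([1, 1, 0, 1], dtype=bool)   # programmed memory states
print(to_boolean(column_current(x, states)))  # 1: two strong paths conduct
```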
Related papers
- Architectural Exploration of Application-Specific Resonant SRAM Compute-in-Memory (rCiM) [1.0687104237121408]
This paper presents an automation tool designed to optimize the energy and latency of designs incorporating diverse implementation strategies.
The tool reduces energy consumption by 80.9% on average across all benchmarks.
arXiv Detail & Related papers (2024-11-14T16:01:05Z)
- B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory [91.81390121042192]
We develop a class of models called B'MOJO that seamlessly combines eidetic and fading memory within a composable module.
B'MOJO's ability to modulate eidetic and fading memory yields better inference on longer sequences, tested up to 32K tokens; a toy illustration follows this entry.
arXiv Detail & Related papers (2024-07-08T18:41:01Z)
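As a loose, hedged illustration of the eidetic-versus-fading distinction (the class name and structure below are invented for this sketch and are not B'MOJO's actual module):
```python
# Eidetic memory keeps an exact window of recent inputs; fading memory
# keeps an exponential moving average whose influence decays with time.
from collections import deque

class HybridMemory:
    def __init__(self, window=4, decay=0.9):
        self.eidetic = deque(maxlen=window)  # exact, finite window
        self.fading = 0.0                    # lossy, unbounded horizon
        self.decay = decay

    def update(self, x: float) -> None:
        self.eidetic.append(x)               # perfect recall of last W items
        self.fading = self.decay * self.fading + (1 - self.decay) * x

    def read(self):
        # Downstream layers could attend over both representations.
        return list(self.eidetic), self.fading

m = HybridMemory()
for x in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]:
    m.update(x)
print(m.read())   # ([3.0, 4.0, 5.0, 6.0], EMA over the full stream)
```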
- Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference [2.9302211589186244]
Large language models (LLMs) have transformed natural language processing, enabling machines to generate human-like text and engage in meaningful conversations.
Developments in computing and memory capabilities are lagging behind, a gap exacerbated by the slowing of Moore's law.
Compute-in-memory (CIM) technologies offer a promising solution for accelerating AI inference by performing analog computations directly in memory.
arXiv Detail & Related papers (2024-06-12T16:57:58Z)
- Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges.
We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs.
This work advances AI-driven signal restoration technology and paves the way for efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z)
- Resistive Memory-based Neural Differential Equation Solver for Score-based Diffusion Model [55.116403765330084]
Current AIGC methods, such as score-based diffusion, still fall short in speed and efficiency.
We propose a time-continuous and analog in-memory neural differential equation solver for score-based diffusion.
We experimentally validate our solution with 180 nm resistive memory in-memory computing macros; a toy software analogue of such a solver follows this entry.
arXiv Detail & Related papers (2024-04-08T16:34:35Z)
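For intuition, below is a hedged, software-only toy of the probability-flow ODE that such a solver integrates; the analog in-memory aspect is not modeled, and the Gaussian setup is chosen so the exact score is known in closed form.
```python
# Toy probability-flow ODE sampler. Data distribution: N(0, 1).
# Variance-exploding noising gives p_t = N(0, 1 + t), so the exact
# score is s(x, t) = -x / (1 + t) and the flow ODE is
# dx/dt = -(1/2) * g(t)^2 * s(x, t) with g(t)^2 = 1.
import numpy as np

def score(x, t):
    return -x / (1.0 + t)              # exact score for this toy setup

def sample(n=100_000, T=50.0, steps=500, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(scale=np.sqrt(1.0 + T), size=n)   # start at p_T
    ts = np.linspace(T, 0.0, steps + 1)
    for k in range(steps):
        dt = ts[k + 1] - ts[k]                       # negative time step
        x = x + dt * (-0.5 * score(x, ts[k]))        # Euler update
    return x

x0 = sample()
print(float(x0.std()))   # ~1.0, recovering the data distribution's std
```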
- Pruning random resistive memory for optimizing analogue AI [54.21621702814583]
AI models pose unprecedented challenges for energy consumption and environmental sustainability.
One promising solution is to revisit analogue computing, a technique that predates digital computing.
Here, we report a universal solution, software-hardware co-design using structural plasticity-inspired edge pruning.
arXiv Detail & Related papers (2023-11-13T08:59:01Z)
- Blockwise Parallel Transformer for Large Context Models [70.97386897478238]
Blockwise Parallel Transformer (BPT) computes self-attention blockwise and fuses it with the feedforward network to minimize memory costs.
By processing longer input sequences while maintaining memory efficiency, BPT enables training on sequences 32 times longer than vanilla Transformers and up to 4 times longer than previous memory-efficient methods; a sketch of the blockwise idea follows this entry.
arXiv Detail & Related papers (2023-05-30T19:25:51Z)
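A minimal sketch of the blockwise idea, assuming a single head and no masking; BPT additionally blocks over keys/values and fuses the feedforward network, which this sketch omits.
```python
# The full (n, n) attention matrix is never materialized: queries are
# processed one block at a time, keeping peak memory at (block, n).
import numpy as np

def blockwise_attention(Q, K, V, block=128):
    n, d = Q.shape
    out = np.empty_like(V, dtype=np.float64)
    for s in range(0, n, block):
        q = Q[s:s + block] / np.sqrt(d)               # one query block
        scores = q @ K.T                              # (block, n) scores
        scores -= scores.max(axis=1, keepdims=True)   # softmax stability
        w = np.exp(scores)
        out[s:s + block] = (w @ V) / w.sum(axis=1, keepdims=True)
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(512, 64)) for _ in range(3))
y = blockwise_attention(Q, K, V)   # matches dense attention numerically
```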
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
The adapter-ALBERT model is an efficient optimization that maximizes data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- Pex: Memory-efficient Microcontroller Deep Learning through Partial Execution [11.336229510791481]
We discuss a novel execution paradigm for microcontroller deep learning.
It modifies the execution of neural networks to avoid materialising full buffers in memory.
This is achieved by exploiting the properties of operators that can consume and produce a fraction of their input and output at a time; a minimal sketch follows this entry.
arXiv Detail & Related papers (2022-11-30T18:47:30Z)
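A minimal sketch of that partial-execution idea, using an invented two-stage pipeline (pointwise ReLU feeding global average pooling) rather than Pex's actual operators:
```python
# Each operator consumes and produces a slice at a time, so the full
# intermediate activation buffer is never materialized in memory.
import numpy as np

def relu_chunks(x, chunk):
    """Yield ReLU of `x` one row-slice at a time (partial production)."""
    for s in range(0, x.shape[0], chunk):
        yield np.maximum(x[s:s + chunk], 0.0)

def streaming_mean(chunks):
    """Global average pooling that consumes slices as they arrive."""
    total, count = 0.0, 0
    for c in chunks:                 # never holds the full activation
        total += c.sum()
        count += c.size
    return total / count

x = np.random.default_rng(1).normal(size=(1024, 64))
# Peak intermediate memory is one 8-row slice instead of the whole map.
y = streaming_mean(relu_chunks(x, chunk=8))
assert np.isclose(y, np.maximum(x, 0).mean())
```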
- PIM-DRAM: Accelerating Machine Learning Workloads using Processing in Memory based on DRAM Technology [2.6168147530506958]
We propose a processing-in-memory (PIM) multiplication primitive to accelerate matrix-vector operations in ML workloads.
We show that the proposed architecture, mapping, and data flow can provide up to 23x and 6.5x benefits over a GPU; a behavioral sketch of a bit-serial PIM-style multiply follows this entry.
arXiv Detail & Related papers (2021-05-08T16:39:24Z)
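As a hedged behavioral sketch, the bit-serial multiply below uses only bulk bitwise ANDs and popcount-style reductions, the style of primitive PIM substrates favor; it is illustrative, not the paper's exact design.
```python
# Compute W @ x from bit-planes: W[m,k]*x[k] expands into AND terms
# weighted by powers of two, which map to in-memory bulk operations.
import numpy as np

def bitserial_matvec(W, x, bits=8):
    """W: (m, n) unsigned ints < 2**bits; x: (n,) unsigned ints < 2**bits."""
    acc = np.zeros(W.shape[0], dtype=np.int64)
    for i in range(bits):                  # bit-plane of W
        w_bit = (W >> i) & 1
        for j in range(bits):              # bit-plane of x
            x_bit = (x >> j) & 1
            # Bulk AND across a row, then a popcount-style reduction
            partial = (w_bit & x_bit).sum(axis=1)
            acc += partial << (i + j)      # shift-weighted accumulation
    return acc

W = np.random.randint(0, 16, size=(4, 8), dtype=np.uint8)
x = np.random.randint(0, 16, size=8, dtype=np.uint8)
assert np.array_equal(bitserial_matvec(W, x), W.astype(np.int64) @ x)
```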
- In-memory Implementation of On-chip Trainable and Scalable ANN for AI/ML Applications [0.0]
This paper presents an in-memory computing architecture for artificial neural networks (ANNs), enabling artificial intelligence (AI) and machine learning (ML) applications.
Our novel on-chip training and inference in-memory architecture reduces energy cost and enhances throughput by simultaneously accessing multiple rows of the array per precharge cycle.
The proposed architecture was trained and tested on the IRIS dataset and is $46\times$ more energy efficient per MAC (multiply-and-accumulate) operation than earlier classifiers.
arXiv Detail & Related papers (2020-05-19T15:36:39Z)