Approximate ADCs for In-Memory Computing
- URL: http://arxiv.org/abs/2408.06390v1
- Date: Sun, 11 Aug 2024 05:59:59 GMT
- Title: Approximate ADCs for In-Memory Computing
- Authors: Arkapravo Ghosh, Hemkar Reddy Sadana, Mukut Debnath, Panthadip Maji, Shubham Negi, Sumeet Gupta, Mrigank Sharad, Kaushik Roy
- Abstract summary: In-memory computing (IMC) architectures for deep learning (DL) accelerators leverage energy-efficient and highly parallel matrix-vector multiplication (MVM) operations.
Recently reported designs reveal that the ADCs required for reading out the MVM results consume more than 85% of the total compute power and also dominate the area.
In this work, we present a peripheral-aware design of IMC cores to mitigate such overheads.
- Score: 5.1793930906065775
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In-memory computing (IMC) architectures for deep learning (DL) accelerators leverage energy-efficient and highly parallel matrix-vector multiplication (MVM) operations, implemented directly in memory arrays. Such IMC designs have been explored based on CMOS as well as emerging non-volatile memory (NVM) technologies like RRAM. IMC architectures generally involve a large number of cores consisting of memory arrays that store the trained weights of the DL model. Peripheral units like DACs and ADCs are also used for applying inputs and reading out the output values. Recently reported designs reveal that the ADCs required for reading out the MVM results consume more than 85% of the total compute power and also dominate the area, thereby eroding the benefits of the IMC scheme. Mitigating imperfections in the ADCs, namely non-linearity and variations, incurs significant design overheads due to dedicated calibration units. In this work, we present a peripheral-aware design of IMC cores to mitigate such overheads. It involves incorporating the non-idealities of the ADCs, along with those of the memory units, in the training of the DL models. The proposed approach applies equally well to the current-mode and charge-mode MVM operations demonstrated in recent years, and can significantly simplify the design of mixed-signal IMC units.
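To make the peripheral-aware idea concrete, the following is a minimal, illustrative sketch (not the authors' code) of how ADC non-idealities might be folded into the forward pass of an IMC-style linear layer during training. The gain/offset variation, the tanh-shaped non-linearity, and the 4-bit resolution are assumed stand-ins for measured ADC characteristics; a straight-through estimator lets gradients flow through the quantizer so the network can adapt to the modeled behaviour.
```python
# Hypothetical sketch: an IMC linear layer whose column outputs pass through a
# non-ideal ADC model (gain/offset variation + mild non-linearity + quantization).
# None of the constants below come from the paper; they are placeholders.
import torch
import torch.nn as nn

class NonIdealADC(nn.Module):
    def __init__(self, bits=4, gain_sigma=0.02, offset_sigma=0.01, nl_strength=0.05):
        super().__init__()
        self.levels = 2 ** bits - 1
        # Per-instance variation drawn once, mimicking fabrication spread.
        self.register_buffer("gain", 1.0 + gain_sigma * torch.randn(1))
        self.register_buffer("offset", offset_sigma * torch.randn(1))
        self.nl_strength = nl_strength

    def forward(self, x):
        # Normalize the analog partial sums to [-1, 1] per sample.
        scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-6)
        y = x / scale
        # Variation and a smooth non-linearity stand in for the ADC transfer curve.
        y = self.gain * y + self.offset + self.nl_strength * torch.tanh(3.0 * y)
        q = torch.round(y.clamp(-1.0, 1.0) * self.levels) / self.levels
        # Straight-through estimator: quantized forward, identity backward.
        return (y + (q - y).detach()) * scale

class IMCLinear(nn.Module):
    """A linear (MVM) layer read out through the non-ideal ADC model."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.mvm = nn.Linear(in_features, out_features, bias=False)
        self.adc = NonIdealADC()

    def forward(self, x):
        return self.adc(self.mvm(x))
```
Training with such a layer in place lets the network absorb the modeled converter behaviour, which is the spirit of the peripheral-aware approach summarized above.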
Related papers
- DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution [114.61347672265076]
Development of MLLMs for real-world robots is challenging due to the typically limited computation and memory capacities available on robotic platforms.
We propose a Dynamic Early-Exit Framework for Robotic Vision-Language-Action Model (DeeR) that automatically adjusts the size of the activated MLLM.
DeeR demonstrates significant reductions in the LLM's computational cost (by 5.2-6.5x) and GPU memory (by 2-6x) without compromising performance.
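As a rough illustration of the dynamic early-exit idea described above (not DeeR's actual exit criterion or architecture), the sketch below runs model blocks one at a time and stops as soon as an exit head is confident enough, so easy inputs use fewer layers. The threshold and the toy blocks are assumptions.
```python
# Hypothetical early-exit inference loop: stop at the first confident exit head.
import numpy as np

def early_exit_forward(x, blocks, exit_heads, threshold=0.9):
    h, probs = x, None
    for depth, (block, head) in enumerate(zip(blocks, exit_heads), start=1):
        h = block(h)
        probs = head(h)                 # class probabilities at this exit
        if probs.max() >= threshold:    # confident enough: skip remaining blocks
            return probs, depth
    return probs, len(blocks)           # fell through to the full model

# Toy usage with random blocks and softmax exit heads.
rng = np.random.default_rng(0)
softmax = lambda z: np.exp(z - z.max()) / np.exp(z - z.max()).sum()
blocks = [(lambda h, W=rng.standard_normal((8, 8)): np.tanh(h @ W)) for _ in range(4)]
heads = [(lambda h, W=rng.standard_normal((8, 3)): softmax(h @ W)) for _ in range(4)]
out, depth_used = early_exit_forward(rng.standard_normal(8), blocks, heads)
```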
arXiv Detail & Related papers (2024-11-04T18:26:08Z) - Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design [59.00758127310582]
We propose a novel framework Read-ME that transforms pre-trained dense LLMs into smaller MoE models.
Our approach employs activation sparsity to extract experts.
Read-ME outperforms other popular open-source dense models of similar scales.
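The blurb above only hints at the mechanism, so the following is a loose, assumption-heavy sketch of one way activation sparsity could be used to carve a dense FFN into experts: neurons that tend to fire on the same calibration inputs are grouped together (plain k-means here, purely as a placeholder), so inference can evaluate only the relevant group. This is not Read-ME's actual procedure.
```python
# Hypothetical expert extraction from recorded FFN activations.
import numpy as np

def extract_experts(activations, num_experts=4, iters=10, seed=0):
    """activations: (num_tokens, num_neurons) array from a calibration set."""
    fired = (activations > 0).astype(float).T        # (neurons, tokens) firing patterns
    rng = np.random.default_rng(seed)
    centers = fired[rng.choice(len(fired), num_experts, replace=False)]
    for _ in range(iters):                           # plain k-means on firing patterns
        dists = ((fired[:, None, :] - centers[None]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        for k in range(num_experts):
            if (assign == k).any():
                centers[k] = fired[assign == k].mean(axis=0)
    return assign                                    # expert id for each neuron
```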
arXiv Detail & Related papers (2024-10-24T19:48:51Z) - ARTEMIS: A Mixed Analog-Stochastic In-DRAM Accelerator for Transformer Neural Networks [2.9699290794642366]
ARTEMIS is a mixed analog-stochastic in-DRAM accelerator for transformer models.
Our analysis indicates that ARTEMIS exhibits at least 3.0x speedup, 1.8x lower energy, and 1.9x better energy efficiency compared to GPU, TPU, CPU, and state-of-the-art PIM transformer hardware accelerators.
arXiv Detail & Related papers (2024-07-17T15:08:14Z) - StoX-Net: Stochastic Processing of Partial Sums for Efficient In-Memory Computing DNN Accelerators [5.245727758971415]
Crossbar-based in-memory computing (IMC) has emerged as a promising platform for hardware acceleration of deep neural networks (DNNs).
However, the energy and latency of IMC systems are dominated by the large overhead of the peripheral analog-to-digital converters (ADCs).
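To illustrate the general flavor of stochastic partial-sum processing (a hedged reading of the title, not StoX-Net's exact scheme), the sketch below replaces a high-resolution ADC with a 1-bit comparison against a random threshold; averaging a few binary samples gives an approximate, unbiased estimate of each column sum. The voltage range and sample count are arbitrary.
```python
# Hypothetical stochastic readout of crossbar column sums with 1-bit comparators.
import numpy as np

def stochastic_readout(partial_sums, v_max=1.0, num_samples=8, rng=None):
    rng = rng or np.random.default_rng()
    acc = np.zeros_like(partial_sums, dtype=float)
    for _ in range(num_samples):
        thresholds = rng.uniform(0.0, v_max, size=partial_sums.shape)
        acc += (partial_sums > thresholds)      # 1-bit decision per column
    return v_max * acc / num_samples            # unbiased estimate for sums in [0, v_max]

# Example: 64 column sums in [0, 1] read with 8 binary samples each.
sums = np.random.default_rng(1).uniform(0.0, 1.0, size=64)
approx = stochastic_readout(sums)
```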
arXiv Detail & Related papers (2024-07-17T07:56:43Z) - Efficient and accurate neural field reconstruction using resistive memory [52.68088466453264]
Traditional signal reconstruction methods on digital computers face both software and hardware challenges.
We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs.
This work advances the AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
arXiv Detail & Related papers (2024-04-15T09:33:09Z) - Parameter-Adaptive Approximate MPC: Tuning Neural-Network Controllers without Retraining [50.00291020618743]
This work introduces a novel, parameter-adaptive AMPC architecture capable of online tuning without recomputing large datasets and retraining.
We showcase the effectiveness of parameter-adaptive AMPC by controlling the swing-ups of two different real cartpole systems with a severely resource-constrained microcontroller (MCU)
Taken together, these contributions represent a marked step toward the practical application of AMPC in real-world systems.
arXiv Detail & Related papers (2024-04-08T20:02:19Z) - RACE-IT: A Reconfigurable Analog CAM-Crossbar Engine for In-Memory Transformer Acceleration [21.196696191478885]
Transformer models represent the cutting edge of Deep Neural Networks (DNNs).
However, processing these models demands significant computational resources and results in a substantial memory footprint.
We introduce a novel Analog Content Addressable Memory (ACAM) structure capable of performing various non-MVM operations within Transformers.
arXiv Detail & Related papers (2023-11-29T22:45:39Z) - Hardware/Software co-design with ADC-Less In-memory Computing Hardware for Spiking Neural Networks [4.7519630770389405]
Spiking Neural Networks (SNNs) are bio-plausible models that hold great potential for realizing energy-efficient implementations of sequential tasks on resource-constrained edge devices.
We propose a hardware/software co-design methodology to deploy SNNs into an ADC-Less IMC architecture, using sense amplifiers as 1-bit ADCs in place of conventional high-precision ADCs (HP-ADCs).
Our proposed framework incurs minimal accuracy degradation by performing hardware-aware training and is able to scale beyond simple image classification tasks to more complex sequential regression tasks.
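A minimal sketch of the readout idea, under assumptions rather than from the paper's released framework: the sense amplifier acts as a comparator that reduces each column's partial sum to one bit, and hardware-aware training exposes that 1-bit behaviour to backpropagation through a straight-through estimator.
```python
# Hypothetical 1-bit "sense amplifier" readout usable inside a training graph.
import torch

def sense_amp_readout(partial_sum: torch.Tensor, threshold: float = 0.0) -> torch.Tensor:
    """Binary readout (+1 / -1) of an analog column sum, differentiable via STE."""
    bit = torch.where(partial_sum > threshold,
                      torch.ones_like(partial_sum),
                      -torch.ones_like(partial_sum))
    # Straight-through estimator: binary value forward, identity gradient backward.
    return partial_sum + (bit - partial_sum).detach()
```
In an SNN setting this thresholded readout maps naturally onto the spike/no-spike decision, which is presumably why hardware-aware training can keep the accuracy loss small.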
arXiv Detail & Related papers (2022-11-03T22:37:49Z) - Reliability-Aware Deployment of DNNs on In-Memory Analog Computing Architectures [0.0]
In-Memory Analog Computing (IMAC) circuits remove the need for signal converters by realizing both MVM and nonlinear vector (NLV) operations in the analog domain.
We introduce a practical approach to deploy large matrices in deep neural networks (DNNs) onto multiple smaller IMAC subarrays to alleviate the impacts of noise and parasitics.
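The deployment idea above amounts to tiling: a weight matrix larger than one IMAC subarray is split into blocks that each fit a fixed-size array, each block computes a local MVM, and partial results are accumulated across input tiles. The sketch below shows that arithmetic only; the subarray sizes are arbitrary and the paper's noise and parasitic modeling is not represented.
```python
# Hypothetical tiled MVM over fixed-size subarrays; sizes are arbitrary examples.
import numpy as np

def tiled_mvm(W, x, sub_rows=64, sub_cols=64):
    out_dim, in_dim = W.shape
    y = np.zeros(out_dim)
    for r in range(0, out_dim, sub_rows):                  # tiles along the output dimension
        for c in range(0, in_dim, sub_cols):               # tiles along the input dimension
            tile = W[r:r + sub_rows, c:c + sub_cols]       # weights mapped to one subarray
            y[r:r + sub_rows] += tile @ x[c:c + sub_cols]  # accumulate partial sums
    return y

W = np.random.default_rng(2).standard_normal((256, 512))
x = np.random.default_rng(3).standard_normal(512)
assert np.allclose(tiled_mvm(W, x), W @ x)                 # tiling preserves the ideal result
```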
arXiv Detail & Related papers (2022-10-02T01:43:35Z) - Mitigating Out-of-Distribution Data Density Overestimation in Energy-Based Models [54.06799491319278]
Deep energy-based models (EBMs) are receiving increasing attention due to their ability to learn complex distributions.
To train deep EBMs, the maximum likelihood estimation (MLE) with short-run Langevin Monte Carlo (LMC) is often used.
We investigate why the MLE with short-run LMC can converge to EBMs with wrong density estimates.
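For readers unfamiliar with the training setup being analyzed, the following is a generic sketch of short-run Langevin Monte Carlo as used to draw negative samples for approximate maximum-likelihood EBM training; the step count, step size, and toy energy are illustrative and not taken from the paper.
```python
# Generic short-run Langevin Monte Carlo sampler (illustrative parameters only).
import numpy as np

def short_run_lmc(energy_grad, x0, steps=20, step_size=0.01, rng=None):
    """x_{t+1} = x_t - (step/2) * dE/dx + sqrt(step) * standard normal noise."""
    rng = rng or np.random.default_rng()
    x = x0.copy()
    for _ in range(steps):
        x = (x - 0.5 * step_size * energy_grad(x)
             + np.sqrt(step_size) * rng.standard_normal(x.shape))
    return x

# Toy quadratic energy E(x) = 0.5 * ||x||^2, so dE/dx = x.
negatives = short_run_lmc(lambda x: x, np.zeros((8, 2)), steps=50)
```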
arXiv Detail & Related papers (2022-05-30T02:49:17Z) - Coarse-to-Fine Embedded PatchMatch and Multi-Scale Dynamic Aggregation for Reference-based Super-Resolution [48.093500219958834]
We propose an Accelerated Multi-Scale Aggregation network (AMSA) for Reference-based Super-Resolution.
The proposed AMSA achieves superior performance over state-of-the-art approaches on both quantitative and qualitative evaluations.
arXiv Detail & Related papers (2022-01-12T08:40:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.