Learning in Log-Domain: Subthreshold Analog AI Accelerator Based on Stochastic Gradient Descent
- URL: http://arxiv.org/abs/2501.13181v1
- Date: Wed, 22 Jan 2025 19:26:36 GMT
- Title: Learning in Log-Domain: Subthreshold Analog AI Accelerator Based on Stochastic Gradient Descent
- Authors: Momen K Tageldeen, Yacine Belgaid, Vivek Mohan, Zhou Wang, Emmanuel M Drakakis
- Abstract summary: We propose a novel analog accelerator architecture for AI/ML training workloads using stochastic gradient descent with L2 regularization (SGDr). The proposed design achieves significant reductions in transistor area and power consumption compared to digital implementations. This work paves the way for energy-efficient analog AI hardware with on-chip training capabilities.
- Score: 5.429033337081392
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid proliferation of AI models, coupled with growing demand for edge deployment, necessitates the development of AI hardware that is both high-performance and energy-efficient. In this paper, we propose a novel analog accelerator architecture designed for AI/ML training workloads using stochastic gradient descent with L2 regularization (SGDr). The architecture leverages log-domain circuits in subthreshold MOS and incorporates volatile memory. We establish a mathematical framework for solving SGDr in the continuous time domain and detail the mapping of SGDr learning equations to log-domain circuits. By operating in the analog domain and utilizing weak inversion, the proposed design achieves significant reductions in transistor area and power consumption compared to digital implementations. Experimental results demonstrate that the architecture closely approximates ideal behavior, with a mean square error below 0.87% and precision as low as 8 bits. Furthermore, the architecture supports a wide range of hyperparameters. This work paves the way for energy-efficient analog AI hardware with on-chip training capabilities.
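As a rough illustration of what "solving SGDr in the continuous time domain" can mean, the sketch below integrates an assumed gradient-flow form, tau * dw/dt = -(grad + lam * w), with forward Euler for a one-parameter linear model. The ODE form, the function names (`sgdr_step`, `sgdr_flow`), and all hyperparameter values are assumptions made for this example; they are not taken from the paper, and the sketch does not model the log-domain circuits themselves.

```python
import random

def sgdr_step(w, x, y, lr, lam):
    """One discrete SGD step with L2 regularization for the model y ~ w * x."""
    grad = (w * x - y) * x              # gradient of 0.5 * (w*x - y)**2 w.r.t. w
    return w - lr * (grad + lam * w)    # lam * w is the L2 (weight decay) term

def sgdr_flow(w0, samples, tau, lam, dt, epochs):
    """Forward-Euler integration of the assumed continuous-time form
    tau * dw/dt = -(grad + lam * w), presenting one sample per time step."""
    w = w0
    for _ in range(epochs):
        for x, y in samples:
            # An Euler step of size dt equals a discrete step with lr = dt / tau.
            w = sgdr_step(w, x, y, lr=dt / tau, lam=lam)
    return w

# Hypothetical toy data: targets generated by a true weight of 2.0.
rng = random.Random(0)
samples = [(x, 2.0 * x) for x in (rng.uniform(-1.0, 1.0) for _ in range(200))]

w_final = sgdr_flow(0.0, samples, tau=1.0, lam=0.01, dt=0.05, epochs=20)
print(w_final)  # settles slightly below 2.0 because the L2 term shrinks the weight
```

In this toy setting the time constant `tau` plays the role of an inverse learning rate, which is one way to picture how a continuous-time circuit could expose the learning rate as a tunable hyperparameter.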
Related papers
- Dense Associative Memories with Analog Circuits [4.0086293309536405]
We propose a general method for building analog accelerators for DenseAMs. We find that analog DenseAM hardware performs inference in constant time independent of model size. We estimate lower bounds on the achievable time constants imposed by amplifier specifications, suggesting that even conservative existing analog technology can enable inference times on the order of tens to hundreds of nanoseconds.
arXiv Detail & Related papers (2025-12-17T01:22:44Z) - WARP-LUTs - Walsh-Assisted Relaxation for Probabilistic Look Up Tables [0.0]
We introduce Walsh-Assisted Relaxation for Probabilistic Look-Up Tables (WARP-LUTs), a novel gradient-based method that efficiently learns combinations of logic gates with substantially fewer trainable parameters. We demonstrate that WARP-LUTs achieve significantly faster convergence on CIFAR-10 compared to DLGNs, while maintaining comparable accuracy.
arXiv Detail & Related papers (2025-10-17T13:44:36Z) - In-memory Training on Analog Devices with Limited Conductance States via Multi-tile Residual Learning [59.091567092071564]
In-memory training typically requires at least 8-bit conductance states to match digital baselines. Many promising memristive devices such as ReRAM offer only about 4-bit resolution due to fabrication constraints. This paper proposes a residual learning framework that sequentially learns on multiple crossbar tiles to compensate for the residual errors.
arXiv Detail & Related papers (2025-10-02T19:44:25Z) - Large-Scale Model Enabled Semantic Communication Based on Robust Knowledge Distillation [53.16213723669751]
Large-scale models (LSMs) can be an effective framework for semantic representation and understanding. However, their direct deployment is often hindered by high computational complexity and resource requirements. This paper proposes a novel knowledge distillation based semantic communication framework.
arXiv Detail & Related papers (2025-08-04T07:47:18Z) - Dynamic Acoustic Model Architecture Optimization in Training for ASR [51.21112094223223]
DMAO is an architecture optimization framework that employs a grow-and-drop strategy to automatically reallocate parameters during training. We evaluate DMAO through experiments with CTC on LibriSpeech, TED-LIUM-v2 and Switchboard datasets.
arXiv Detail & Related papers (2025-06-16T07:47:34Z) - Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition [95.54406667705999]
Pangu Embedded is an efficient Large Language Model (LLM) reasoner developed on Ascend Neural Processing Units (NPUs). It addresses the significant computational costs and inference latency challenges prevalent in existing reasoning-optimized LLMs. It delivers rapid responses and state-of-the-art reasoning quality within a single, unified model architecture.
arXiv Detail & Related papers (2025-05-28T14:03:02Z) - Automatic Generation of Fast and Accurate Performance Models for Deep Neural Network Accelerators [33.18173790144853]
We present an automated generation approach for fast performance models that accurately estimate the latency of Deep Neural Networks (DNNs).
We modeled representative DNN accelerators such as Gemmini, UltraTrail, Plasticine-derived, and a parameterizable systolic array.
We evaluate only 154 loop kernel iterations to estimate the performance for 4.19 billion instructions achieving a significant speedup.
arXiv Detail & Related papers (2024-09-13T07:27:55Z) - Inference Optimization of Foundation Models on AI Accelerators [68.24450520773688]
Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI.
As the number of model parameters reaches hundreds of billions, their deployment incurs prohibitive inference costs and high latency in real-world scenarios.
This tutorial offers a comprehensive discussion on complementary inference optimization techniques using AI accelerators.
arXiv Detail & Related papers (2024-07-12T09:24:34Z) - Towards Exact Gradient-based Training on Analog In-memory Computing [28.38387901763604]
Inference on analog accelerators has been studied recently, but the training perspective is underexplored.
Recent studies have shown that the "workhorse" of digital AI training, the stochastic gradient descent (SGD) algorithm, converges inexactly when applied to model training on non-ideal devices.
This paper puts forth a theoretical foundation for gradient-based training on analog devices.
arXiv Detail & Related papers (2024-06-18T16:43:59Z) - TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
arXiv Detail & Related papers (2024-04-15T06:01:48Z) - Mechanistic Design and Scaling of Hybrid Architectures [114.3129802943915]
We identify and test new hybrid architectures constructed from a variety of computational primitives.
We experimentally validate the resulting architectures via an extensive compute-optimal and a new state-optimal scaling law analysis.
We find MAD synthetics to correlate with compute-optimal perplexity, enabling accurate evaluation of new architectures.
arXiv Detail & Related papers (2024-03-26T16:33:12Z) - Random resistive memory-based deep extreme point learning machine for unified visual processing [67.51600474104171]
We propose a novel hardware-software co-design, random resistive memory-based deep extreme point learning machine (DEPLM).
Our co-design system achieves huge energy efficiency improvements and training cost reduction when compared to conventional systems.
arXiv Detail & Related papers (2023-12-14T09:46:16Z) - AnalogNAS: A Neural Network Design Framework for Accurate Inference with Analog In-Memory Computing [7.596833322764203]
Inference at the edge requires low latency, compact and power-efficient models.
Analog/mixed-signal in-memory computing hardware accelerators can easily transcend the memory wall of von Neumann architectures.
We propose AnalogNAS, a framework for automated Deep Neural Network (DNN) design targeting deployment on analog In-Memory Computing (IMC) inference accelerators.
arXiv Detail & Related papers (2023-05-17T07:39:14Z) - Biologically Plausible Learning on Neuromorphic Hardware Architectures [27.138481022472]
Neuromorphic computing is an emerging paradigm that confronts this imbalance by performing computations directly in analog memories.
This work is the first to compare the impact of different learning algorithms on Compute-In-Memory-based hardware and vice versa.
arXiv Detail & Related papers (2022-12-29T15:10:59Z) - Neural-PIM: Efficient Processing-In-Memory with Neural Approximation of Peripherals [11.31429464715989]
This paper presents a new PIM architecture to efficiently accelerate deep learning tasks.
It is proposed to minimize the required A/D conversions with analog accumulation and neural approximated peripheral circuits.
Evaluations on different benchmarks demonstrate that Neural-PIM can improve energy efficiency by 5.36x (1.73x) and speed up throughput by 3.43x (1.59x) without losing accuracy.
arXiv Detail & Related papers (2022-01-30T16:14:49Z) - GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture-agnostic method for initializing neural networks.
It is based on a simple heuristic; the variance of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value.
It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
arXiv Detail & Related papers (2021-02-16T11:45:35Z) - One-step regression and classification with crosspoint resistive memory arrays [62.997667081978825]
High speed, low energy computing machines are in demand to enable real-time artificial intelligence at the edge.
One-step learning is supported by simulations of the prediction of the cost of a house in Boston and the training of a 2-layer neural network for MNIST digit recognition.
Results are all obtained in one computational step, thanks to the physical, parallel, and analog computing within the crosspoint array.
arXiv Detail & Related papers (2020-05-05T08:00:07Z)