Bias-Scalable Near-Memory CMOS Analog Processor for Machine Learning
- URL: http://arxiv.org/abs/2202.05022v3
- Date: Wed, 4 Jan 2023 08:57:40 GMT
- Title: Bias-Scalable Near-Memory CMOS Analog Processor for Machine Learning
- Authors: Pratik Kumar, Ankita Nandi, Shantanu Chakrabartty, Chetan Singh Thakur
- Abstract summary: Bias-scalable approximate analog computing is attractive for implementing machine learning (ML) processors with distinct power-performance specifications.
We demonstrate the implementation of bias-scalable approximate analog computing circuits using the generalization of the margin-propagation principle.
- Score: 6.548257506132353
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Bias-scalable analog computing is attractive for implementing machine
learning (ML) processors with distinct power-performance specifications. For
instance, ML implementations for server workloads are focused on higher
computational throughput for faster training, whereas ML implementations for
edge devices are focused on energy-efficient inference. In this paper, we
demonstrate the implementation of bias-scalable approximate analog computing
circuits using the generalization of the margin-propagation principle called
shape-based analog computing (S-AC). The resulting S-AC core integrates several
near-memory compute elements, which include: (a) non-linear activation
functions; (b) inner-product compute circuits; and (c) a mixed-signal
compressive memory, all of which can be scaled for performance or power while
preserving its functionality. Using measured results from prototypes fabricated
in a 180nm CMOS process, we demonstrate that the performance of computing
modules remains robust to transistor biasing and variations in temperature. In
this paper, we also demonstrate the effect of bias-scalability and
computational accuracy on a simple ML regression task.
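At the heart of S-AC is the margin-propagation (MP) approximation, which computes a soft-maximum-like quantity using only additions, subtractions, and thresholding. The sketch below is a minimal software rendering of the basic MP constraint (find z such that the thresholded residuals sum to a hyperparameter gamma); it illustrates the principle only, not the S-AC generalization or the circuit implementation, and the bisection solver, gamma value, and inputs are illustrative choices.

```python
import numpy as np

def margin_propagation(x, gamma, iters=60):
    """Reverse water-filling form of margin propagation (MP):
    find z such that sum_i max(x_i - z, 0) = gamma.
    For small gamma, z behaves like a soft maximum (log-sum-exp-like)
    of the inputs, computed with only additions and thresholding."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min() - gamma, x.max()       # z is bracketed in this interval
    for _ in range(iters):                  # bisection on the monotone constraint
        z = 0.5 * (lo + hi)
        if np.maximum(x - z, 0.0).sum() > gamma:
            lo = z                          # residual too large -> raise z
        else:
            hi = z
    return 0.5 * (lo + hi)

# Example: with gamma = 0.5 the MP output tracks the dominant input (5.0)
print(margin_propagation([1.0, 2.0, 5.0], gamma=0.5))   # ~4.5
```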
Related papers
- Kernel Approximation using Analog In-Memory Computing [3.5231018007564203]
Kernel functions are vital ingredients of several machine learning algorithms, but often incur significant memory and computational costs.
We introduce an approach to kernel approximation in machine learning algorithms suitable for mixed-signal Analog In-Memory Computing (AIMC) architectures.
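The paper's AIMC-specific method is not reproduced here; as a generic illustration of how a kernel can be approximated with plain inner products (the operation in-memory crossbars compute natively), the sketch below uses standard random Fourier features for an RBF kernel. All sizes and parameters are arbitrary choices for the example.

```python
import numpy as np

def random_fourier_features(X, n_features=256, gamma=1.0, seed=0):
    """Map X into a feature space where the dot product approximates
    the RBF kernel exp(-gamma * ||x - y||^2) (Rahimi & Recht, 2007)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

X = np.random.randn(5, 3)
Z = random_fourier_features(X)
approx = Z @ Z.T    # kernel estimate via plain inner products
exact = np.exp(-1.0 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))
print(np.abs(approx - exact).max())   # approximation error shrinks with n_features
```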
arXiv Detail & Related papers (2024-11-05T16:18:47Z)
- Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization [0.6445087473595953]

Large language models (LLMs) demonstrate outstanding performance in various tasks in machine learning.
However, deploying LLM inference poses challenges due to high compute and memory requirements.
We present Tender, an algorithm-hardware co-design solution that enables efficient deployment of LLM inference at low precision.
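Tender's specific tensor decomposition and runtime requantization scheme is not reproduced here; the sketch below only shows the basic symmetric int8 quantization baseline that low-precision inference builds on, with illustrative shapes.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor int8 quantization: x ~= scale * q."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)   # toy weight tensor
q, s = quantize_int8(w)
print(np.abs(dequantize(q, s) - w).max())      # per-element quantization error
```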
arXiv Detail & Related papers (2024-06-16T09:51:55Z)
- Resistive Memory-based Neural Differential Equation Solver for Score-based Diffusion Model [55.116403765330084]
Current AIGC methods, such as score-based diffusion, still fall short in terms of speed and efficiency.
We propose a time-continuous and analog in-memory neural differential equation solver for score-based diffusion.
We experimentally validate our solution with 180 nm resistive memory in-memory computing macros.
arXiv Detail & Related papers (2024-04-08T16:34:35Z)
- Electronic excited states from physically-constrained machine learning [0.0]
We present an integrated modeling approach, in which a symmetry-adapted ML model of an effective Hamiltonian is trained to reproduce electronic excitations from a quantum-mechanical calculation.
The resulting model can make predictions for molecules that are much larger and more complex than those that it is trained on.
arXiv Detail & Related papers (2023-11-01T20:49:59Z)
- Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection [88.23337313766353]
This work first provides a comprehensive statistical theory for transformers to perform ICL.
We show that transformers can implement a broad class of standard machine learning algorithms in context.
A single transformer can adaptively select different base ICL algorithms.
arXiv Detail & Related papers (2023-06-07T17:59:31Z)
- Theory and Implementation of Process and Temperature Scalable Shape-based CMOS Analog Circuits [6.548257506132353]
This work proposes a novel analog computing framework for designing an analog ML processor with a design methodology similar to that of digital design.
At the core of our work lies shape-based analog computing (S-AC).
The S-AC paradigm also allows the user to trade off computational precision against silicon circuit area and power.
arXiv Detail & Related papers (2022-05-11T17:46:01Z)
- Memristive Stochastic Computing for Deep Learning Parameter Optimization [1.6344851071810071]
Stochastic Computing (SC) is a computing paradigm that allows for low-cost and low-power implementation of various arithmetic operations using bit streams and digital logic.
We demonstrate that, using a 40-nm Complementary Metal Oxide Semiconductor (CMOS) process, our scalable architecture occupies 1.55 mm² and consumes approximately 167 µW when optimizing the parameters of a Convolutional Neural Network (CNN) being trained for a character recognition task, with no notable reduction in accuracy post-training.
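As a software illustration of the stochastic computing principle (not the memristive hardware described in the paper), the sketch below encodes two unipolar values as random bitstreams and multiplies them with a bitwise AND; the stream length and operand values are arbitrary.

```python
import numpy as np

def to_bitstream(p, length, rng):
    """Encode a probability p in [0, 1] as a random bitstream with P(bit = 1) = p."""
    return (rng.random(length) < p).astype(np.uint8)

rng = np.random.default_rng(0)
n = 10_000
a, b = 0.6, 0.5
sa, sb = to_bitstream(a, n, rng), to_bitstream(b, n, rng)

# Multiplication of unipolar stochastic numbers is a bitwise AND of
# independent streams; the result is decoded as the fraction of ones.
product = np.mean(sa & sb)
print(product)   # ~0.30 = a * b; accuracy improves with stream length
```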
arXiv Detail & Related papers (2021-03-11T07:10:32Z)
- Efficient Learning of Generative Models via Finite-Difference Score Matching [111.55998083406134]
We present a generic strategy to efficiently approximate directional derivatives of any order with finite differences.
Our approximation only involves function evaluations, which can be executed in parallel, and no gradient computations.
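A minimal sketch of the finite-difference idea the summary refers to: first- and second-order directional derivatives from a handful of function evaluations, with no gradient computations. It is a generic illustration rather than the paper's full score-matching estimator; the toy objective and step sizes are arbitrary.

```python
import numpy as np

def directional_derivative(f, x, v, eps=1e-4):
    """First-order directional derivative via central finite difference:
    two function evaluations, no gradients."""
    return (f(x + eps * v) - f(x - eps * v)) / (2.0 * eps)

def second_directional_derivative(f, x, v, eps=1e-3):
    """Second-order directional derivative (v^T H v) via three evaluations."""
    return (f(x + eps * v) - 2.0 * f(x) + f(x - eps * v)) / eps ** 2

f = lambda x: np.sum(x ** 2)                  # toy objective
x, v = np.array([1.0, 2.0]), np.array([1.0, 0.0])
print(directional_derivative(f, x, v))        # ~2.0 (= 2 * x[0])
print(second_directional_derivative(f, x, v)) # ~2.0 (Hessian is 2I)
```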
arXiv Detail & Related papers (2020-07-07T10:05:01Z)
- Predictive Coding Approximates Backprop along Arbitrary Computation Graphs [68.8204255655161]
We develop a strategy to translate core machine learning architectures into their predictive coding equivalents.
Our models perform equivalently to backprop on challenging machine learning benchmarks.
Our method raises the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry.
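A minimal sketch of the predictive-coding mechanism for a two-layer network, assuming the standard formulation (input and target clamped, hidden activity relaxed to minimize local prediction errors, weights updated from those local errors). It illustrates the mechanism only, not the paper's general computation-graph translation; the layer sizes, learning rate, and iteration count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
f, df = np.tanh, lambda u: 1.0 - np.tanh(u) ** 2
W1, W2 = 0.5 * rng.normal(size=(4, 3)), 0.5 * rng.normal(size=(2, 4))

x0 = rng.normal(size=3)   # input layer (clamped to the data)
y = rng.normal(size=2)    # output layer (clamped to the target)
x1 = f(W1 @ x0)           # hidden activity, initialised at the feed-forward value

# Relax the hidden activity by gradient descent on the sum of squared
# local prediction errors.
for _ in range(200):
    e1 = x1 - f(W1 @ x0)
    e2 = y - f(W2 @ x1)
    x1 += 0.1 * (-e1 + W2.T @ (e2 * df(W2 @ x1)))

# Local weight updates; at the fixed point these approximate the
# (negative) gradients that backprop would compute for an MSE loss.
e1 = x1 - f(W1 @ x0)
e2 = y - f(W2 @ x1)
dW2 = np.outer(e2 * df(W2 @ x1), x1)
dW1 = np.outer(e1 * df(W1 @ x0), x0)
print(dW1.shape, dW2.shape)
```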
arXiv Detail & Related papers (2020-06-07T15:35:47Z)
- One-step regression and classification with crosspoint resistive memory arrays [62.997667081978825]
High-speed, low-energy computing machines are in demand to enable real-time artificial intelligence at the edge.
One-step learning is supported by simulations of the prediction of the cost of a house in Boston and the training of a 2-layer neural network for MNIST digit recognition.
Results are all obtained in one computational step, thanks to the physical, parallel, and analog computing within the crosspoint array.
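For reference, the quantity such a crosspoint array settles to in a single physical step is the closed-form least-squares solution; the numpy sketch below computes the same solution digitally on synthetic data, purely to show what "one computational step" delivers. Data sizes and noise level are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                  # features (conductance-encoded on the array)
w_true = rng.normal(size=5)
y = X @ w_true + 0.01 * rng.normal(size=100)   # noisy regression targets

# Closed-form least-squares solution; the crosspoint array's feedback
# circuit settles to this solution physically instead of iterating digitally.
w = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.abs(w - w_true).max())                # small estimation error
```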
arXiv Detail & Related papers (2020-05-05T08:00:07Z)
- Automatic Differentiation in ROOT [62.997667081978825]
In mathematics and computer algebra, automatic differentiation (AD) is a set of techniques to evaluate the derivative of a function specified by a computer program.
This paper presents AD techniques available in ROOT, supported by Cling, to produce derivatives of arbitrary C/C++ functions.
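ROOT's AD, supported by Cling as described above, works by transforming C/C++ functions and is not shown here; the generic forward-mode dual-number sketch below only illustrates what automatic differentiation computes, namely exact derivatives propagated alongside values rather than symbolic or finite-difference approximations. The class and example function are hypothetical.

```python
class Dual:
    """Forward-mode AD: carry a value and its derivative together (dual numbers)."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
    __rmul__ = __mul__

def f(x):
    return 3 * x * x + 2 * x + 1   # derivative: 6x + 2

x = Dual(2.0, 1.0)                 # seed dx/dx = 1
print(f(x).val, f(x).dot)          # 17.0 14.0
```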
arXiv Detail & Related papers (2020-04-09T09:18:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.