Related papers: Estimation of Energy-dissipation Lower-bounds for Neuromorphic Learning-in-memory

Estimation of Energy-dissipation Lower-bounds for Neuromorphic Learning-in-memory

URL: http://arxiv.org/abs/2402.14878v3
Date: Tue, 19 Aug 2025 20:34:48 GMT
Title: Estimation of Energy-dissipation Lower-bounds for Neuromorphic Learning-in-memory
Authors: Zihao Chen, Faiek Ahsan, Johannes Leugering, Gert Cauwenberghs, Shantanu Chakrabartty,
Abstract summary: An ideal neuromorphic neurally-inspired neurally-equilibriums rely on local but parallel parameter updates to solve problems that range from quadratic programming to Ising machines.<n>An analysis presented in this paper captures the out-of- thermodynamics of learning and the resulting energy-efficiency estimates are model-agnostic.<n>To show practical applicability of our results, we apply our analysis for estimating the lower-bound on the energy-to-solution metrics for large-scale AI workloads.
Score: 5.073292775065559
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Neuromorphic or neurally-inspired optimizers rely on local but parallel parameter updates to solve problems that range from quadratic programming to Ising machines. An ideal realization of such an optimizer not only uses a compute-in-memory (CIM) paradigm to address the so-called memory-wall (i.e. energy dissipated due to repeated memory read access), but also uses a learning-in-memory (LIM) paradigm to address the energy bottlenecks due to repeated memory writes at the precision required for optimization (the update-wall), and to address the energy bottleneck due to the repeated transfer of information between short-term and long-term memories (the consolidation-wall). In this paper, we derive theoretical estimates for the energy-to-solution metric that can be achieved by this ideal neuromorphic optimizer which is realized by modulating the energy-barrier of the physical memories such that the dynamics of memory updates and memory consolidation matches the optimization or the annealing dynamics. The analysis presented in this paper captures the out-of-equilibrium thermodynamics of learning and the resulting energy-efficiency estimates are model-agnostic which only depend on the number of model-update operations (OPS), the model-size in terms of number of parameters, the speed of convergence, and the precision of the solution. To show the practical applicability of our results, we apply our analysis for estimating the lower-bound on the energy-to-solution metrics for large-scale AI workloads.

Related papers

POET-X: Memory-efficient LLM Training by Scaling Orthogonal Transformation [57.57816409869894]
We introduce POET-X, a scalable and memory-efficient variant for training large language models.<n>PoET-X maintains the generalization and stability benefits of POET while achieving substantial improvements in throughput and memory efficiency.
arXiv Detail & Related papers (2026-03-05T18:59:23Z)
ECO: Quantized Training without Full-Precision Master Weights [58.97082407934466]
Error-Compensating (ECO) eliminates master weights by applying updates directly to quantized parameters.<n>We show that ECO converges to a constant-radius neighborhood of the optimum, while naive master-weight removal can incur an error that is inversely proportional to the learning rate.
arXiv Detail & Related papers (2026-01-29T18:35:01Z)
Language Modeling With Factorization Memory [1.9538130634206368]
We propose Factorization Memory, an efficient recurrent neural network (RNN) architecture that achieves performance comparable to Transformer models on short-context language modeling tasks.<n>We develop a sparse formulation of Factorization Memory that updates only a subset of recurrent states at each step while preserving the strong performance of its dense counterpart.
arXiv Detail & Related papers (2025-10-31T23:27:11Z)
Quantifying Memory Utilization with Effective State-Size [73.52115209375343]
We develop a measure of textitmemory utilization'<n>This metric is tailored to the fundamental class of systems with textitinput-invariant and textitinput-varying linear operators
arXiv Detail & Related papers (2025-04-28T08:12:30Z)
Memory Is Not the Bottleneck: Cost-Efficient Continual Learning via Weight Space Consolidation [55.77835198580209]
Continual learning (CL) has traditionally emphasized minimizing exemplar memory usage, assuming that memory is the primary bottleneck.<n>This paper re-examines CL under a more realistic setting with sufficient exemplar memory, where the system can retain a representative portion of past data.<n>We find that, under this regime, stability improves due to reduced forgetting, but plasticity diminishes as the model becomes biased toward prior tasks and struggles to adapt to new ones.
arXiv Detail & Related papers (2025-02-11T05:40:52Z)
Tensor-GaLore: Memory-Efficient Training via Gradient Tensor Decomposition [93.98343072306619]
We present Navier-GaLore, a novel method for efficient training of neural networks with higher-order tensor weights. Across various PDE tasks, Navier-GaLore achieves substantial memory savings, reducing memory usage by up to 75%.
arXiv Detail & Related papers (2025-01-04T20:51:51Z)
Harnessing Nonidealities in Analog In-Memory Computing Circuits: A Physical Modeling Approach for Neuromorphic Systems [5.582327246405357]
In-memory computing (IMC) offers a promising solution by addressing the von Neumann bottleneck inherent in traditional deep learning accelerators. This paper presents a novel approach to directly train physical models of IMC, formulated as ordinary-differential-equation (ODE)-based physical neural networks (PNNs) To enable the training of large-scale networks, we propose a technique called differentiable spike-time discretization (DSTD), which reduces the computational cost of ODE-based PNNs by up to 20 times in speed and 100 times in memory.
arXiv Detail & Related papers (2024-12-12T07:22:23Z)
Mathematical Formalism for Memory Compression in Selective State Space Models [0.0]
State space models (SSMs) have emerged as a powerful framework for modelling long-range dependencies in sequence data. We develop a rigorous mathematical framework for understanding memory compression in selective state space models. We show that selective SSMs offer significant improvements in memory efficiency and processing speed compared to traditional RNN-based models.
arXiv Detail & Related papers (2024-10-04T05:45:48Z)
Fine-Grained Gradient Restriction: A Simple Approach for Mitigating Catastrophic Forgetting [41.891312602770746]
Gradient Episodic Memory (GEM) achieves balance by utilizing a subset of past training samples to restrict the update direction of the model parameters. We show that memory strength is effective mainly because it improves GEM's ability generalization and therefore leads to a more favorable trade-off.
arXiv Detail & Related papers (2024-10-01T17:03:56Z)
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios. In the early route, intermediate outputs are consolidated via an anti-redundancy operation. In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
Memory-Efficient Optimization with Factorized Hamiltonian Descent [11.01832755213396]
We introduce a novel adaptive, H-Fac, which incorporates a memory-efficient factorization approach to address this challenge. By employing a rank-1 parameterization for both momentum and scaling parameter estimators, H-Fac reduces memory costs to a sublinear level. We develop our algorithms based on principles derived from Hamiltonian dynamics, providing robust theoretical underpinnings in optimization dynamics and convergence guarantees.
arXiv Detail & Related papers (2024-06-14T12:05:17Z)
Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference [2.9302211589186244]
Large language models (LLMs) have transformed natural language processing, enabling machines to generate human-like text and engage in meaningful conversations. Developments in computing and memory capabilities are lagging behind, exacerbated by the discontinuation of Moore's law. compute-in-memory (CIM) technologies offer a promising solution for accelerating AI inference by directly performing analog computations in memory.
arXiv Detail & Related papers (2024-06-12T16:57:58Z)
Memory-efficient Energy-adaptive Inference of Pre-Trained Models on Batteryless Embedded Systems [0.0]
Batteryless systems often face power failures, requiring extra runtime buffers to maintain progress and leaving only a memory space for storing ultra-tiny deep neural networks (DNNs) We propose FreeML, a framework to optimize pre-trained DNN models for memory-efficient and energy-adaptive inference on batteryless systems. Our experiments showed that FreeML reduces the model sizes by up to $95 times$, supports adaptive inference with a $2.03-19.65 times$ less memory overhead, and provides significant time and energy benefits with only a negligible accuracy drop compared to the state-of-the-art
arXiv Detail & Related papers (2024-05-16T20:16:45Z)
Resistive Memory-based Neural Differential Equation Solver for Score-based Diffusion Model [55.116403765330084]
Current AIGC methods, such as score-based diffusion, are still deficient in terms of rapidity and efficiency. We propose a time-continuous and analog in-memory neural differential equation solver for score-based diffusion. We experimentally validate our solution with 180 nm resistive memory in-memory computing macros.
arXiv Detail & Related papers (2024-04-08T16:34:35Z)
Full-Stack Optimization for CAM-Only DNN Inference [2.0837295518447934]
This paper explores the combination of algorithmic optimizations for ternary weight neural networks and associative processors. We propose a novel compilation flow to optimize convolutions on APs by reducing their arithmetic intensity. Our solution improves the energy efficiency of ResNet-18 inference on ImageNet by 7.5x compared to crossbar in-memory accelerators.
arXiv Detail & Related papers (2024-01-23T10:27:38Z)
Random resistive memory-based deep extreme point learning machine for unified visual processing [67.51600474104171]
We propose a novel hardware-software co-design, random resistive memory-based deep extreme point learning machine (DEPLM) Our co-design system achieves huge energy efficiency improvements and training cost reduction when compared to conventional systems.
arXiv Detail & Related papers (2023-12-14T09:46:16Z)
StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization [12.707050104493218]
We prove that state-space models without any re parameterization exhibit a memory limitation similar to that of traditional RNNs. Our analysis identifies this "curse of memory" as a result of the recurrent weights converging to a stability boundary.
arXiv Detail & Related papers (2023-11-24T14:08:31Z)
EMO: Episodic Memory Optimization for Few-Shot Meta-Learning [69.50380510879697]
episodic memory optimization for meta-learning, we call EMO, is inspired by the human ability to recall past learning experiences from the brain's memory. EMO nudges parameter updates in the right direction, even when the gradients provided by a limited number of examples are uninformative. EMO scales well with most few-shot classification benchmarks and improves the performance of optimization-based meta-learning methods.
arXiv Detail & Related papers (2023-06-08T13:39:08Z)
Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks. We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
An Adaptive Synaptic Array using Fowler-Nordheim Dynamic Analog Memory [6.681943980068049]
We present a synaptic array that uses dynamical states to implement an analog memory for energy-efficient training of machine learning (ML) systems. With the energy-dissipation as low as 5 fJ per memory update and a programming resolution up to 14 bits, the proposed synapse array could be used to address the energy-efficiency imbalance between the training and the inference phases.
arXiv Detail & Related papers (2021-04-13T04:08:04Z)
SmartDeal: Re-Modeling Deep Network Weights for Efficient Inference and Training [82.35376405568975]
Deep neural networks (DNNs) come with heavy parameterization, leading to external dynamic random-access memory (DRAM) for storage. We present SmartDeal (SD), an algorithm framework to trade higher-cost memory storage/access for lower-cost computation. We show that SD leads to 10.56x and 4.48x reduction in the storage and training energy, with negligible accuracy loss compared to state-of-the-art training baselines.
arXiv Detail & Related papers (2021-01-04T18:54:07Z)
Adaptive pruning-based optimization of parameterized quantum circuits [62.997667081978825]
Variisy hybrid quantum-classical algorithms are powerful tools to maximize the use of Noisy Intermediate Scale Quantum devices. We propose a strategy for such ansatze used in variational quantum algorithms, which we call "Efficient Circuit Training" (PECT) Instead of optimizing all of the ansatz parameters at once, PECT launches a sequence of variational algorithms.
arXiv Detail & Related papers (2020-10-01T18:14:11Z)
Targeted free energy estimation via learned mappings [66.20146549150475]
Free energy perturbation (FEP) was proposed by Zwanzig more than six decades ago as a method to estimate free energy differences. FEP suffers from a severe limitation: the requirement of sufficient overlap between distributions. One strategy to mitigate this problem, called Targeted Free Energy Perturbation, uses a high-dimensional mapping in configuration space to increase overlap.
arXiv Detail & Related papers (2020-02-12T11:10:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.