In-memory Training on Analog Devices with Limited Conductance States via Multi-tile Residual Learning
- URL: http://arxiv.org/abs/2510.02516v1
- Date: Thu, 02 Oct 2025 19:44:25 GMT
- Title: In-memory Training on Analog Devices with Limited Conductance States via Multi-tile Residual Learning
- Authors: Jindan Li, Zhaoxian Wu, Gaowen Liu, Tayfun Gokmen, Tianyi Chen
- Abstract summary: In-memory training typically requires at least 8-bit conductance states to match digital baselines. Many promising memristive devices such as ReRAM offer only about 4-bit resolution due to fabrication constraints. This paper proposes a residual learning framework that sequentially learns on multiple crossbar tiles to compensate for the residual errors from low-precision weight updates.
- Score: 59.091567092071564
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Analog in-memory computing (AIMC) accelerators enable efficient deep neural network computation directly within memory using resistive crossbar arrays, where model parameters are represented by the conductance states of memristive devices. However, effective in-memory training typically requires at least 8-bit conductance states to match digital baselines. Realizing such fine-grained states is costly and often requires complex noise mitigation techniques that increase circuit complexity and energy consumption. In practice, many promising memristive devices such as ReRAM offer only about 4-bit resolution due to fabrication constraints, and this limited update precision substantially degrades training accuracy. To enable on-chip training with these limited-state devices, this paper proposes a *residual learning* framework that sequentially learns on multiple crossbar tiles to compensate for the residual errors from low-precision weight updates. Our theoretical analysis shows that the optimality gap shrinks with the number of tiles and achieves a linear convergence rate. Experiments on standard image classification benchmarks demonstrate that our method consistently outperforms state-of-the-art in-memory analog training strategies under limited-state settings, while incurring only moderate hardware overhead as confirmed by our cost analysis.
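To make the multi-tile idea concrete, below is a minimal NumPy sketch on a toy linear model: each tile's weights live on a 16-state (4-bit) conductance grid, every update is snapped back onto that grid, and each successive tile is trained against the residual left by the tiles before it, with a shrunken weight range so its grid is finer. The quantizer, the 0.3 range-shrink factor, and the learning rate are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def quantize(w, num_states=16, w_max=1.0):
    """Snap weights onto a grid of limited conductance states (4-bit here)."""
    levels = np.linspace(-w_max, w_max, num_states)
    return levels[np.abs(w[..., None] - levels).argmin(axis=-1)]

def train_tile(X, R, w_max, lr=0.5, steps=100, num_states=16):
    """Fit one crossbar tile to the current residual R; every update is
    snapped back onto the quantized grid, mimicking limited-state devices."""
    W = np.zeros((R.shape[0], X.shape[0]))
    for _ in range(steps):
        grad = (W @ X - R) @ X.T / X.shape[1]
        W = quantize(W - lr * grad, num_states, w_max)
    return W

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 256))
W_true = rng.uniform(-0.9, 0.9, size=(4, 8))
Y = W_true @ X

residual, tiles, w_max = Y, [], 1.0
for k in range(3):                      # sequentially learn on 3 tiles
    Wk = train_tile(X, residual, w_max)
    tiles.append(Wk)
    residual = residual - Wk @ X        # the next tile fits what is left
    w_max *= 0.3                        # finer grid for the smaller residual
    print(f"tile {k}: residual norm {np.linalg.norm(residual):.3f}")
```

The effective weight is the sum of the tiles' contributions, so each extra tile refines the model beyond the precision floor of a single limited-state array, which is the intuition behind the shrinking optimality gap.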
Related papers
- TNT: Improving Chunkwise Training for Test-Time Memorization [62.78875147721906]
Recurrent neural networks (RNNs) with deep test-time memorization modules, such as Titans and TTT, represent a promising, linearly-scaling paradigm distinct from Transformers. We introduce TNT, a novel training paradigm that decouples training efficiency from inference performance through a two-stage process. TNT achieves a substantial acceleration in training speed, up to 17 times faster than the most accurate baseline configuration.
arXiv Detail & Related papers (2025-11-10T17:45:09Z)
- Reduced Order Modeling with Shallow Recurrent Decoder Networks [5.686433280542813]
SHRED-ROM is a robust decoding-only strategy that encodes the numerically unstable approximation of an inverse. We show that SHRED-ROM accurately reconstructs the state dynamics for new parameter values starting from limited fixed or mobile sensors.
arXiv Detail & Related papers (2025-02-15T23:41:31Z)
- Forget Forgetting: Continual Learning in a World of Abundant Memory [55.64184779530581]
Continual learning has traditionally focused on minimizing exemplar memory. This paper challenges this paradigm by investigating a more realistic regime. We find that the core challenge shifts from stability to plasticity, as models become biased toward prior tasks and struggle to learn new ones.
arXiv Detail & Related papers (2025-02-11T05:40:52Z)
- Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy while requiring zero exemplar buffer and only 1.02x the size of the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z)
- Hardware-aware Training Techniques for Improving Robustness of Ex-Situ Neural Network Transfer onto Passive TiO2 ReRAM Crossbars [0.8553766625486795]
We propose training approaches that adapt techniques such as dropout, the reparametrization trick, and regularization to TiO2 crossbar variabilities.
For the neural network trained using the proposed hardware-aware method, 79.5% of the test set's data points can be classified with an accuracy of 95% or higher.
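One common way to realize such hardware-aware training is to sample the device variability in the forward pass, reparametrization-style, so that gradients are taken through noisy weights and the network learns to tolerate the variability. A toy sketch under an assumed multiplicative-noise model (the noise level is illustrative, not a measured TiO2 figure):

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_forward(W, x, sigma=0.05):
    """Forward pass through weights perturbed by multiplicative noise,
    a stand-in for TiO2 conductance variability."""
    W_tilde = W * (1.0 + sigma * rng.normal(size=W.shape))
    return W_tilde @ x

# Training against many sampled W_tilde (plus dropout/regularization)
# yields weights that survive transfer onto a variable crossbar.
W = rng.normal(size=(4, 8))
x = rng.normal(size=8)
print(noisy_forward(W, x))
```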
arXiv Detail & Related papers (2023-05-29T13:55:02Z)
- Examining the Role and Limits of Batchnorm Optimization to Mitigate Diverse Hardware-noise in In-memory Computing [3.9488615467284225]
In-Memory Computing (IMC) platforms such as analog crossbars are gaining attention as they facilitate the acceleration of low-precision Deep Neural Networks (DNNs) with high area- and compute-efficiency.
The intrinsic non-idealities in crossbars, which are often non-deterministic and non-linear, degrade the performance of the deployed DNNs.
This work aims to examine the distortions caused by these non-idealities on the dot-product operations in analog crossbars.
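For intuition, a toy model of one such non-ideality: saturate each column's accumulated current and compare against the ideal dot product. The tanh form and its constants are assumptions for illustration, not the paper's measured distortions.

```python
import numpy as np

def ideal_dot(G, v):
    """Ideal crossbar column currents: a plain matrix-vector product."""
    return G @ v

def nonideal_dot(G, v, i_sat=5.0):
    """Toy non-ideal model: the accumulated column current saturates."""
    return i_sat * np.tanh(ideal_dot(G, v) / i_sat)

rng = np.random.default_rng(2)
G = rng.uniform(0.0, 1.0, size=(4, 8))   # conductances are non-negative
v = rng.normal(size=8)
print(ideal_dot(G, v) - nonideal_dot(G, v))   # per-column distortion
```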
arXiv Detail & Related papers (2023-05-28T19:07:25Z)
- Aggregating Capacity in FL through Successive Layer Training for Computationally-Constrained Devices [3.4530027457862]
Federated learning (FL) is usually performed on resource-constrained edge devices.
The FL training process should be adjusted to such constraints.
We propose a new method that enables successive freezing and training of the parameters of the FL model at devices.
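A minimal sketch of that idea with hypothetical block names: only one block is trainable per stage, so a device only ever holds gradients and optimizer state for a fraction of the model, while aggregation proceeds as usual.

```python
# Successive layer training: freeze everything except the current block.
layers = ["conv1", "conv2", "conv3", "fc"]   # hypothetical model blocks

for stage, active in enumerate(layers):
    trainable = {name: name == active for name in layers}
    frozen = [n for n, t in trainable.items() if not t]
    print(f"stage {stage}: train {active}, freeze {frozen}")
    # run_federated_rounds(model, trainable)  # hypothetical FL step
```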
arXiv Detail & Related papers (2023-05-26T15:04:06Z)
- Incremental Online Learning Algorithms Comparison for Gesture and Visual Smart Sensors [68.8204255655161]
This paper compares four state-of-the-art algorithms in two real applications: gesture recognition based on accelerometer data and image classification.
Our results confirm these systems' reliability and the feasibility of deploying them in tiny-memory MCUs.
arXiv Detail & Related papers (2022-09-01T17:05:20Z)
- On-Device Training Under 256KB Memory [62.95579393237751]
We propose an algorithm-system co-design framework to make on-device training possible with only 256KB of memory.
Our framework is the first solution to enable tiny on-device training of convolutional neural networks under 256KB SRAM and 1MB Flash.
arXiv Detail & Related papers (2022-06-30T17:59:08Z)
- MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge [72.16021611888165]
This paper proposes a novel Memory-Economic Sparse Training (MEST) framework targeting accurate and fast execution on edge devices.
The proposed MEST framework consists of two enhancements: Elastic Mutation (EM) and Soft Memory Bound.
Our results suggest that unforgettable examples can be identified in-situ even during the dynamic exploration of sparsity masks.
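A toy sketch in the spirit of Elastic Mutation (not MEST's exact rule): periodically drop the smallest-magnitude active weights and regrow the same number at random inactive positions, so the sparsity mask is explored while the memory bound stays fixed.

```python
import numpy as np

rng = np.random.default_rng(3)

def mutate_mask(W, mask, mutate_frac=0.1):
    """Drop the weakest active weights, regrow as many inactive ones."""
    active = np.flatnonzero(mask)
    inactive = np.flatnonzero(~mask)
    n_mut = max(1, int(mutate_frac * active.size))
    drop = active[np.argsort(np.abs(W.ravel()[active]))[:n_mut]]
    grow = rng.choice(inactive, size=n_mut, replace=False)
    new_mask = mask.copy()
    new_mask.ravel()[drop] = False
    new_mask.ravel()[grow] = True
    return new_mask

W = rng.normal(size=(8, 8))
mask = rng.random((8, 8)) < 0.2              # roughly 80% sparse
mask = mutate_mask(W, mask)
print(int(mask.sum()), "active weights after mutation")  # count unchanged
```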
arXiv Detail & Related papers (2021-10-26T21:15:17Z)
- Analog Neural Computing with Super-resolution Memristor Crossbars [0.0]
Memristor crossbar arrays are used in a wide range of in-memory and neuromorphic computing applications.
This paper presents a technique to improve resolution by building a super-resolution memristor crossbar whose nodes each combine multiple memristors. The wider the range and the greater the number of achievable conductance values, the higher the crossbar's resolution.
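A toy illustration of that point: pair a coarse 16-level device with a second one whose full range equals the first device's level spacing, and count the distinct summed conductances a single node can realize. The ranges here are assumptions chosen to make the arithmetic easy to follow.

```python
import numpy as np

coarse = np.linspace(0.0, 1.0, 16)        # one 16-level (4-bit) device
fine = np.linspace(0.0, 1.0 / 15, 16)     # second device spans one coarse step
pairs = coarse[:, None] + fine[None, :]   # every achievable summed conductance
combined = np.unique(np.round(pairs.ravel(), 9))
print(len(coarse), "levels alone vs", len(combined), "levels combined")
```

A few sums coincide (the top of one coarse step equals the bottom of the next), so the node realizes 241 rather than 256 distinct levels, still a large jump over 16.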
arXiv Detail & Related papers (2021-05-10T18:52:44Z)
- FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training [62.932299614630985]
We propose FracTrain, which integrates progressive fractional quantization that gradually increases the precision of activations, weights, and gradients. FracTrain reduces the computational cost and hardware-quantified energy/latency of DNN training while achieving comparable or better (-0.12% to +1.87%) accuracy.
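A minimal sketch of the progressive part of that idea: a schedule raises the bit-width as training proceeds, and a uniform fake-quantizer applies it. The stage boundaries and bit levels are illustrative, not FracTrain's.

```python
import numpy as np

def precision_schedule(step, total_steps, bit_levels=(4, 6, 8)):
    """Return the bit-width for this step; precision rises over training."""
    stage = min(len(bit_levels) * step // total_steps, len(bit_levels) - 1)
    return bit_levels[stage]

def fake_quantize(x, bits):
    """Uniform fake-quantization of x onto 2**bits levels over its range."""
    scale = (2 ** bits - 1) / (x.max() - x.min() + 1e-12)
    return np.round((x - x.min()) * scale) / scale + x.min()

x = np.random.default_rng(4).normal(size=5)
for step in (0, 400, 900):
    bits = precision_schedule(step, total_steps=1000)
    print(step, bits, fake_quantize(x, bits))
```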
arXiv Detail & Related papers (2020-12-24T05:24:10Z)