Analog In-memory Training on General Non-ideal Resistive Elements: The Impact of Response Functions
- URL: http://arxiv.org/abs/2502.06309v3
- Date: Fri, 19 Sep 2025 05:20:53 GMT
- Title: Analog In-memory Training on General Non-ideal Resistive Elements: The Impact of Response Functions
- Authors: Zhaoxian Wu, Quan Xiao, Tayfun Gokmen, Omobayode Fagbohungbe, Tianyi Chen
- Abstract summary: In AIMC hardware, the trainable weights are represented by the conductance of resistive elements and updated using consecutive electrical pulses. While ideally the conductance changes by a constant amount in response to each pulse, in reality the change is scaled by asymmetric and non-linear "response functions", leading to non-ideal training dynamics. This paper provides a theoretical foundation for gradient-based training on AIMC hardware with non-ideal response functions.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the economic and environmental costs of training and deploying large vision or language models increase dramatically, analog in-memory computing (AIMC) emerges as a promising energy-efficient solution. However, the training perspective, and in particular the training dynamics, is underexplored. In AIMC hardware, the trainable weights are represented by the conductance of resistive elements and updated using consecutive electrical pulses. While ideally the conductance changes by a constant amount in response to each pulse, in reality the change is scaled by asymmetric and non-linear "response functions", leading to non-ideal training dynamics. This paper provides a theoretical foundation for gradient-based training on AIMC hardware with non-ideal response functions. We demonstrate that asymmetric response functions negatively impact Analog SGD by imposing an implicit penalty on the objective. To overcome this issue, we propose the Residual Learning algorithm, which provably converges exactly to a critical point by solving a bilevel optimization problem. We demonstrate that the proposed method can be extended to address other hardware imperfections, such as limited response granularity. To our knowledge, this is the first paper to investigate the impact of a class of generic non-ideal response functions. The conclusions are supported by simulations validating our theoretical insights.
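The bias described in the abstract can be seen in a toy simulation: an "analog" SGD update whose step size is scaled by direction-dependent response functions drifts away from the true minimizer, as if an implicit penalty pulled the weight toward the device's symmetric point. The linear saturating response model (`q_plus`, `q_minus`, `TAU`) and the 1-D quadratic objective below are illustrative assumptions, not the paper's exact device model:

```python
import numpy as np

rng = np.random.default_rng(0)

TAU = 2.0     # assumed saturation bound of the device conductance
W_STAR = 1.0  # minimizer of the toy objective f(w) = 0.5 * (w - W_STAR)**2

def q_plus(w):
    # Response to an "up" pulse: shrinks as w approaches +TAU.
    return max(0.0, 1.0 - w / TAU)

def q_minus(w):
    # Response to a "down" pulse: shrinks as w approaches -TAU.
    return max(0.0, 1.0 + w / TAU)

def run(analog, lr=0.1, steps=5000, sigma=1.0):
    """Run SGD on the toy objective; return the mean of the last 2000 iterates."""
    w, tail = 0.0, []
    for t in range(steps):
        g = (w - W_STAR) + sigma * rng.standard_normal()  # noisy gradient
        step = -lr * g
        if analog:
            # The device scales the desired change by the direction-dependent
            # response function, making up and down moves asymmetric.
            step *= q_plus(w) if step >= 0 else q_minus(w)
        w += step
        if t >= steps - 2000:
            tail.append(w)
    return float(np.mean(tail))

w_ideal = run(analog=False)   # hovers around W_STAR = 1.0
w_analog = run(analog=True)   # settles noticeably below W_STAR
print(w_ideal, w_analog)
```

Because up pulses near `W_STAR` are attenuated while down pulses are amplified, zero-mean gradient noise produces a net downward drift, which is the implicit-penalty effect the paper analyzes.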
Related papers
- Conditional Denoising Model as a Physical Surrogate Model [1.0616273526777913]
We introduce a generative model designed to learn the geometry of the physical manifold itself. By training the network to restore clean states from noisy ones, the model learns a vector field that points continuously towards the valid solution subspace.
arXiv Detail & Related papers (2026-01-28T20:32:20Z) - Softly Induced Functional Simplicity: Implications for Neural Network Generalisation, Robustness, and Distillation [0.0]
Learning robust and generalisable abstractions from high-dimensional input data is a central challenge in machine learning. We show that a soft symmetry-respecting inductive bias creates approximate degeneracies in the loss, which we identify as pseudo-Goldstone modes. Our results demonstrate that solutions of lower complexity give rise to abstractions that are more generalisable, robust, and efficiently distillable.
arXiv Detail & Related papers (2026-01-10T14:44:46Z) - Efficient Inference for Inverse Reinforcement Learning and Dynamic Discrete Choice Models [35.877107409163784]
Inverse reinforcement learning (IRL) and dynamic discrete choice (DDC) models explain sequential decision-making by recovering reward functions that rationalize observed behavior. We develop a semiparametric framework for debiased inverse reinforcement learning that yields statistically efficient inference for a broad class of reward-dependent functionals.
arXiv Detail & Related papers (2025-12-30T18:41:05Z) - From Physics to Machine Learning and Back: Part II - Learning and Observational Bias in PHM [52.64097278841485]
This review examines how incorporating learning and observational biases through physics-informed modeling and data strategies can guide models toward physically consistent and reliable predictions. Fast adaptation methods, including meta-learning and few-shot learning, are reviewed alongside domain generalization techniques.
arXiv Detail & Related papers (2025-09-25T14:15:43Z) - Q-function Decomposition with Intervention Semantics with Factored Action Spaces [51.01244229483353]
We consider Q-functions defined over a lower-dimensional projected subspace of the original action space, and study the conditions for the unbiasedness of decomposed Q-functions. This leads to a general scheme, which we call action-decomposed reinforcement learning, that uses the projected Q-functions to approximate the Q-function in standard model-free reinforcement learning algorithms.
arXiv Detail & Related papers (2025-04-30T05:26:51Z) - Dynamics and Computational Principles of Echo State Networks: A Mathematical Perspective [13.135043580306224]
Reservoir computing (RC) represents a class of state-space models (SSMs) characterized by a fixed state transition mechanism (the reservoir) and a flexible readout layer that maps from the state space.
This work presents a systematic exploration of RC, addressing its foundational properties such as the echo state property, fading memory, and reservoir capacity through the lens of dynamical systems theory.
We formalize the interplay between input signals and reservoir states, demonstrating the conditions under which reservoirs exhibit stability and expressive power.
arXiv Detail & Related papers (2025-04-16T04:28:05Z) - Efficient dynamic modal load reconstruction using physics-informed Gaussian processes based on frequency-sparse Fourier basis functions [0.0]
This paper presents an efficient dynamic load reconstruction method using physics-informed Gaussian processes (GPs).
The GP's covariance matrices are built using the description of the system dynamics, and the model is trained using structural response measurements.
The developed model holds potential for applications in structural health monitoring, damage prognosis, and load model validation.
arXiv Detail & Related papers (2025-03-12T14:16:27Z) - Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining [55.262510814326035]
Existing reweighting strategies primarily focus on group-level data importance.
We introduce novel algorithms for dynamic, instance-level data reweighting.
Our framework allows us to devise reweighting strategies deprioritizing redundant or uninformative data.
arXiv Detail & Related papers (2025-02-10T17:57:15Z) - ECM: A Unified Electronic Circuit Model for Explaining the Emergence of In-Context Learning and Chain-of-Thought in Large Language Model [64.22300168242221]
In-Context Learning (ICL) and Chain-of-Thought (CoT) are emerging capabilities in large language models.
We propose the Electronic Circuit Model (ECM) to better understand ICL and CoT.
We show that ECM effectively predicts and explains LLM performance across a variety of prompting strategies.
arXiv Detail & Related papers (2025-02-05T16:22:33Z) - GausSim: Foreseeing Reality by Gaussian Simulator for Elastic Objects [55.02281855589641]
GausSim is a novel neural network-based simulator designed to capture the dynamic behaviors of real-world elastic objects represented through Gaussian kernels.
We leverage continuum mechanics and treat each kernel as a Center of Mass System (CMS) that represents a continuous piece of matter.
In addition, GausSim incorporates explicit physics constraints, such as mass and momentum conservation, ensuring interpretable results and robust, physically plausible simulations.
arXiv Detail & Related papers (2024-12-23T18:58:17Z) - Physics-Informed Weakly Supervised Learning for Interatomic Potentials [17.165117198519248]
We introduce a physics-informed, weakly supervised approach for training machine-learned interatomic potentials.
We demonstrate reduced energy and force errors -- often lower by a factor of two -- for various baseline models and benchmark data sets.
arXiv Detail & Related papers (2024-07-23T12:49:04Z) - Towards Exact Gradient-based Training on Analog In-memory Computing [28.38387901763604]
Inference on analog accelerators has been studied recently, but the training perspective is underexplored.
Recent studies have shown that the "workhorse" of digital AI training, the stochastic gradient descent (SGD) algorithm, converges inexactly when applied to model training on non-ideal devices.
This paper puts forth a theoretical foundation for gradient-based training on analog devices.
arXiv Detail & Related papers (2024-06-18T16:43:59Z) - Discovering High-Strength Alloys via Physics-Transfer Learning [1.2438700252649395]
Peierls stress measures material strength by evaluating dislocation resistance to plastic flow. We propose a data-driven framework that leverages neural networks trained on force field simulations to understand crystal plasticity physics.
arXiv Detail & Related papers (2024-03-12T11:05:05Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - Physics Informed Recurrent Neural Networks for Seismic Response Evaluation of Nonlinear Systems [0.0]
This paper proposes a novel approach for evaluating the dynamic response of multi-degree-of-freedom (MDOF) systems.
The focus of this paper is to evaluate the seismic (earthquake) response of nonlinear structures.
The predicted response will be compared to state-of-the-art methods such as FEA to assess the efficacy of the physics-informed RNN model.
arXiv Detail & Related papers (2023-08-16T20:06:41Z) - End-to-End Meta-Bayesian Optimisation with Transformer Neural Processes [52.818579746354665]
This paper proposes the first end-to-end differentiable meta-BO framework that generalises neural processes to learn acquisition functions via transformer architectures.
We enable this end-to-end framework with reinforcement learning (RL) to tackle the lack of labelled acquisition data.
arXiv Detail & Related papers (2023-05-25T10:58:46Z) - Physics-aware deep learning framework for linear elasticity [0.0]
The paper presents an efficient and robust data-driven deep learning (DL) computational framework for linear continuum elasticity problems.
For an accurate representation of the field variables, a multi-objective loss function is proposed.
Several benchmark problems, including the Airy solution to elasticity and the Kirchhoff-Love plate problem, are solved.
arXiv Detail & Related papers (2023-02-19T20:33:32Z) - NN-EUCLID: deep-learning hyperelasticity without stress data [0.0]
We propose a new approach for unsupervised learning of hyperelastic laws with physics-consistent deep neural networks.
In contrast to supervised learning, which assumes the availability of stress-strain data, the approach only uses realistically measurable full-field displacement and global force data.
arXiv Detail & Related papers (2022-05-04T13:54:54Z) - Inverse-Dirichlet Weighting Enables Reliable Training of Physics-Informed Neural Networks [2.580765958706854]
We describe and remedy a failure mode that may arise from multi-scale dynamics with scale imbalances during training of deep neural networks.
PINNs are popular machine-learning templates that allow for seamless integration of physical equation models with data.
For inverse modeling using sequential training, we find that inverse-Dirichlet weighting protects a PINN against catastrophic forgetting.
arXiv Detail & Related papers (2021-07-02T10:01:37Z) - Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment.
One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting.
But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient.
We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z) - Training End-to-End Analog Neural Networks with Equilibrium Propagation [64.0476282000118]
We introduce a principled method to train end-to-end analog neural networks by gradient descent.
We show mathematically that a class of analog neural networks (called nonlinear resistive networks) are energy-based models.
Our work can guide the development of a new generation of ultra-fast, compact and low-power neural networks supporting on-chip learning.
arXiv Detail & Related papers (2020-06-02T23:38:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.