Related papers: First Demonstration of Second-order Training of Deep Neural Networks with In-memory Analog Matrix Computing

First Demonstration of Second-order Training of Deep Neural Networks with In-memory Analog Matrix Computing

URL: http://arxiv.org/abs/2512.05342v1
Date: Fri, 05 Dec 2025 00:52:46 GMT
Title: First Demonstration of Second-order Training of Deep Neural Networks with In-memory Analog Matrix Computing
Authors: Saitao Zhang, Yubiao Luo, Shiqing Wang, Pushen Zuo, Yongxiang Li, Lunshuai Pan, Zheng Miao, Zhong Sun,
Abstract summary: We present the first demonstration of a second-order powered by in-memory analog matrix computing (AMC)<n>We validate the inversion by training a two-layer convolutional neural network (CNN) for handwritten letter classification.<n>Our system delivers a 5.88x improvement in throughput and a 6.9x gain in energy efficiency compared to state-of-the-art digital processors.
Score: 7.51466944063829
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Second-order optimization methods, which leverage curvature information, offer faster and more stable convergence than first-order methods such as stochastic gradient descent (SGD) and Adam. However, their practical adoption is hindered by the prohibitively high cost of inverting the second-order information matrix, particularly in large-scale neural network training. Here, we present the first demonstration of a second-order optimizer powered by in-memory analog matrix computing (AMC) using resistive random-access memory (RRAM), which performs matrix inversion (INV) in a single step. We validate the optimizer by training a two-layer convolutional neural network (CNN) for handwritten letter classification, achieving 26% and 61% fewer training epochs than SGD with momentum and Adam, respectively. On a larger task using the same second-order method, our system delivers a 5.88x improvement in throughput and a 6.9x gain in energy efficiency compared to state-of-the-art digital processors. These results demonstrate the feasibility and effectiveness of AMC circuits for second-order neural network training, opening a new path toward energy-efficient AI acceleration.

Related papers

Towards Practical Second-Order Optimizers in Deep Learning: Insights from Fisher Information Analysis [0.0]
We present AdaFisher, a novel adaptive second-order tuning for deep neural networks (DNNs)<n>AdaFisher aims to bridge the gap between the improved convergence and generalization of second-order methods and the computational efficiency needed for trainings.<n>We demonstrate that AdaFisher outperforms state-of-the-art approximations in both accuracy and convergence speed.
arXiv Detail & Related papers (2025-04-26T05:02:21Z)
MARS: Unleashing the Power of Variance Reduction for Training Large Models [56.67982828148859]
We propose a unified training framework for deep neural networks.<n>We introduce three instances of MARS that leverage preconditioned gradient optimization.<n>Results indicate that the implementation of MARS consistently outperforms Adam.
arXiv Detail & Related papers (2024-11-15T18:57:39Z)
AdaFisher: Adaptive Second Order Optimization via Fisher Information [22.851200800265914]
First-order optimization methods are currently the mainstream in training deep neural networks (DNNs).s like Adam incorporate limited curvature information by employing the matrix preconditioning of the gradient during the training.<n>Despite their widespread, second-order optimization algorithms exhibit superior convergence properties compared to their first-order counterparts e.g. Adam and SGD.<n>We present emphAdaFisher--an adaptive second-order that leverages a emphdiagonal block-Kronecker approximation of the Fisher information matrix for adaptive gradient preconditioning.
arXiv Detail & Related papers (2024-05-26T01:25:02Z)
Preconditioners for the Stochastic Training of Neural Fields [28.310850948423813]
We propose a theoretical framework for training neural fields with curvature-aware diagonal preconditioners, demonstrating their effectiveness across tasks such as image reconstruction, shape modeling, and Neural Radiance Fields (NeRF)
arXiv Detail & Related papers (2024-02-13T20:46:37Z)
Eva: A General Vectorized Approximation Framework for Second-order Optimization [16.647611352181574]
We present a memory- and time-efficient second-order algorithm named Eva with two novel techniques. We derive an efficient update formula without explicitly computing the inverse of using the Sherman-Morrison formula. Experiments show that Eva reduces the end-to-end training time up to 2.05x and 2.42x compared to first-order SGD and second-order algorithms.
arXiv Detail & Related papers (2023-08-04T03:51:38Z)
Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency. We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
Neural Nets with a Newton Conjugate Gradient Method on Multiple GPUs [0.0]
Training deep neural networks consumes increasing computational resource shares in many compute centers. We introduce a novel second-order optimization method that requires the effect of the Hessian on a vector only. We compare the proposed second-order method with two state-of-the-arts on five representative neural network problems.
arXiv Detail & Related papers (2022-08-03T12:38:23Z)
GLEAM: Greedy Learning for Large-Scale Accelerated MRI Reconstruction [50.248694764703714]
Unrolled neural networks have recently achieved state-of-the-art accelerated MRI reconstruction. These networks unroll iterative optimization algorithms by alternating between physics-based consistency and neural-network based regularization. We propose Greedy LEarning for Accelerated MRI reconstruction, an efficient training strategy for high-dimensional imaging settings.
arXiv Detail & Related papers (2022-07-18T06:01:29Z)
FPGA-based AI Smart NICs for Scalable Distributed AI Training Systems [62.20308752994373]
We propose a new smart network interface card (NIC) for distributed AI training systems using field-programmable gate arrays (FPGAs) Our proposed FPGA-based AI smart NIC enhances overall training performance by 1.6x at 6 nodes, with an estimated 2.5x performance improvement at 32 nodes, compared to the baseline system using conventional NICs.
arXiv Detail & Related papers (2022-04-22T21:57:00Z)
Revisiting Recursive Least Squares for Training Deep Neural Networks [10.44340837533087]
Recursive least squares (RLS) algorithms were once widely used for training small-scale neural networks, due to their fast convergence. Previous RLS algorithms are unsuitable for training deep neural networks (DNNs), since they have high computational complexity and too many preconditions. We propose three novel RLS optimization algorithms for training feedforward neural networks, convolutional neural networks and recurrent neural networks.
arXiv Detail & Related papers (2021-09-07T17:43:51Z)
Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces. We train and validate our approach directly on the Intel NNP-I chip for inference. We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z)
Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study a distributed variable for large-scale AUC for a neural network as with a deep neural network. Our model requires a much less number of communication rounds and still a number of communication rounds in theory. Our experiments on several datasets show the effectiveness of our theory and also confirm our theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z)
One-step regression and classification with crosspoint resistive memory arrays [62.997667081978825]
High speed, low energy computing machines are in demand to enable real-time artificial intelligence at the edge. One-step learning is supported by simulations of the prediction of the cost of a house in Boston and the training of a 2-layer neural network for MNIST digit recognition. Results are all obtained in one computational step, thanks to the physical, parallel, and analog computing within the crosspoint array.
arXiv Detail & Related papers (2020-05-05T08:00:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.