Implicitly Restarted Lanczos Enables Chemically-Accurate Shallow Neural Quantum States
- URL: http://arxiv.org/abs/2601.01437v1
- Date: Sun, 04 Jan 2026 08:49:24 GMT
- Title: Implicitly Restarted Lanczos Enables Chemically-Accurate Shallow Neural Quantum States
- Authors: Wei Liu, Wenjie Dou
- Abstract summary: We introduce the implicitly restarted Lanczos (IRL) method as the core engine for neural quantum states (NQS) training. Our key innovation is an inherently stable second-order optimization framework that recasts the ill-conditioned parameter update problem into a small, well-posed Hermitian eigenvalue problem. We demonstrate that IRL enables shallow NQS to consistently achieve extreme precision (1e-12 kcal/mol) in just 3 to 5 optimization steps. For the F2 molecule, this translates to an approximate 17,900-fold speed-up in total runtime compared to Adam.
- Score: 4.039934762896615
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The variational optimization of high-dimensional neural network models, such as those used in neural quantum states (NQS), presents a significant challenge in machine intelligence. Conventional first-order stochastic methods (e.g., Adam) are plagued by slow convergence, sensitivity to hyperparameters, and numerical instability, preventing NQS from reaching the high accuracy required for fundamental science. We address this fundamental optimization bottleneck by introducing the implicitly restarted Lanczos (IRL) method as the core engine for NQS training. Our key innovation is an inherently stable second-order optimization framework that recasts the ill-conditioned parameter update problem into a small, well-posed Hermitian eigenvalue problem. By solving this problem efficiently and robustly with IRL, our approach automatically determines the optimal descent direction and step size, circumventing the need for demanding hyperparameter tuning and eliminating the numerical instabilities common in standard iterative solvers. We demonstrate that IRL enables shallow NQS architectures (with orders of magnitude fewer parameters) to consistently achieve extreme precision (1e-12 kcal/mol) in just 3 to 5 optimization steps. For the F2 molecule, this translates to an approximate 17,900-fold speed-up in total runtime compared to Adam. This work establishes IRL as a superior, robust, and efficient second-order optimization strategy for variational quantum models, paving the way for the practical, high-fidelity application of neural networks in quantum physics and chemistry.
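The paper's code and projected operator are not reproduced here, but the core numerical move it describes, solving a small well-posed Hermitian eigenvalue problem with implicitly restarted Lanczos, can be sketched with SciPy's `eigsh`, whose ARPACK backend implements IRL. The matrix `H`, its size, and the random seed below are illustrative placeholders, not the paper's actual NQS update problem.

```python
# Minimal sketch, assuming a generic Hermitian stand-in for the paper's small
# projected eigenvalue problem (the real NQS operator is not public here).
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n))
H = (A + A.T) / 2  # symmetric (real Hermitian) test matrix

# eigsh only needs matrix-vector products, matching the matrix-free setting
# typical of second-order variational optimization.
op = LinearOperator((n, n), matvec=lambda v: H @ v)

# Lowest eigenpair via ARPACK's implicitly restarted Lanczos: in the paper's
# framing, the eigenvector plays the role of the descent direction and the
# eigenvalue fixes the step size.
w, v = eigsh(op, k=1, which="SA")
residual = np.linalg.norm(H @ v[:, 0] - w[0] * v[:, 0])
print(w[0], residual)
```

The design point worth noting: because only `matvec` is required, the same call works when `H` is never formed explicitly, which is exactly why Lanczos-type solvers suit large variational models.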
Related papers
- MAD-NG: Meta-Auto-Decoder Neural Galerkin Method for Solving Parametric Partial Differential Equations [5.767740428776141]
Parametric partial differential equations (PDEs) are fundamental for modeling a wide range of physical and engineering systems. Traditional neural network-based solvers, such as Physics-Informed Neural Networks (PINNs) and Deep Galerkin Methods, often face challenges in generalization and long-time prediction efficiency. We propose a novel and scalable framework that significantly enhances the Neural Galerkin Method (NGM) by incorporating the Meta-Auto-Decoder (MAD) paradigm.
arXiv Detail & Related papers (2025-12-25T11:27:40Z) - Structure and asymptotic preserving deep neural surrogates for uncertainty quantification in multiscale kinetic equations [5.181697052513637]
High dimensionality of kinetic equations with parameters poses computational challenges for uncertainty quantification (UQ). Traditional Monte Carlo (MC) sampling methods suffer from slow convergence and high variance, which become increasingly severe as the dimensionality of the space grows. We introduce surrogate models based on structure and asymptotic preserving neural networks (SAPNNs). SAPNNs are specifically designed to satisfy key physical properties, including positivity, conservation laws, entropy dissipation, and parameter limits.
arXiv Detail & Related papers (2025-06-12T12:20:53Z) - Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics [64.62231094774211]
Stateful optimizers (e.g., Adam) maintain auxiliary information that can reach 2x the model size in order to achieve optimal convergence. SOLO enables Adam-style optimizers to maintain quantized states with precision as low as 3 bits, or even 2 bits. SOLO can thus be seamlessly applied to Adam-style optimizers, leading to substantial memory savings with minimal accuracy loss.
arXiv Detail & Related papers (2025-05-01T06:47:45Z) - Improved Optimization for the Neural-network Quantum States and Tests on the Chromium Dimer [11.985673663540688]
Neural-network Quantum States (NQS) have significantly advanced wave function ansatz research.
This work introduces three algorithmic enhancements to reduce the computational demands of VMC optimization using NQS.
arXiv Detail & Related papers (2024-04-14T15:07:57Z) - Scalable Imaginary Time Evolution with Neural Network Quantum States [0.0]
The representation of a quantum wave function as a neural network quantum state (NQS) provides a powerful variational ansatz for finding the ground states of many-body quantum systems.
We introduce an approach that bypasses the computation of the metric tensor and instead relies exclusively on first-order descent with Euclidean metric.
We make this method adaptive and stable by determining the optimal time step and keeping the target fixed until the energy of the NQS decreases.
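The adaptive rule summarized above, take a plain Euclidean gradient step and shrink the time step until the energy actually decreases, can be sketched on a toy objective. The quadratic "energy", the initial step size, and the shrink factor are all assumptions for illustration, not the paper's actual NQS setup.

```python
# Hedged sketch of adaptive first-order imaginary-time descent on a toy
# quadratic energy; the real method operates on an NQS, not on a vector.
import numpy as np

def energy(theta):
    return float(theta @ theta)  # toy stand-in for the variational energy

def grad(theta):
    return 2.0 * theta

theta = np.array([3.0, -2.0])
dt = 1.0  # initial imaginary-time step (illustrative choice)
for _ in range(50):
    trial = theta - dt * grad(theta)
    # Keep the target fixed and halve the step until the energy decreases.
    while energy(trial) >= energy(theta) and dt > 1e-12:
        dt *= 0.5
        trial = theta - dt * grad(theta)
    theta = trial
print(energy(theta))
```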
arXiv Detail & Related papers (2023-07-28T12:26:43Z) - A Deep Unrolling Model with Hybrid Optimization Structure for Hyperspectral Image Deconvolution [50.13564338607482]
We propose a novel optimization framework for the hyperspectral deconvolution problem, called DeepMix. It consists of three distinct modules, namely, a data consistency module, a module that enforces the effect of the handcrafted regularizers, and a denoising module. This work proposes a context-aware denoising module designed to sustain the advancements achieved by the cooperative efforts of the other modules.
arXiv Detail & Related papers (2023-06-10T08:25:16Z) - NeuralStagger: Accelerating Physics-constrained Neural PDE Solver with Spatial-temporal Decomposition [67.46012350241969]
This paper proposes a general acceleration methodology called NeuralStagger.
It decomposes the original learning task into several coarser-resolution subtasks.
We demonstrate the successful application of NeuralStagger on 2D and 3D fluid dynamics simulations.
arXiv Detail & Related papers (2023-02-20T19:36:52Z) - Towards Theoretically Inspired Neural Initialization Optimization [66.04735385415427]
We propose a differentiable quantity, named GradCosine, with theoretical insights to evaluate the initial state of a neural network.
We show that both the training and test performance of a network can be improved by maximizing GradCosine under norm constraint.
Generalized from the sample-wise analysis into the real batch setting, NIO is able to automatically look for a better initialization with negligible cost.
arXiv Detail & Related papers (2022-10-12T06:49:16Z) - On Fast Simulation of Dynamical System with Neural Vector Enhanced Numerical Solver [59.13397937903832]
We introduce a deep learning-based corrector called Neural Vector (NeurVec)
NeurVec can compensate for integration errors and enable larger time step sizes in simulations.
Our experiments on a variety of complex dynamical system benchmarks demonstrate that NeurVec exhibits remarkable generalization capability.
arXiv Detail & Related papers (2022-08-07T09:02:18Z) - Learning to Fit Morphable Models [12.469605679847085]
We build upon recent advances in learned optimization and propose an update rule inspired by the classic Levenberg-Marquardt algorithm.
We show the effectiveness of the proposed neural optimizer on the problems of 3D body surface estimation from a head-mounted device and face fitting from 2D landmarks.
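The learned update rule of that paper is not reproducible here, but the classic Levenberg-Marquardt step it is inspired by can be sketched on a toy nonlinear least-squares fit. The model `y = a*exp(b*x)`, the damping schedule, and all constants are illustrative assumptions.

```python
# Hedged sketch of the classic Levenberg-Marquardt update (not the paper's
# learned variant), fitting y = a * exp(b * x) to noiseless toy data.
import numpy as np

def residuals(p, x, y):
    a, b = p
    return a * np.exp(b * x) - y

def jacobian(p, x):
    a, b = p
    e = np.exp(b * x)
    return np.stack([e, a * x * e], axis=1)  # d r / d a, d r / d b

x = np.linspace(0.0, 1.0, 30)
y = 2.0 * np.exp(1.5 * x)  # ground truth: a = 2.0, b = 1.5
p = np.array([1.0, 1.0])
lam = 1e-3  # damping parameter
for _ in range(50):
    r = residuals(p, x, y)
    J = jacobian(p, x)
    # LM step: solve (J^T J + lam * I) delta = -J^T r
    delta = np.linalg.solve(J.T @ J + lam * np.eye(2), -J.T @ r)
    if np.sum(residuals(p + delta, x, y) ** 2) < np.sum(r ** 2):
        p, lam = p + delta, lam / 3.0  # accept step, trust the model more
    else:
        lam *= 3.0                     # reject step, damp harder
print(p)
```

The damping term interpolates between Gauss-Newton (small `lam`) and gradient descent (large `lam`), which is exactly the knob a learned optimizer can replace with a predicted update.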
arXiv Detail & Related papers (2021-11-29T18:59:53Z) - Joint inference and input optimization in equilibrium networks [68.63726855991052]
The deep equilibrium model is a class of models that forgoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer.
We show that there is a natural synergy between these two settings.
We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training and gradient based meta-learning.
arXiv Detail & Related papers (2021-11-25T19:59:33Z) - Optimizing Large-Scale Hyperparameters via Automated Learning Algorithm [97.66038345864095]
We propose a new hyperparameter optimization method with zeroth-order hyper-gradients (HOZOG)
Specifically, we first formulate hyperparameter optimization as an A-based constrained optimization problem.
Then, we use the average zeroth-order hyper-gradients to update hyperparameters.
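An averaged zeroth-order gradient estimate, the generic technique HOZOG builds on, can be sketched as follows. The toy validation loss, the smoothing radius `mu`, the sample count, and the learning rate are all illustrative assumptions; the paper's bilevel formulation is not reproduced here.

```python
# Hedged sketch: estimate a hyper-gradient from function values only by
# averaging finite differences along random directions, then do descent.
import numpy as np

def val_loss(lmbda):
    # toy stand-in for a validation loss as a function of one hyperparameter
    return (lmbda - 0.3) ** 2

rng = np.random.default_rng(0)
mu, n_samples = 1e-3, 64  # smoothing radius and number of random directions

def zo_grad(lmbda):
    # average (f(l + mu*u) - f(l)) / mu * u over random Gaussian directions u
    u = rng.standard_normal(n_samples)
    return np.mean((val_loss(lmbda + mu * u) - val_loss(lmbda)) / mu * u)

lmbda = 1.0
for _ in range(200):
    lmbda -= 0.1 * zo_grad(lmbda)
print(lmbda)
```

The appeal of the zeroth-order route is that no gradient of the inner training procedure is ever needed, only repeated loss evaluations.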
arXiv Detail & Related papers (2021-02-17T21:03:05Z) - GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture-agnostic method for initializing neural networks.
It is based on a simple heuristic; the variance of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value.
It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
arXiv Detail & Related papers (2021-02-16T11:45:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.