A Kaczmarz-inspired approach to accelerate the optimization of neural network wavefunctions
- URL: http://arxiv.org/abs/2401.10190v2
- Date: Fri, 23 Aug 2024 21:22:39 GMT
- Title: A Kaczmarz-inspired approach to accelerate the optimization of neural network wavefunctions
- Authors: Gil Goldshlager, Nilin Abrahamsen, Lin Lin
- Abstract summary: We propose the Subsampled Projected-Increment Natural Gradient Descent (SPRING) optimizer to reduce this bottleneck.
SPRING combines ideas from the recently introduced minimum-step stochastic reconfiguration optimizer (MinSR) and the classical randomized Kaczmarz method for solving linear least-squares problems.
We demonstrate that SPRING outperforms both MinSR and the popular Kronecker-Factored Approximate Curvature method (KFAC) across a number of small atoms and molecules.
- Score: 0.7438129207086058
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural network wavefunctions optimized using the variational Monte Carlo method have been shown to produce highly accurate results for the electronic structure of atoms and small molecules, but the high cost of optimizing such wavefunctions prevents their application to larger systems. We propose the Subsampled Projected-Increment Natural Gradient Descent (SPRING) optimizer to reduce this bottleneck. SPRING combines ideas from the recently introduced minimum-step stochastic reconfiguration optimizer (MinSR) and the classical randomized Kaczmarz method for solving linear least-squares problems. We demonstrate that SPRING outperforms both MinSR and the popular Kronecker-Factored Approximate Curvature method (KFAC) across a number of small atoms and molecules, given that the learning rates of all methods are optimally tuned. For example, on the oxygen atom, SPRING attains chemical accuracy after forty thousand training iterations, whereas both MinSR and KFAC fail to do so even after one hundred thousand iterations.
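To make the Kaczmarz connection concrete, the sketch below implements the classical randomized Kaczmarz iteration for a linear system, which is the building block the abstract refers to. It is not the SPRING optimizer itself (SPRING applies this style of projection to the subsampled natural-gradient least-squares problem rather than to a fixed system), and the toy problem and function names are illustrative.

```python
# A minimal sketch of the classical randomized Kaczmarz method (not SPRING itself):
# each step projects the iterate onto the hyperplane defined by one randomly
# sampled row of the linear system, with rows sampled proportionally to their
# squared norms.
import numpy as np

def randomized_kaczmarz(A, b, x0, n_iters=1000, seed=0):
    """Approximately solve A x = b by projecting onto one random row per step."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    row_norms2 = np.einsum("ij,ij->i", A, A)       # squared row norms ||a_i||^2
    probs = row_norms2 / row_norms2.sum()          # sample row i with prob ~ ||a_i||^2
    for _ in range(n_iters):
        i = rng.choice(len(b), p=probs)
        a_i = A[i]
        # Orthogonal projection of x onto the hyperplane {z : a_i . z = b_i}.
        x += (b[i] - a_i @ x) / row_norms2[i] * a_i
    return x

# Toy usage on a small, consistent overdetermined system.
rng = np.random.default_rng(1)
A = rng.normal(size=(200, 10))
x_true = rng.normal(size=10)
x_est = randomized_kaczmarz(A, A @ x_true, x0=np.zeros(10))
print(np.linalg.norm(x_est - x_true))   # should be close to zero
```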
Related papers
- Sample-efficient Bayesian Optimisation Using Known Invariances [56.34916328814857]
We show that vanilla and constrained BO algorithms are inefficient when optimising invariant objectives.
We derive a bound on the maximum information gain of these invariant kernels.
We use our method to design a current drive system for a nuclear fusion reactor, finding a high-performance solution.
arXiv Detail & Related papers (2024-10-22T12:51:46Z)
- Improved Optimization for the Neural-network Quantum States and Tests on the Chromium Dimer [11.985673663540688]
Neural-network Quantum States (NQS) have significantly advanced wave function ansatz research.
This work introduces three algorithmic enhancements to reduce the computational demands of VMC optimization using NQS.
arXiv Detail & Related papers (2024-04-14T15:07:57Z)
- Federated Conditional Stochastic Optimization [110.513884892319]
Conditional stochastic optimization has found applications in a wide range of machine learning tasks, such as invariant learning, AUPRC maximization, and MAML.
This paper proposes algorithms for conditional stochastic optimization in the federated learning setting.
arXiv Detail & Related papers (2023-10-04T01:47:37Z)
- Faster Stochastic Variance Reduction Methods for Compositional MiniMax Optimization [50.10952609321302]
Compositional minimax optimization is a pivotal challenge across various machine learning domains.
Current methods of compositional minimax optimization are plagued by sub-optimal complexities or heavy reliance on sizable batch sizes.
This paper introduces a novel method, called Nested STOchastic Recursive Momentum (NSTORM), which can achieve the optimal sample complexity of $O(\kappa^3/\epsilon^3)$.
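The acronym NSTORM points to the STOchastic Recursive Momentum (STORM) estimator that the method nests; as a point of reference only, here is a minimal sketch of the basic single-level STORM update (not the nested compositional-minimax algorithm from the paper), with illustrative function names and hyperparameters.

```python
# A minimal sketch of the basic STORM recursive-momentum gradient estimator on a
# single-level problem (the nested/minimax structure of NSTORM is omitted).
import numpy as np

def storm(grad_fn, sample_fn, x0, lr=0.01, beta=0.1, n_iters=500, seed=0):
    """grad_fn(x, xi): stochastic gradient at x for sample xi; sample_fn(rng): draw xi."""
    rng = np.random.default_rng(seed)
    x_prev = np.array(x0, dtype=float)
    d = grad_fn(x_prev, sample_fn(rng))           # plain stochastic gradient to start
    x = x_prev - lr * d
    for _ in range(n_iters):
        xi = sample_fn(rng)                       # one fresh sample per iteration
        # Recursive momentum: correct the running estimate d using the SAME sample
        # evaluated at both the new and the previous iterate (variance reduction).
        d = grad_fn(x, xi) + (1.0 - beta) * (d - grad_fn(x_prev, xi))
        x_prev, x = x, x - lr * d
    return x

# Toy usage: minimize E[(x - xi)^2] with xi ~ N(1, 1); the minimizer is x = 1.
sol = storm(grad_fn=lambda x, xi: 2.0 * (x - xi),
            sample_fn=lambda rng: rng.normal(loc=1.0),
            x0=0.0)
print(sol)   # should end up close to 1.0
```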
arXiv Detail & Related papers (2023-08-18T14:57:21Z)
- FAVOR#: Sharp Attention Kernel Approximations via New Classes of Positive Random Features [39.282051468586666]
We propose parameterized, positive, non-trigonometric RFs which approximate Gaussian and softmax kernels.
We show that our methods lead to variance reduction in practice and outperform previous methods in a kernel regression task.
We also present FAVOR#, a method for self-attention approximation in Transformers.
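As background for how positive random features approximate the softmax kernel, the sketch below shows the basic positive (exponential) random-feature construction for $K(x, y) = \exp(x \cdot y)$, in the spirit of the earlier FAVOR-style estimators that FAVOR# sharpens; it is not the new FAVOR# estimator, and the toy dimensions are illustrative.

```python
# A minimal sketch of positive random features for the softmax kernel
# K(x, y) = exp(x . y). With phi(x)_j = exp(w_j . x - ||x||^2 / 2) / sqrt(m) and
# w_j ~ N(0, I), E[phi(x) . phi(y)] = exp(x . y), and all features are positive,
# avoiding the sign cancellations of trigonometric random features.
import numpy as np

def positive_random_features(X, W):
    m = W.shape[0]
    sq_norms = np.sum(X**2, axis=-1, keepdims=True)
    return np.exp(X @ W.T - sq_norms / 2.0) / np.sqrt(m)

rng = np.random.default_rng(0)
d, m = 8, 4096
x = 0.3 * rng.normal(size=d)
y = 0.3 * rng.normal(size=d)
W = rng.normal(size=(m, d))                        # w_j ~ N(0, I_d)
approx = positive_random_features(x[None], W) @ positive_random_features(y[None], W).T
print(approx.item(), np.exp(x @ y))                # the two numbers should be close
```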
arXiv Detail & Related papers (2023-02-01T22:43:29Z)
- Efficient Approximations of the Fisher Matrix in Neural Networks using Kronecker Product Singular Value Decomposition [0.0]
It is shown that natural gradient descent can minimize the objective function more efficiently than ordinary gradient descent based methods.
The bottleneck of this approach for training deep neural networks lies in the prohibitive cost of solving a large dense linear system corresponding to the Fisher Information Matrix (FIM) at each iteration.
This has motivated various approximations of either the exact FIM or the empirical one.
The most sophisticated of these is KFAC, which involves a Kronecker-factored block diagonal approximation of the FIM.
With only a slight additional cost, a few improvements to KFAC in terms of accuracy are proposed.
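To make the Kronecker-factored idea concrete, the sketch below numerically checks the identity such approximations exploit: if a Fisher block for a layer is approximated as $A \otimes G$ (input and output-gradient covariances), then applying its inverse to a weight gradient only requires inverting the two small factors. This is a generic illustration of the identity, not KFAC itself, which additionally uses damping and running estimates of the factors; the dimensions are illustrative.

```python
# A minimal check of the Kronecker-vec identity behind KFAC-style approximations:
# (A kron G)^{-1} vec(DW) = vec(G^{-1} DW A^{-1}), so the large Fisher block is
# never formed or inverted explicitly.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 5, 3

def random_spd(n):
    """A random symmetric positive-definite matrix (stand-in for a covariance factor)."""
    M = rng.normal(size=(n, n))
    return M @ M.T + n * np.eye(n)

A = random_spd(n_in)                   # ~ covariance of layer inputs      (in  x in)
G = random_spd(n_out)                  # ~ covariance of output gradients  (out x out)
DW = rng.normal(size=(n_out, n_in))    # gradient of the loss w.r.t. the layer weights

# Naive: invert the full (in*out) x (in*out) Kronecker block.
F = np.kron(A, G)
naive = np.linalg.solve(F, DW.flatten(order="F")).reshape((n_out, n_in), order="F")

# Kronecker-factored: only solve with the two small factors.
factored = np.linalg.solve(G, DW) @ np.linalg.inv(A)

print(np.allclose(naive, factored))    # True
```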
arXiv Detail & Related papers (2022-01-25T12:56:17Z)
- A deep learning-based model reduction (DeePMR) method for simplifying chemical kinetics [10.438320849775224]
DeePMR is proposed and validated using high-temperature auto-ignitions, perfectly stirred reactors (PSR), and one-dimensional freely propagating flames of n-heptane/air mixtures.
The key idea of DeePMR is to employ a deep neural network (DNN) to formulate the objective function in the optimization problem.
arXiv Detail & Related papers (2022-01-06T12:31:32Z)
- Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which are widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z)
- Rayleigh-Gauss-Newton optimization with enhanced sampling for variational Monte Carlo [0.0]
We analyze optimization and sampling methods used in Variational Monte Carlo.
We introduce alterations to improve their performance.
In particular, we demonstrate that the Rayleigh-Gauss-Newton (RGN) method can be made robust to energy spikes.
arXiv Detail & Related papers (2021-06-19T19:05:52Z)
- Global Optimization of Gaussian processes [52.77024349608834]
We propose a reduced-space formulation with Gaussian processes trained on only a few data points.
The approach also leads to significantly smaller and computationally cheaper subproblems for lower bounding.
In total, the proposed method reduces the time to convergence by orders of magnitude.
arXiv Detail & Related papers (2020-05-21T20:59:11Z)
- Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping [69.9674326582747]
We propose a new accelerated first-order method called clipped-SSTM for smooth convex optimization with heavy-tailed distributed noise in gradients.
We prove new complexity bounds that outperform state-of-the-art results in this case.
We derive the first non-trivial high-probability complexity bounds for SGD with clipping without light-tails assumption on the noise.
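Both results above revolve around the basic gradient-clipping operation; as a point of reference only, here is a minimal sketch of a generic clipped-SGD step (not the accelerated clipped-SSTM method from the paper), with illustrative hyperparameters.

```python
# A minimal sketch of SGD with gradient-norm clipping, which caps the influence of
# rare, very large (heavy-tailed) stochastic gradients. This is a generic clipped-SGD
# loop, not clipped-SSTM.
import numpy as np

def clip(g, threshold):
    """Rescale g so that ||g|| <= threshold; short gradients pass through unchanged."""
    norm = np.linalg.norm(g)
    return g if norm <= threshold else g * (threshold / norm)

def clipped_sgd(grad_fn, x0, lr=0.05, threshold=1.0, n_iters=1000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_iters):
        g = grad_fn(x, rng)                 # stochastic gradient, possibly heavy-tailed
        x = x - lr * clip(g, threshold)
    return x

# Toy usage: minimize x^2 with heavy-tailed (Student-t, df=2) gradient noise.
sol = clipped_sgd(lambda x, rng: 2.0 * x + rng.standard_t(df=2.0), x0=5.0)
print(sol)   # fluctuates around the minimizer 0.0
```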
arXiv Detail & Related papers (2020-05-21T17:05:27Z)