Related papers: A Structure-Guided Gauss-Newton Method for Shallow ReLU Neural Network

URL: http://arxiv.org/abs/2404.05064v1
Date: Sun, 7 Apr 2024 20:24:44 GMT
Title: A Structure-Guided Gauss-Newton Method for Shallow ReLU Neural Network
Authors: Zhiqiang Cai, Tong Ding, Min Liu, Xinyu Liu, Jianlin Xia
Abstract summary: We propose a structure-guided Gauss-Newton (SgGN) method for solving least squares problems using a shallow ReLU neural network. The method effectively takes advantage of both the least squares structure and the neural network structure of the objective function.
Score: 18.06366638807982
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we propose a structure-guided Gauss-Newton (SgGN) method for solving least squares problems using a shallow ReLU neural network. The method effectively takes advantage of both the least squares structure and the neural network structure of the objective function. By categorizing the weights and biases of the hidden and output layers of the network as nonlinear and linear parameters, respectively, the method iterates back and forth between the nonlinear and linear parameters. The nonlinear parameters are updated by a damped Gauss-Newton method and the linear ones are updated by a linear solver. Moreover, at the Gauss-Newton step, a special form of the Gauss-Newton matrix is derived for the shallow ReLU neural network and is used for efficient iterations. It is shown that the corresponding mass and Gauss-Newton matrices in the respective linear and nonlinear steps are symmetric and positive definite under reasonable assumptions. Thus, the SgGN method naturally produces an effective search direction without the need for additional techniques such as shifting in the Levenberg-Marquardt method to achieve invertibility of the Gauss-Newton matrix. The convergence and accuracy of the method are demonstrated numerically for several challenging function approximation problems, especially those with discontinuities or sharp transition layers that pose significant challenges for commonly used training algorithms in machine learning.
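
The alternation described in the abstract (a linear solve for the output-layer parameters, then a damped Gauss-Newton update for the hidden-layer parameters) can be illustrated with a minimal NumPy sketch. This is not the authors' SgGN implementation: it does not use the paper's structured Gauss-Newton matrix, and instead takes a generic minimum-norm Gauss-Newton direction from `np.linalg.lstsq`, with simple step halving standing in for the damping. The function names, the 1-D setting, the uniform breakpoint initialization, and the tanh target are all assumptions made for illustration.

```python
import numpy as np


def features(x, w, b):
    """Design matrix [1, relu(w_i * x + b_i)] for 1-D inputs x."""
    hidden = np.maximum(np.outer(x, w) + b, 0.0)        # shape (m, n)
    return np.hstack([np.ones((x.size, 1)), hidden])    # shape (m, n + 1)


def loss(x, y, w, b, c):
    residual = features(x, w, b) @ c - y
    return 0.5 * np.mean(residual ** 2)


def linear_step(x, y, w, b):
    """Output-layer (linear) parameters: an ordinary least-squares solve."""
    c, *_ = np.linalg.lstsq(features(x, w, b), y, rcond=None)
    return c


def nonlinear_step(x, y, w, b, c, max_halvings=20):
    """Hidden-layer (nonlinear) parameters: one Gauss-Newton step with step halving."""
    n = w.size
    active = (np.outer(x, w) + b > 0.0).astype(float)   # ReLU derivative
    J_b = active * c[1:]                                 # d residual / d b_i
    J_w = J_b * x[:, None]                               # d residual / d w_i
    J = np.hstack([J_w, J_b])
    residual = features(x, w, b) @ c - y
    # Assumption: generic minimum-norm Gauss-Newton direction, not the paper's matrix.
    step, *_ = np.linalg.lstsq(J, -residual, rcond=None)
    current = loss(x, y, w, b, c)
    t = 1.0
    for _ in range(max_halvings):
        w_new, b_new = w + t * step[:n], b + t * step[n:]
        if loss(x, y, w_new, b_new, c) < current:
            return w_new, b_new
        t *= 0.5
    return w, b  # no decrease found: keep the old parameters


def fit(x, y, n_neurons=8, n_iter=30):
    """Alternate between the nonlinear and linear updates, recording the loss."""
    w = np.ones(n_neurons)
    b = -np.linspace(x.min(), x.max(), n_neurons, endpoint=False)  # uniform breakpoints
    c = linear_step(x, y, w, b)
    history = [loss(x, y, w, b, c)]
    for _ in range(n_iter):
        w, b = nonlinear_step(x, y, w, b, c)
        c = linear_step(x, y, w, b)
        history.append(loss(x, y, w, b, c))
    return w, b, c, history


# Hypothetical target with a sharp transition layer at x = 0.5.
x = np.linspace(0.0, 1.0, 200)
y = np.tanh(20.0 * (x - 0.5))
w, b, c, history = fit(x, y)
print(history[0], history[-1])
```

Because the sketch solves the Gauss-Newton system with a rank-revealing least-squares routine, it sidesteps the invertibility question that the paper addresses analytically; it shows only the overall alternating structure.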

Related papers

Physics-informed neural networks for high-dimensional solutions and snaking bifurcations in nonlinear lattices [0.0]
This paper introduces a framework based on physics-informed neural networks (PINNs) for addressing key challenges in nonlinear lattices. We first employ PINNs to approximate solutions of nonlinear systems arising from lattice models, using the Levenberg-Marquardt algorithm. We then extend the method by coupling PINNs with a continuation approach to compute snaking bifurcation diagrams. For linear stability analysis, we adapt PINNs to compute eigenvectors, introducing output constraints to enforce positivity, in line with Sturm-Liouville theory.
arXiv Detail & Related papers (2025-07-13T20:41:55Z)
Accelerating Natural Gradient Descent for PINNs with Randomized Numerical Linear Algebra [0.0]
Natural Gradient Descent (NGD) has emerged as a promising optimization algorithm for training neural network-based solvers for partial differential equations (PDEs). We extend matrix-free NGD to broader classes of problems than previously considered and propose the use of randomized Nyström preconditioning to accelerate convergence of the inner CG solver. The resulting algorithm demonstrates substantial performance improvements over existing NGD-based methods on a range of PDE problems discretized using neural networks.
arXiv Detail & Related papers (2025-05-16T19:00:40Z)
Gauss-Newton Dynamics for Neural Networks: A Riemannian Optimization Perspective [3.48097307252416]
We analyze the convergence of Gauss-Newton dynamics for training neural networks with smooth activation functions. We show that the Levenberg-Marquardt dynamics with an appropriately chosen damping factor yields robustness to ill-conditioned kernels.
arXiv Detail & Related papers (2024-12-18T16:51:47Z)
Convergence of Implicit Gradient Descent for Training Two-Layer Physics-Informed Neural Networks [3.680127959836384]
Implicit gradient descent (IGD) outperforms the common gradient descent (GD) in handling certain multi-scale problems. We show that IGD converges to a globally optimal solution at a linear convergence rate.
arXiv Detail & Related papers (2024-07-03T06:10:41Z)
Precise asymptotics of reweighted least-squares algorithms for linear diagonal networks [15.074950361970194]
We provide a unified analysis for a family of algorithms that encompasses IRLS, the recently proposed linlin-RFM algorithm, and the alternating diagonal neural networks. We show that, with appropriately chosen reweighting policy, a handful of sparse structures can achieve favorable performance. We also show that leveraging this in the reweighting scheme provably improves test error compared to coordinate-wise reweighting.
arXiv Detail & Related papers (2024-06-04T20:37:17Z)
Matrix Completion via Nonsmooth Regularization of Fully Connected Neural Networks [7.349727826230864]
It has been shown that enhanced performance could be attained by using nonlinear estimators such as deep neural networks. In this paper, we control over-fitting by regularizing FCNN model in terms of norm intermediate representations. Our simulations indicate the superiority of the proposed algorithm in comparison with existing linear and nonlinear algorithms.
arXiv Detail & Related papers (2024-03-15T12:00:37Z)
Constrained Optimization via Exact Augmented Lagrangian and Randomized Iterative Sketching [55.28394191394675]
We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonconvex optimization problems. We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LIBSVM, and a PDE-constrained problem.
arXiv Detail & Related papers (2023-05-28T06:33:37Z)
A Randomised Subspace Gauss-Newton Method for Nonlinear Least-Squares [0.6445605125467572]
We propose a Randomised Subspace Gauss-Newton (R-SGN) algorithm for solving nonlinear least-squares optimization problems. A sublinear global rate of convergence result is presented for a trust-region variant of R-SGN, with high probability.
arXiv Detail & Related papers (2022-11-10T17:51:08Z)
NeuralEF: Deconstructing Kernels by Deep Neural Networks [47.54733625351363]
Traditional nonparametric solutions based on the Nyström formula suffer from scalability issues. Recent work has resorted to a parametric approach, i.e., training neural networks to approximate the eigenfunctions. We show that these problems can be fixed by using a new series of objective functions that generalizes to space of supervised and unsupervised learning problems.
arXiv Detail & Related papers (2022-04-30T05:31:07Z)
Inverse Problem of Nonlinear Schrödinger Equation as Learning of Convolutional Neural Network [5.676923179244324]
It is shown that one can obtain a relatively accurate estimate of the considered parameters using the proposed method. It provides a natural framework in inverse problems of partial differential equations with deep learning.
arXiv Detail & Related papers (2021-07-19T02:54:37Z)
Cogradient Descent for Dependable Learning [64.02052988844301]
We propose a dependable learning method based on the Cogradient Descent (CoGD) algorithm to address the bilinear optimization problem. CoGD is introduced to solve bilinear problems when one variable has a sparsity constraint. It can also be used to decompose the association of features and weights, which further generalizes our method to better train convolutional neural networks (CNNs).
arXiv Detail & Related papers (2021-06-20T04:28:20Z)
Disentangling the Gauss-Newton Method and Approximate Inference for Neural Networks [96.87076679064499]
We disentangle the generalized Gauss-Newton and approximate inference for Bayesian deep learning. We find that the Gauss-Newton method simplifies the underlying probabilistic model significantly. The connection to Gaussian processes enables new function-space inference algorithms.
arXiv Detail & Related papers (2020-07-21T17:42:58Z)
Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs). We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent. For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
Multipole Graph Neural Operator for Parametric Partial Differential Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data. We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity. Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.