A Structure-Guided Gauss-Newton Method for Shallow ReLU Neural Network
- URL: http://arxiv.org/abs/2404.05064v1
- Date: Sun, 7 Apr 2024 20:24:44 GMT
- Title: A Structure-Guided Gauss-Newton Method for Shallow ReLU Neural Network
- Authors: Zhiqiang Cai, Tong Ding, Min Liu, Xinyu Liu, Jianlin Xia,
- Abstract summary: We propose a structure-guided Gauss-Newton (SgGN) method for solving least squares problems using a shallow ReLU neural network.
The method effectively takes advantage of both the least squares structure and the neural network structure of the objective function.
- Score: 18.06366638807982
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a structure-guided Gauss-Newton (SgGN) method for solving least squares problems using a shallow ReLU neural network. The method effectively takes advantage of both the least squares structure and the neural network structure of the objective function. By categorizing the weights and biases of the hidden and output layers of the network as nonlinear and linear parameters, respectively, the method iterates back and forth between the nonlinear and linear parameters. The nonlinear parameters are updated by a damped Gauss-Newton method and the linear ones are updated by a linear solver. Moreover, at the Gauss-Newton step, a special form of the Gauss-Newton matrix is derived for the shallow ReLU neural network and is used for efficient iterations. It is shown that the corresponding mass and Gauss-Newton matrices in the respective linear and nonlinear steps are symmetric and positive definite under reasonable assumptions. Thus, the SgGN method naturally produces an effective search direction without the need of additional techniques like shifting in the Levenberg-Marquardt method to achieve invertibility of the Gauss-Newton matrix. The convergence and accuracy of the method are demonstrated numerically for several challenging function approximation problems, especially those with discontinuities or sharp transition layers that pose significant challenges for commonly used training algorithms in machine learning.
Related papers
- Convergence of Implicit Gradient Descent for Training Two-Layer Physics-Informed Neural Networks [3.680127959836384]
implicit gradient descent (IGD) outperforms the common gradient descent (GD) in handling certain multi-scale problems.
We show that IGD converges a globally optimal solution at a linear convergence rate.
arXiv Detail & Related papers (2024-07-03T06:10:41Z) - Precise asymptotics of reweighted least-squares algorithms for linear diagonal networks [15.074950361970194]
We provide a unified analysis for a family of algorithms that encompasses IRLS, the recently proposed linlin-RFM algorithm, and the alternating diagonal neural networks.
We show that, with appropriately chosen reweighting policy, a handful of sparse structures can achieve favorable performance.
We also show that leveraging this in the reweighting scheme provably improves test error compared to coordinate-wise reweighting.
arXiv Detail & Related papers (2024-06-04T20:37:17Z) - Matrix Completion via Nonsmooth Regularization of Fully Connected Neural Networks [7.349727826230864]
It has been shown that enhanced performance could be attained by using nonlinear estimators such as deep neural networks.
In this paper, we control over-fitting by regularizing FCNN model in terms of norm intermediate representations.
Our simulations indicate the superiority of the proposed algorithm in comparison with existing linear and nonlinear algorithms.
arXiv Detail & Related papers (2024-03-15T12:00:37Z) - Constrained Optimization via Exact Augmented Lagrangian and Randomized
Iterative Sketching [55.28394191394675]
We develop an adaptive inexact Newton method for equality-constrained nonlinear, nonIBS optimization problems.
We demonstrate the superior performance of our method on benchmark nonlinear problems, constrained logistic regression with data from LVM, and a PDE-constrained problem.
arXiv Detail & Related papers (2023-05-28T06:33:37Z) - A Randomised Subspace Gauss-Newton Method for Nonlinear Least-Squares [0.6445605125467572]
We propose a Randomised Subspace Gauss-Newton (R-SGN) algorithm for solving nonlinear least-squares optimization problems.
A sublinear global rate of convergence result is presented for a trust-region variant of R-SGN, with high probability.
arXiv Detail & Related papers (2022-11-10T17:51:08Z) - NeuralEF: Deconstructing Kernels by Deep Neural Networks [47.54733625351363]
Traditional nonparametric solutions based on the Nystr"om formula suffer from scalability issues.
Recent work has resorted to a parametric approach, i.e., training neural networks to approximate the eigenfunctions.
We show that these problems can be fixed by using a new series of objective functions that generalizes to space of supervised and unsupervised learning problems.
arXiv Detail & Related papers (2022-04-30T05:31:07Z) - Inverse Problem of Nonlinear Schr\"odinger Equation as Learning of
Convolutional Neural Network [5.676923179244324]
It is shown that one can obtain a relatively accurate estimate of the considered parameters using the proposed method.
It provides a natural framework in inverse problems of partial differential equations with deep learning.
arXiv Detail & Related papers (2021-07-19T02:54:37Z) - Cogradient Descent for Dependable Learning [64.02052988844301]
We propose a dependable learning based on Cogradient Descent (CoGD) algorithm to address the bilinear optimization problem.
CoGD is introduced to solve bilinear problems when one variable is with sparsity constraint.
It can also be used to decompose the association of features and weights, which further generalizes our method to better train convolutional neural networks (CNNs)
arXiv Detail & Related papers (2021-06-20T04:28:20Z) - Disentangling the Gauss-Newton Method and Approximate Inference for
Neural Networks [96.87076679064499]
We disentangle the generalized Gauss-Newton and approximate inference for Bayesian deep learning.
We find that the Gauss-Newton method simplifies the underlying probabilistic model significantly.
The connection to Gaussian processes enables new function-space inference algorithms.
arXiv Detail & Related papers (2020-07-21T17:42:58Z) - Provably Efficient Neural Estimation of Structural Equation Model: An
Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized Structural equation models (SEMs)
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using a gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z) - Multipole Graph Neural Operator for Parametric Partial Differential
Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.