Training a Two Layer ReLU Network Analytically
- URL: http://arxiv.org/abs/2304.02972v1
- Date: Thu, 6 Apr 2023 09:57:52 GMT
- Title: Training a Two Layer ReLU Network Analytically
- Authors: Adrian Barbu
- Abstract summary: We will explore an algorithm for training two-layer neural networks with ReLU-like activation and the square loss.
The method is faster than the gradient descent methods and has virtually no tuning parameters.
- Score: 4.94950858749529
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural networks are usually trained with different variants of gradient
descent based optimization algorithms such as stochastic gradient descent or
the Adam optimizer. Recent theoretical work states that the critical points
(where the gradient of the loss is zero) of two-layer ReLU networks with the
square loss are not all local minima. However, in this work we will explore an
algorithm for training two-layer neural networks with ReLU-like activation and
the square loss that alternately finds the critical points of the loss
function analytically for one layer while keeping the other layer and the
neuron activation pattern fixed. Experiments indicate that this simple
algorithm can find deeper optima than Stochastic Gradient Descent or the Adam
optimizer, obtaining significantly smaller training loss values on four out of
the five real datasets evaluated. Moreover, the method is faster than the
gradient descent methods and has virtually no tuning parameters.
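To make the alternating scheme concrete: once the neuron activation pattern is frozen, the network output is linear in either layer's weights, so each half-step reduces to an ordinary least-squares problem with a closed-form solution. Below is a minimal NumPy sketch of this general idea. It is an illustrative reconstruction, not the paper's implementation: the function name `fit_two_layer_relu`, the plain ReLU (rather than a general ReLU-like activation), the absence of regularization, and the simple recomputation of the activation pattern on every pass are all assumptions.

```python
import numpy as np

def fit_two_layer_relu(X, Y, hidden=16, n_iters=10, seed=0):
    """Alternating analytic (least-squares) updates for a two-layer ReLU net.

    Model: Y_hat = relu(X @ W1.T) @ W2.T, trained with the square loss.
    With the activation pattern held fixed, each layer's subproblem is
    linear least squares and can be solved in closed form.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = Y.shape[1]
    W1 = rng.normal(scale=1.0 / np.sqrt(d), size=(hidden, d))
    W2 = rng.normal(scale=1.0 / np.sqrt(hidden), size=(k, hidden))

    for _ in range(n_iters):
        # Freeze the activation pattern at the current W1.
        M = (X @ W1.T > 0).astype(float)              # n x hidden

        # Output layer: H @ W2.T ~ Y is ordinary least squares in W2.
        H = M * (X @ W1.T)                            # equals relu(X @ W1.T)
        W2 = np.linalg.lstsq(H, Y, rcond=None)[0].T

        # Hidden layer: with M and W2 fixed, predictions are linear in vec(W1).
        # For sample i: y_hat_i = (x_i^T kron (W2 * m_i)) vec(W1), column-major vec.
        rows = [np.kron(X[i][None, :], W2 * M[i][None, :]) for i in range(n)]
        A = np.vstack(rows)                           # (n*k) x (hidden*d)
        b = Y.reshape(-1)                             # matches the row order of A
        w = np.linalg.lstsq(A, b, rcond=None)[0]
        W1 = w.reshape(hidden, d, order="F")          # undo the column-major vec

    return W1, W2
```

A small synthetic usage example (sizes chosen purely for illustration):

```python
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
Y = np.maximum(X @ rng.normal(size=(5, 8)), 0) @ rng.normal(size=(8, 2))
W1, W2 = fit_two_layer_relu(X, Y, hidden=8, n_iters=10)
pred = np.maximum(X @ W1.T, 0) @ W2.T
print("train MSE:", np.mean((pred - Y) ** 2))
```

Because the activation pattern is recomputed after every hidden-layer update, the loss is not guaranteed to decrease monotonically in this sketch; how the pattern update, regularization, and the exact "ReLU-like" activation are handled is left to the paper itself.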
Related papers
- Implicit Bias of Gradient Descent for Two-layer ReLU and Leaky ReLU
Networks on Nearly-orthogonal Data [66.1211659120882]
The implicit bias towards solutions with favorable properties is believed to be a key reason why neural networks trained by gradient-based optimization can generalize well.
While the implicit bias of gradient flow has been widely studied for homogeneous neural networks (including ReLU and leaky ReLU networks), the implicit bias of gradient descent is currently only understood for smooth neural networks.
arXiv Detail & Related papers (2023-10-29T08:47:48Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been demonstrated to be effective in solving forward and inverse differential equation problems.
However, PINNs are prone to training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs in order to improve the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z) - Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data [63.34506218832164]
In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with ReLU activations.
For gradient flow, we leverage recent work on the implicit bias for homogeneous neural networks to show that, asymptotically, gradient flow produces a neural network with rank at most two.
For gradient descent, provided the random initialization variance is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training.
arXiv Detail & Related papers (2022-10-13T15:09:54Z) - Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights (a minimal sketch of this idea appears after this list).
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z) - Revisiting Recursive Least Squares for Training Deep Neural Networks [10.44340837533087]
Recursive least squares (RLS) algorithms were once widely used for training small-scale neural networks, due to their fast convergence.
Previous RLS algorithms are unsuitable for training deep neural networks (DNNs), since they have high computational complexity and too many preconditions.
We propose three novel RLS optimization algorithms for training feedforward neural networks, convolutional neural networks and recurrent neural networks.
arXiv Detail & Related papers (2021-09-07T17:43:51Z) - The layer-wise L1 Loss Landscape of Neural Nets is more complex around
local minima [3.04585143845864]
We use the Deep ReLU Simplex algorithm to minimize the loss monotonically on adjacent vertices.
In a neighbourhood of a local minimum, the iterations behave differently, so that conclusions about the loss level and the proximity of the local minimum can be drawn before it has been found.
This could have far-reaching consequences for the design of new gradient-descent algorithms.
arXiv Detail & Related papers (2021-05-06T17:18:44Z) - Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss function gradient flow.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z) - The duality structure gradient descent algorithm: analysis and applications to neural networks [0.0]
We propose an algorithm named duality structure gradient descent (DSGD) that is amenable to non-asymptotic performance analysis.
We empirically demonstrate the behavior of DSGD in several neural network training scenarios.
arXiv Detail & Related papers (2017-08-01T21:24:38Z)
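As a complement to the "Scaling Forward Gradient With Local Losses" entry above, the following NumPy sketch illustrates the activity-perturbation idea in its simplest form: perturb the hidden activations with a random direction, push that perturbation forward through the rest of the network to obtain a directional derivative of the loss, and scale the perturbation by it to form a weight-gradient estimate. This is a generic illustration with assumed names and sizes, not the paper's local-losses method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny two-layer ReLU net with the squared loss; all sizes are illustrative.
d, h, k = 5, 8, 3
W1 = 0.5 * rng.normal(size=(h, d))
W2 = 0.5 * rng.normal(size=(k, h))
x = rng.normal(size=d)
y = rng.normal(size=k)

z1 = W1 @ x                        # hidden pre-activations
a1 = np.maximum(z1, 0.0)
z2 = W2 @ a1
residual = z2 - y                  # dL/dz2 for L = 0.5 * ||z2 - y||^2

# Exact gradient of the loss w.r.t. W1 (for comparison only).
dLdz1 = (z1 > 0) * (W2.T @ residual)
g_true = np.outer(dLdz1, x)

# Activity-perturbed forward gradient: each estimate is (dL/dz1 . u) * u x^T
# for a random direction u, with the directional derivative computed by a
# hand-written forward pass (JVP) through the rest of this tiny network.
estimates = []
for _ in range(2000):
    u = rng.normal(size=h)         # random perturbation of the activations
    da1 = (z1 > 0) * u             # JVP through the ReLU
    dz2 = W2 @ da1                 # JVP through the output layer
    dL = residual @ dz2            # directional derivative dL/dz1 . u
    estimates.append(dL * np.outer(u, x))
g_est = np.mean(estimates, axis=0)

print("relative error of the averaged estimate:",
      np.linalg.norm(g_est - g_true) / np.linalg.norm(g_true))
```

The averaging over many perturbations is only there to check that the estimator is unbiased; in training, one typically draws a single perturbation per step, which is exactly why the variance of the estimator matters.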
This list is automatically generated from the titles and abstracts of the papers on this site.