Universality of Gradient Descent Neural Network Training
- URL: http://arxiv.org/abs/2007.13664v1
- Date: Mon, 27 Jul 2020 16:17:19 GMT
- Title: Universality of Gradient Descent Neural Network Training
- Authors: G. Welper
- Abstract summary: We discuss the question of whether it is always possible to redesign a neural network so that it trains well with gradient descent.
The construction is not intended for practical computations, but it provides some orientation on the possibilities of meta-learning and related approaches.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It has been observed that design choices of neural networks are often crucial
for their successful optimization. In this article, we therefore discuss the
question of whether it is always possible to redesign a neural network so that it
trains well with gradient descent. This yields the following universality
result: If, for a given network, there is any algorithm that can find good
network weights for a classification task, then there exists an extension of
this network that reproduces these weights and the corresponding forward output
by mere gradient descent training. The construction is not intended for
practical computations, but it provides some orientation on the possibilities
of meta-learning and related approaches.
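A schematic reading of the universality statement, with notation that is ours rather than the paper's: suppose some algorithm finds good weights w* for the given network on a classification task. The claim is that an extension of the network, with weights written \bar{w} below and a suitable starting point \bar{w}_0, reproduces these weights and the corresponding forward output after plain gradient descent iterations of the form

```latex
% Plain gradient descent on a training loss L of the extended network;
% eta is a step size. All symbols are illustrative placeholders, not the
% paper's notation.
\bar{w}_{k+1} \;=\; \bar{w}_k \;-\; \eta \, \nabla_{\bar{w}} L(\bar{w}_k),
\qquad k = 0, 1, 2, \dots
```

Nothing beyond this standard update is assumed on the training side; the substance of the result lies in how the extension and its initialization are constructed.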
Related papers
- Gradient-Free Neural Network Training on the Edge [12.472204825917629]
Training neural networks is computationally heavy and energy-intensive.
This work presents a novel technique for training neural networks without needing gradients.
We show that it is possible to train models without gradient-based optimization techniques by identifying the erroneous contribution of each neuron towards the expected classification.
arXiv Detail & Related papers (2024-10-13T05:38:39Z)
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Theoretical Characterization of How Neural Network Pruning Affects its Generalization [131.1347309639727]
This work makes the first attempt to study how different pruning fractions affect the model's gradient descent dynamics and generalization.
It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero.
More surprisingly, the generalization bound gets better as the pruning fraction gets larger.
arXiv Detail & Related papers (2023-01-01T03:10:45Z)
- Combining Optimal Path Search With Task-Dependent Learning in a Neural Network [4.1712273169097305]
We show that one can define a neural network representation of path-finding problems by transforming cost values into synaptic weights.
We show that network learning mechanisms can adapt these weights, adjusting the resulting paths according to the task at hand (an illustrative cost-as-weights sketch appears after this list).
arXiv Detail & Related papers (2022-01-26T18:29:00Z)
- Neural Capacitance: A New Perspective of Neural Network Selection via Edge Dynamics [85.31710759801705]
Current practice incurs expensive computational costs, because performance prediction requires model training.
We propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training.
Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections.
arXiv Detail & Related papers (2022-01-11T20:53:15Z)
- Training Graph Neural Networks by Graphon Estimation [2.5997274006052544]
We propose to train a graph neural network via resampling from a graphon estimate obtained from the underlying network data.
We show that our approach is competitive with, and in many cases outperforms, other GNN training methods that reduce over-smoothing.
arXiv Detail & Related papers (2021-09-04T19:21:48Z)
- Fast Adaptation with Linearized Neural Networks [35.43406281230279]
We study the inductive biases of linearizations of neural networks, which we show to be surprisingly good summaries of the full network functions.
Inspired by this finding, we propose a technique for embedding these inductive biases into Gaussian processes through a kernel designed from the Jacobian of the network; a minimal sketch of such a Jacobian kernel appears after this list.
In this setting, domain adaptation takes the form of interpretable posterior inference, with accompanying uncertainty estimation.
arXiv Detail & Related papers (2021-03-02T03:23:03Z)
- Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
arXiv Detail & Related papers (2020-04-17T19:12:39Z)
- Backprojection for Training Feedforward Neural Networks in the Input and Feature Spaces [12.323996999894002]
We propose a new algorithm for training feedforward neural networks that is considerably faster than backpropagation.
The proposed algorithm can be applied in both the input and feature spaces, where it is named backprojection and kernel backprojection, respectively.
arXiv Detail & Related papers (2020-04-05T20:53:11Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neurobiologically plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
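For the paper "Combining Optimal Path Search With Task-Dependent Learning in a Neural Network" above, the following is a purely illustrative sketch of one way path-finding cost values can be encoded as synaptic weights: a recurrent min-plus update (Bellman-Ford relaxation) whose weight matrix holds the edge costs. This is our own toy construction, not the network or learning mechanism from the paper.

```python
# Illustrative only: shortest paths via a recurrent "min-plus" layer whose
# synaptic weight matrix W stores edge costs (np.inf = no edge).
import numpy as np

INF = np.inf

def minplus_layer(dist, W):
    """One relaxation step: d_j <- min(d_j, min_i(d_i + W[i, j]))."""
    return np.minimum(dist, (dist[:, None] + W).min(axis=0))

def shortest_paths(W, source):
    """Iterate the layer to a fixed point (Bellman-Ford style)."""
    n = W.shape[0]
    dist = np.full(n, INF)
    dist[source] = 0.0
    for _ in range(n - 1):  # n-1 relaxations suffice without negative cycles
        dist = minplus_layer(dist, W)
    return dist

# Edge costs acting as the "synaptic weights" of the recurrent layer.
W = np.array([[0.0, 2.0, INF, 5.0],
              [INF, 0.0, 1.0, INF],
              [INF, INF, 0.0, 1.0],
              [INF, INF, INF, 0.0]])
print(shortest_paths(W, source=0))  # [0. 2. 3. 4.]
```

Task-dependent learning in the paper's sense would then presumably amount to adapting these weights; the sketch only shows the cost-to-weight encoding and the forward pass.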
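For "Fast Adaptation with Linearized Neural Networks" above, here is a minimal sketch of a kernel built from the Jacobian of a network with respect to its weights, k(x, x') = <J(x), J(x')>, evaluated at a fixed parameter point. The tiny MLP, its shapes, and the function names are assumptions made for illustration; the paper's actual models and the Gaussian-process machinery around the kernel are not reproduced here.

```python
# Sketch of a "Jacobian kernel" for a linearized network (illustrative setup).
import jax
import jax.numpy as jnp

def mlp(params, x):
    """Tiny scalar-output MLP standing in for any differentiable network."""
    (W1, b1), (W2, b2) = params
    h = jnp.tanh(W1 @ x + b1)
    return (W2 @ h + b2)[0]

def flat_jacobian(params, x):
    """Gradient of the scalar output w.r.t. all weights, flattened to one vector."""
    grads = jax.grad(mlp)(params, x)  # same pytree structure as params
    return jnp.concatenate([g.ravel() for g in jax.tree_util.tree_leaves(grads)])

def jacobian_kernel(params, X1, X2):
    """Gram matrix k(x, x') = <J(x), J(x')> of the linearized network."""
    J1 = jnp.stack([flat_jacobian(params, x) for x in X1])
    J2 = jnp.stack([flat_jacobian(params, x) for x in X2])
    return J1 @ J2.T

# Random weights stand in for a trained source network; 5 inputs of dimension 4.
key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
params = [(jax.random.normal(k1, (16, 4)), jnp.zeros(16)),
          (jax.random.normal(k2, (1, 16)), jnp.zeros(1))]
X = jax.random.normal(k3, (5, 4))
print(jacobian_kernel(params, X, X).shape)  # (5, 5)
```

Such a Gram matrix could serve as the covariance of a Gaussian process prior, which is the direction the paper takes; the domain-adaptation and uncertainty-estimation parts are beyond this sketch.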