Self-Expanding Neural Networks
- URL: http://arxiv.org/abs/2307.04526v3
- Date: Fri, 9 Feb 2024 14:02:28 GMT
- Title: Self-Expanding Neural Networks
- Authors: Rupert Mitchell, Robin Menzenbach, Kristian Kersting, Martin Mundt
- Abstract summary: We introduce a natural gradient based approach which intuitively expands both the width and depth of a neural network.
We prove an upper bound on the "rate" at which neurons are added, and a computationally cheap lower bound on the expansion score.
We illustrate the benefits of such Self-Expanding Neural Networks with full connectivity and convolutions in both classification and regression problems.
- Score: 24.812671965904727
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The results of training a neural network are heavily dependent on the
architecture chosen; and even a modification of only its size, however small,
typically involves restarting the training process. In contrast to this, we
begin training with a small architecture, only increase its capacity as
necessary for the problem, and avoid interfering with previous optimization
while doing so. We thereby introduce a natural gradient based approach which
intuitively expands both the width and depth of a neural network when this is
likely to substantially reduce the hypothetical converged training loss. We
prove an upper bound on the "rate" at which neurons are added, and a
computationally cheap lower bound on the expansion score. We illustrate the
benefits of such Self-Expanding Neural Networks with full connectivity and
convolutions in both classification and regression problems, including those
where the appropriate architecture size is substantially uncertain a priori.
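The control flow described in the abstract can be pictured with a minimal PyTorch sketch: train a small network, periodically evaluate an expansion criterion, and widen a hidden layer only when that criterion suggests a worthwhile reduction in loss, initialising the new unit so the learned function is unchanged. The `expansion_score` below (a plain gradient norm), the threshold value, and the helper names are assumptions standing in for the paper's natural-gradient-based expansion score; only the overall loop structure mirrors the abstract.

```python
# Minimal sketch, NOT the authors' implementation: placeholder score and
# threshold stand in for the natural expansion score from the paper.
import torch
import torch.nn as nn

def expansion_score(model, x, y, loss_fn):
    # Placeholder criterion: the gradient norm of the current loss. The paper
    # instead uses a natural-gradient-based score with a cheap lower bound.
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return sum(g.pow(2).sum() for g in grads).sqrt().item()

def widen_hidden_layer(model):
    # Add one unit to the single hidden layer. The new unit's outgoing weight
    # is zeroed, so the network's outputs (and hence previous optimization)
    # are untouched at the moment of expansion.
    old1, old2 = model[0], model[2]
    new1 = nn.Linear(old1.in_features, old1.out_features + 1)
    new2 = nn.Linear(old2.in_features + 1, old2.out_features)
    with torch.no_grad():
        new1.weight[:-1] = old1.weight
        new1.bias[:-1] = old1.bias
        new2.weight[:, :-1] = old2.weight
        new2.weight[:, -1].zero_()
        new2.bias.copy_(old2.bias)
    return nn.Sequential(new1, nn.ReLU(), new2)

# Toy regression problem and deliberately small starting architecture.
x, y = torch.randn(256, 4), torch.randn(256, 1)
model = nn.Sequential(nn.Linear(4, 2), nn.ReLU(), nn.Linear(2, 1))
loss_fn = nn.MSELoss()
score_threshold = 2.0  # illustrative trigger value, not taken from the paper
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

for step in range(200):
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()
    # Every 50 steps, ask whether adding capacity looks worthwhile.
    if step % 50 == 49 and expansion_score(model, x, y, loss_fn) > score_threshold:
        model = widen_hidden_layer(model)
        opt = torch.optim.SGD(model.parameters(), lr=1e-2)  # rebind optimizer to new params
```

Zero-initialising only the outgoing weight of the added unit preserves the converged behaviour while still letting gradients flow into the new parameters; the paper's natural-gradient criterion additionally decides when such an expansion is expected to pay off.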
Related papers
- Extraction Propagation [4.368185344922342]
We develop a novel neural network architecture called Extraction propagation.
Extraction propagation works by training, in parallel, many small neural networks which interact with one another.
arXiv Detail & Related papers (2024-02-24T19:06:41Z)
- Sup-Norm Convergence of Deep Neural Network Estimator for Nonparametric Regression by Adversarial Training [5.68558935178946]
We show the sup-norm convergence of deep neural network estimators with a novel adversarial training scheme.
A deep neural network estimator achieves the optimal rate in the sup-norm sense by the proposed adversarial training with correction.
arXiv Detail & Related papers (2023-07-08T20:24:14Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Multi-Grade Deep Learning [3.0069322256338906]
Current deep learning models are single-grade neural networks.
We propose a multi-grade learning model that enables us to learn deep neural networks much more effectively and efficiently.
arXiv Detail & Related papers (2023-02-01T00:09:56Z)
- Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
- Improving Deep Neural Network Random Initialization Through Neuronal Rewiring [14.484787903053208]
We show that a higher neuronal strength variance may decrease performance, while a lower neuronal strength variance usually improves it.
A new method is then proposed to rewire neuronal connections according to a preferential attachment (PA) rule based on their strength.
In this sense, PA only reorganizes connections, while preserving the magnitude and distribution of the weights.
arXiv Detail & Related papers (2022-07-17T11:52:52Z)
- Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise.
arXiv Detail & Related papers (2021-06-07T10:18:54Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective to represent a network into a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
- Distance-Based Regularisation of Deep Networks for Fine-Tuning [116.71288796019809]
We develop an algorithm that constrains a hypothesis class to a small sphere centred on the initial pre-trained weights.
Empirical evaluation shows that our algorithm works well, corroborating our theoretical results.
arXiv Detail & Related papers (2020-02-19T16:00:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.