Self-Expanding Neural Networks
- URL: http://arxiv.org/abs/2307.04526v3
- Date: Fri, 9 Feb 2024 14:02:28 GMT
- Title: Self-Expanding Neural Networks
- Authors: Rupert Mitchell, Robin Menzenbach, Kristian Kersting, Martin Mundt
- Abstract summary: We introduce a natural gradient based approach which intuitively expands both the width and depth of a neural network.
We prove an upper bound on the "rate" at which neurons are added, and a computationally cheap lower bound on the expansion score.
We illustrate the benefits of such Self-Expanding Neural Networks with full connectivity and convolutions in both classification and regression problems.
- Score: 24.812671965904727
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The results of training a neural network are heavily dependent on the
architecture chosen; and even a modification of only its size, however small,
typically involves restarting the training process. In contrast to this, we
begin training with a small architecture, only increase its capacity as
necessary for the problem, and avoid interfering with previous optimization
while doing so. We thereby introduce a natural gradient based approach which
intuitively expands both the width and depth of a neural network when this is
likely to substantially reduce the hypothetical converged training loss. We
prove an upper bound on the "rate" at which neurons are added, and a
computationally cheap lower bound on the expansion score. We illustrate the
benefits of such Self-Expanding Neural Networks with full connectivity and
convolutions in both classification and regression problems, including those
where the appropriate architecture size is substantially uncertain a priori.
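The control flow described in the abstract can be pictured with a minimal PyTorch sketch: train a small network, periodically evaluate an expansion criterion, and widen a hidden layer only when that criterion suggests a worthwhile reduction in loss, initialising the new unit so the learned function is unchanged. The `expansion_score` below (a plain gradient norm), the threshold value, and the helper names are assumptions standing in for the paper's natural-gradient-based expansion score; only the overall loop structure mirrors the abstract.

```python
# Minimal sketch, NOT the authors' implementation: placeholder score and
# threshold stand in for the natural expansion score from the paper.
import torch
import torch.nn as nn

def expansion_score(model, x, y, loss_fn):
    # Placeholder criterion: the gradient norm of the current loss. The paper
    # instead uses a natural-gradient-based score with a cheap lower bound.
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return sum(g.pow(2).sum() for g in grads).sqrt().item()

def widen_hidden_layer(model):
    # Add one unit to the single hidden layer. The new unit's outgoing weight
    # is zeroed, so the network's outputs (and hence previous optimization)
    # are untouched at the moment of expansion.
    old1, old2 = model[0], model[2]
    new1 = nn.Linear(old1.in_features, old1.out_features + 1)
    new2 = nn.Linear(old2.in_features + 1, old2.out_features)
    with torch.no_grad():
        new1.weight[:-1] = old1.weight
        new1.bias[:-1] = old1.bias
        new2.weight[:, :-1] = old2.weight
        new2.weight[:, -1].zero_()
        new2.bias.copy_(old2.bias)
    return nn.Sequential(new1, nn.ReLU(), new2)

# Toy regression problem and deliberately small starting architecture.
x, y = torch.randn(256, 4), torch.randn(256, 1)
model = nn.Sequential(nn.Linear(4, 2), nn.ReLU(), nn.Linear(2, 1))
loss_fn = nn.MSELoss()
score_threshold = 2.0  # illustrative trigger value, not taken from the paper
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

for step in range(200):
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()
    # Every 50 steps, ask whether adding capacity looks worthwhile.
    if step % 50 == 49 and expansion_score(model, x, y, loss_fn) > score_threshold:
        model = widen_hidden_layer(model)
        opt = torch.optim.SGD(model.parameters(), lr=1e-2)  # rebind optimizer to new params
```

Zero-initialising only the outgoing weight of the added unit preserves the converged behaviour while still letting gradients flow into the new parameters; the paper's natural-gradient criterion additionally decides when such an expansion is expected to pay off.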
Related papers
- Extraction Propagation [4.368185344922342]
We develop a novel neural network architecture called Extraction propagation.
Extraction propagation works by training, in parallel, many small neural networks which interact with one another.
arXiv Detail & Related papers (2024-02-24T19:06:41Z)
- Sup-Norm Convergence of Deep Neural Network Estimator for Nonparametric Regression by Adversarial Training [5.68558935178946]
We show the sup-norm convergence of deep neural network estimators with a novel adversarial training scheme.
A deep neural network estimator achieves the optimal rate in the sup-norm sense by the proposed adversarial training with correction.
arXiv Detail & Related papers (2023-07-08T20:24:14Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Multi-Grade Deep Learning [3.0069322256338906]
Current deep learning models are single-grade neural networks.
We propose a multi-grade learning model that enables us to learn deep neural networks much more effectively and efficiently.
arXiv Detail & Related papers (2023-02-01T00:09:56Z)
- Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
- Improving Deep Neural Network Random Initialization Through Neuronal Rewiring [14.484787903053208]
We show that a higher neuronal strength variance may decrease performance, while a lower neuronal strength variance usually improves it.
A new method is then proposed to rewire neuronal connections according to a preferential attachment (PA) rule based on their strength.
In this sense, PA only reorganizes connections, while preserving the magnitude and distribution of the weights.
arXiv Detail & Related papers (2022-07-17T11:52:52Z)
- Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise.
arXiv Detail & Related papers (2021-06-07T10:18:54Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective to represent a network into a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
- Distance-Based Regularisation of Deep Networks for Fine-Tuning [116.71288796019809]
We develop an algorithm that constrains a hypothesis class to a small sphere centred on the initial pre-trained weights.
Empirical evaluation shows that our algorithm works well, corroborating our theoretical results.
arXiv Detail & Related papers (2020-02-19T16:00:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.