Multi-Grade Deep Learning
- URL: http://arxiv.org/abs/2302.00150v1
- Date: Wed, 1 Feb 2023 00:09:56 GMT
- Title: Multi-Grade Deep Learning
- Authors: Yuesheng Xu
- Abstract summary: The current deep learning model is single-grade: it learns a deep neural network by solving one nonconvex optimization problem.
We propose a multi-grade learning model that enables us to learn a deep neural network much more effectively and efficiently.
- Score: 3.0069322256338906
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The current deep learning model is single-grade, that is, it learns a
deep neural network by solving a single nonconvex optimization problem. When
the layer number of the neural network is large, it is computationally
challenging to carry out such a task efficiently. Inspired by the human
education process which arranges learning in grades, we propose a multi-grade
learning model: We successively solve a number of optimization problems of
small sizes, which are organized in grades, to learn a shallow neural network
for each grade. Specifically, each grade learns the leftover (residual) from the
previous grade. In each grade, we learn a shallow neural network stacked on top
of the neural network learned in the previous grades, which remains unchanged
during training of the current and future grades. By dividing the
task of learning a deep neural network into learning several shallow neural
networks, one can alleviate the severity of the nonconvexity of the original
optimization problem of a large size. When all grades of the learning are
completed, the final neural network learned is a stair-shaped neural network,
which is the superposition of networks learned from all grades. Such a model
enables us to learn a deep neural network much more effectively and
efficiently. Moreover, multi-grade learning naturally leads to adaptive
learning. We prove that, in the context of function approximation, if the neural
network generated by a new grade is nontrivial, then the optimal error of that
grade is strictly smaller than the optimal error of the previous grade. Furthermore,
we provide several proof-of-concept numerical examples which demonstrate that
the proposed multi-grade model significantly outperforms the traditional
single-grade model and is also much more robust.
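The grade-by-grade procedure described in the abstract can be made concrete with a short sketch. The PyTorch code below is only an illustration under our own assumptions (a regression task, one hidden ReLU layer per grade, mean-squared error, and Adam with illustrative hyperparameters); it is not the paper's exact algorithm or experimental setup. Each grade fits a shallow network, on top of the frozen features from earlier grades, to the leftover residual, and the final predictor is the superposition of all grades' outputs.
```python
# Minimal multi-grade training sketch (regression), assuming PyTorch.
# Widths, epochs, and function names are illustrative, not taken from the paper.
import torch
import torch.nn as nn

def train_grade(feature_map, X, residual, width=64, epochs=500, lr=1e-3):
    """Learn one grade: a shallow (one-hidden-layer) net on frozen features."""
    with torch.no_grad():
        Z = feature_map(X)                       # output of previous grades, frozen
    hidden = nn.Sequential(nn.Linear(Z.shape[1], width), nn.ReLU())
    head = nn.Linear(width, residual.shape[1])
    params = list(hidden.parameters()) + list(head.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):                      # small nonconvex problem, this grade only
        opt.zero_grad()
        loss = nn.functional.mse_loss(head(hidden(Z)), residual)
        loss.backward()
        opt.step()
    for p in params:                             # freeze this grade for all later grades
        p.requires_grad_(False)
    return hidden, head

def multi_grade_fit(X, y, num_grades=3):
    feature_map = nn.Identity()                  # grade 1 sees the raw input
    prediction = torch.zeros_like(y)
    feature_maps, heads = [], []
    for _ in range(num_grades):
        residual = y - prediction                # the "leftover" from previous grades
        hidden, head = train_grade(feature_map, X, residual)
        feature_map = nn.Sequential(feature_map, hidden)   # stack on top, keep frozen
        feature_maps.append(feature_map)
        heads.append(head)
        with torch.no_grad():
            prediction = prediction + head(feature_map(X))
    def predict(x):                              # superposition of all grades' outputs
        with torch.no_grad():
            return sum(h(f(x)) for f, h in zip(feature_maps, heads))
    return predict
```
Because everything learned in earlier grades stays fixed, each grade only solves a small optimization problem over one shallow network, which is what alleviates the difficulty of training the full deep network in a single nonconvex problem.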
Related papers
- Residual Random Neural Networks [0.0]
The single-layer feedforward neural network with random weights is a recurring motif in the neural-network literature.
We show that one can obtain good classification results even if the number of hidden neurons has the same order of magnitude as the dimensionality of the data samples.
arXiv Detail & Related papers (2024-10-25T22:00:11Z)
- Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z)
- Message Passing Variational Autoregressive Network for Solving Intractable Ising Models [6.261096199903392]
Many deep neural networks have been used to solve Ising models, including autoregressive neural networks, convolutional neural networks, recurrent neural networks, and graph neural networks.
Here we propose a variational autoregressive architecture with a message passing mechanism, which can effectively utilize the interactions between spin variables.
The new network trained under an annealing framework outperforms existing methods in solving several prototypical Ising spin Hamiltonians, especially for larger spin systems at low temperatures.
arXiv Detail & Related papers (2024-04-09T11:27:07Z)
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
- Excess Risk of Two-Layer ReLU Neural Networks in Teacher-Student Settings and its Superiority to Kernel Methods [58.44819696433327]
We investigate the risk of two-layer ReLU neural networks in a teacher regression model.
We find that the student network provably outperforms any kernel method.
arXiv Detail & Related papers (2022-05-30T02:51:36Z)
- Stochastic Neural Networks with Infinite Width are Deterministic [7.07065078444922]
We study stochastic neural networks, a main type of neural network in use.
We prove that as the width of an optimized neural network tends to infinity, its predictive variance on the training set decreases to zero.
arXiv Detail & Related papers (2022-01-30T04:52:31Z)
- Dynamic Neural Diversification: Path to Computationally Sustainable Neural Networks [68.8204255655161]
Small neural networks with a constrained number of trainable parameters can be suitable resource-efficient candidates for many simple tasks.
We explore the diversity of the neurons within the hidden layer during the learning process.
We analyze how the diversity of the neurons affects predictions of the model.
arXiv Detail & Related papers (2021-09-20T15:12:16Z)
- Incremental Deep Neural Network Learning using Classification Confidence Thresholding [4.061135251278187]
Most modern neural networks for classification fail to take into account the concept of the unknown.
This paper proposes the Classification Confidence Threshold approach to prime neural networks for incremental learning.
arXiv Detail & Related papers (2021-06-21T22:46:28Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- The large learning rate phase of deep learning: the catapult mechanism [50.23041928811575]
We present a class of neural networks with solvable training dynamics.
We find good agreement between our model's predictions and training dynamics in realistic deep learning settings.
We believe our results shed light on characteristics of models trained at different learning rates.
arXiv Detail & Related papers (2020-03-04T17:52:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.