Deep Kronecker neural networks: A general framework for neural networks
with adaptive activation functions
- URL: http://arxiv.org/abs/2105.09513v1
- Date: Thu, 20 May 2021 04:54:57 GMT
- Title: Deep Kronecker neural networks: A general framework for neural networks
with adaptive activation functions
- Authors: Ameya D. Jagtap, Yeonjong Shin, Kenji Kawaguchi, George Em Karniadakis
- Abstract summary: We propose a new type of neural networks, Kronecker neural networks (KNNs), that form a general framework for neural networks with adaptive activation functions.
Under suitable conditions, KNNs induce a faster decay of the loss than that by the feed-forward networks.
- Score: 4.932130498861987
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a new type of neural networks, Kronecker neural networks (KNNs),
that form a general framework for neural networks with adaptive activation
functions. KNNs employ the Kronecker product, which provides an efficient way
of constructing a very wide network while keeping the number of parameters low.
Our theoretical analysis reveals that under suitable conditions, KNNs induce a
faster decay of the loss than that by the feed-forward networks. This is also
empirically verified through a set of computational examples. Furthermore,
under certain technical assumptions, we establish global convergence of
gradient descent for KNNs. As a specific case, we propose the Rowdy activation
function that is designed to get rid of any saturation region by injecting
sinusoidal fluctuations, which include trainable parameters. The proposed Rowdy
activation function can be employed in any neural network architecture like
feed-forward neural networks, Recurrent neural networks, Convolutional neural
networks etc. The effectiveness of KNNs with Rowdy activation is demonstrated
through various computational experiments including function approximation
using feed-forward neural networks, solution inference of partial differential
equations using the physics-informed neural networks, and standard deep
learning benchmark problems using convolutional and fully-connected neural
networks.
Related papers
- Novel Kernel Models and Exact Representor Theory for Neural Networks Beyond the Over-Parameterized Regime [52.00917519626559]
This paper presents two models of neural-networks and their training applicable to neural networks of arbitrary width, depth and topology.
We also present an exact novel representor theory for layer-wise neural network training with unregularized gradient descent in terms of a local-extrinsic neural kernel (LeNK)
This representor theory gives insight into the role of higher-order statistics in neural network training and the effect of kernel evolution in neural-network kernel models.
arXiv Detail & Related papers (2024-05-24T06:30:36Z) - Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z) - A Hybrid Neural Coding Approach for Pattern Recognition with Spiking
Neural Networks [53.31941519245432]
Brain-inspired spiking neural networks (SNNs) have demonstrated promising capabilities in solving pattern recognition tasks.
These SNNs are grounded on homogeneous neurons that utilize a uniform neural coding for information representation.
In this study, we argue that SNN architectures should be holistically designed to incorporate heterogeneous coding schemes.
arXiv Detail & Related papers (2023-05-26T02:52:12Z) - Optimal rates of approximation by shallow ReLU$^k$ neural networks and
applications to nonparametric regression [12.21422686958087]
We study the approximation capacity of some variation spaces corresponding to shallow ReLU$k$ neural networks.
For functions with less smoothness, the approximation rates in terms of the variation norm are established.
We show that shallow neural networks can achieve the minimax optimal rates for learning H"older functions.
arXiv Detail & Related papers (2023-04-04T06:35:02Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a
Polynomial Net Study [55.12108376616355]
The study on NTK has been devoted to typical neural network architectures, but is incomplete for neural networks with Hadamard products (NNs-Hp)
In this work, we derive the finite-width-K formulation for a special class of NNs-Hp, i.e., neural networks.
We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of NTK.
arXiv Detail & Related papers (2022-09-16T06:36:06Z) - Consistency of Neural Networks with Regularization [0.0]
This paper proposes the general framework of neural networks with regularization and prove its consistency.
Two types of activation functions: hyperbolic function(Tanh) and rectified linear unit(ReLU) have been taken into consideration.
arXiv Detail & Related papers (2022-06-22T23:33:39Z) - Parameter Convex Neural Networks [13.42851919291587]
We propose the exponential multilayer neural network (EMLP) which is convex with regard to the parameters of the neural network under some conditions.
For late experiments, we use the same architecture to make the exponential graph convolutional network (EGCN) and do the experiment on the graph classificaion dataset.
arXiv Detail & Related papers (2022-06-11T16:44:59Z) - Optimal Learning Rates of Deep Convolutional Neural Networks: Additive
Ridge Functions [19.762318115851617]
We consider the mean squared error analysis for deep convolutional neural networks.
We show that, for additive ridge functions, convolutional neural networks followed by one fully connected layer with ReLU activation functions can reach optimal mini-max rates.
arXiv Detail & Related papers (2022-02-24T14:22:32Z) - Modeling from Features: a Mean-field Framework for Over-parameterized
Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over- parameterized deep neural networks (DNNs)
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z) - Effective and Efficient Computation with Multiple-timescale Spiking
Recurrent Neural Networks [0.9790524827475205]
We show how a novel type of adaptive spiking recurrent neural network (SRNN) is able to achieve state-of-the-art performance.
We calculate a $>$100x energy improvement for our SRNNs over classical RNNs on the harder tasks.
arXiv Detail & Related papers (2020-05-24T01:04:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.