Deep Maxout Network Gaussian Process
- URL: http://arxiv.org/abs/2208.04468v1
- Date: Mon, 8 Aug 2022 23:52:26 GMT
- Title: Deep Maxout Network Gaussian Process
- Authors: Libin Liang, Ye Tian and Ge Cheng
- Abstract summary: We derive the equivalence of the deep, infinite-width maxout network and the Gaussian process (GP).
We build up the connection between our deep maxout network kernel and deep neural network kernels.
- Score: 1.9292807030801753
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The study of neural networks with infinite width is important for a better
understanding of neural networks in practical applications. In this work, we
derive the equivalence of the deep, infinite-width maxout network and the
Gaussian process (GP) and characterize the maxout kernel with a compositional
structure. Moreover, we build up the connection between our deep maxout network
kernel and deep neural network kernels. We also give an efficient numerical
implementation of our kernel, which can be adapted to any maxout rank. Numerical
results show that Bayesian inference with the deep maxout network kernel can yield
results competitive with finite-width maxout networks and with deep neural network
kernels. This suggests that the maxout activation may also be incorporated into other
infinite-width neural network structures such as the convolutional neural network (CNN).
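The paper's own implementation is not reproduced here, but the NumPy sketch below illustrates how a compositional maxout NNGP kernel that adapts to any maxout rank, followed by GP regression (the "Bayesian inference" step), could be set up. The layer recursion, the sigma_w/sigma_b scaling convention, the Monte Carlo estimator, and all function names are assumptions made for illustration, not the authors' formulas or code.

```python
import numpy as np

def _maxout_cross_moment(kxx, kxy, kyy, rank, n_samples, rng):
    """Monte Carlo estimate of E[max_k u_k * max_k v_k], where the `rank` pairs
    (u_k, v_k) are i.i.d. bivariate Gaussians with Var(u_k) = kxx,
    Var(v_k) = kyy, Cov(u_k, v_k) = kxy, and different pieces independent."""
    cov = np.array([[kxx, kxy], [kxy, kyy]]) + 1e-10 * np.eye(2)  # jitter covers the diagonal case
    chol = np.linalg.cholesky(cov)
    z = rng.standard_normal((n_samples, rank, 2))
    uv = z @ chol.T                                   # correlate u_k and v_k within each piece
    return float(np.mean(uv[..., 0].max(axis=1) * uv[..., 1].max(axis=1)))

def maxout_nngp_gram(X, depth, rank, sigma_w=1.0, sigma_b=0.1, n_samples=10_000, seed=0):
    """Illustrative compositional maxout NNGP kernel on the rows of X.

    Assumed recursion for the pre-activation covariance Lambda:
        Lambda^1(x, x')     = sigma_b^2 + sigma_w^2 * <x, x'> / d
        Lambda^{l+1}(x, x') = sigma_b^2 + sigma_w^2 * E[max_k u_k * max_k v_k]
    with the expectation taken under Lambda^l and estimated by Monte Carlo,
    so that any maxout rank can be plugged in."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    K = sigma_b**2 + sigma_w**2 * (X @ X.T) / d
    for _ in range(depth):
        K_next = np.empty_like(K)
        for i in range(n):
            for j in range(i, n):
                m = _maxout_cross_moment(K[i, i], K[i, j], K[j, j], rank, n_samples, rng)
                K_next[i, j] = K_next[j, i] = sigma_b**2 + sigma_w**2 * m
        K = K_next
    return K

def gp_posterior_mean(K_full, y_train, n_train, noise_var=1e-2):
    """Standard GP regression posterior mean at the test rows,
    given the joint (train + test) Gram matrix."""
    K_tr = K_full[:n_train, :n_train]
    K_te_tr = K_full[n_train:, :n_train]
    alpha = np.linalg.solve(K_tr + noise_var * np.eye(n_train), y_train)
    return K_te_tr @ alpha

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((30, 5))
    y = np.sin(X[:, 0])                               # toy targets
    K = maxout_nngp_gram(X, depth=3, rank=4)          # depth-3 kernel, maxout rank 4
    pred = gp_posterior_mean(K, y[:20], n_train=20)
    print("toy test MSE:", np.mean((pred - y[20:]) ** 2))
```

The per-pair Monte Carlo loop is simply the easiest way to keep the rank arbitrary in a sketch; a closed form or quadrature rule for the max-max expectation would be far cheaper, which is presumably where an efficient implementation would focus.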
Related papers
- Sparsity-depth Tradeoff in Infinitely Wide Deep Neural Networks [22.083873334272027]
We observe that sparser networks outperform the non-sparse networks at shallow depths on a variety of datasets.
We extend the existing theory on the generalization error of kernel-ridge regression.
arXiv Detail & Related papers (2023-05-17T20:09:35Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
- Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a Polynomial Net Study [55.12108376616355]
The study of the NTK has been devoted to typical neural network architectures, but is incomplete for neural networks with Hadamard products (NNs-Hp).
In this work, we derive the finite-width NTK formulation for a special class of NNs-Hp, i.e., polynomial neural networks.
We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of NTK.
arXiv Detail & Related papers (2022-09-16T06:36:06Z)
- Why Quantization Improves Generalization: NTK of Binary Weight Neural Networks [33.08636537654596]
We take the binary weights in a neural network as random variables under rounding, and study the distribution propagation over different layers in the neural network.
We propose a quasi neural network to approximate the distribution propagation, which is a neural network with continuous parameters and smooth activation function.
arXiv Detail & Related papers (2022-06-13T06:11:21Z)
- Incorporating Prior Knowledge into Neural Networks through an Implicit Composite Kernel [1.6383321867266318]
Implicit Composite Kernel (ICK) is a kernel that combines a kernel implicitly defined by a neural network with a second kernel function chosen to model known properties (a schematic sketch of this combination appears after this list).
We demonstrate ICK's superior performance and flexibility on both synthetic and real-world data sets.
arXiv Detail & Related papers (2022-05-15T21:32:44Z)
- On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z)
- Neural Network Gaussian Processes by Increasing Depth [0.6091702876917281]
We show that increasing the depth of a neural network can give rise to a Gaussian process.
We also theoretically characterize its uniform tightness property and the smallest eigenvalue of its associated kernel.
These characterizations can not only enhance our understanding of the proposed depth-induced Gaussian processes, but also pave the way for future applications.
arXiv Detail & Related papers (2021-08-29T15:37:26Z)
- Learning Structures for Deep Neural Networks [99.8331363309895]
We propose to adopt the efficient coding principle, rooted in information theory and developed in computational neuroscience.
We show that sparse coding can effectively maximize the entropy of the output signals.
Our experiments on a public image classification dataset demonstrate that using the structure learned from scratch by our proposed algorithm, one can achieve a classification accuracy comparable to the best expert-designed structure.
arXiv Detail & Related papers (2021-05-27T12:27:24Z)
- Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z)
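As a side note on the Implicit Composite Kernel (ICK) entry above, the idea of combining an NN-implicit kernel with a hand-chosen kernel can be pictured with the schematic sketch below. It is not the ICK construction itself: the feature map is a stand-in untrained network, the second kernel is an RBF, and the additive combination rule is an assumption of this sketch.

```python
import numpy as np

def nn_implicit_kernel(X1, X2, W1, b1, W2):
    """Kernel implicitly defined by a neural feature map: k(x, x') = <f(x), f(x')>.
    A tiny one-hidden-layer tanh network stands in for a learned one."""
    f = lambda X: np.tanh(X @ W1 + b1) @ W2
    return f(X1) @ f(X2).T

def rbf_kernel(X1, X2, lengthscale=1.0):
    """Second kernel, chosen by hand to encode a known property (here: smoothness)."""
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)

def composite_kernel(X1, X2, nn_params, weight=0.5):
    """Additive combination of the two kernels (the combination rule is an
    assumption of this sketch, not necessarily ICK's)."""
    return weight * nn_implicit_kernel(X1, X2, *nn_params) + (1.0 - weight) * rbf_kernel(X1, X2)

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))
nn_params = (rng.standard_normal((3, 16)), rng.standard_normal(16), rng.standard_normal((16, 4)))
K = composite_kernel(X, X, nn_params)
print(K.shape)  # (8, 8) Gram matrix, usable in GP regression as in the earlier sketch
```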
This list is automatically generated from the titles and abstracts of the papers on this site.