Why Quantization Improves Generalization: NTK of Binary Weight Neural
Networks
- URL: http://arxiv.org/abs/2206.05916v1
- Date: Mon, 13 Jun 2022 06:11:21 GMT
- Title: Why Quantization Improves Generalization: NTK of Binary Weight Neural
Networks
- Authors: Kaiqi Zhang, Ming Yin, Yu-Xiang Wang
- Abstract summary: We take the binary weights in a neural network as random variables under rounding, and study the distribution propagation over different layers in the neural network.
We propose a quasi neural network to approximate the distribution propagation, which is a neural network with continuous parameters and smooth activation function.
- Score: 33.08636537654596
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Quantized neural networks have drawn a lot of attention as they reduce the
space and computational complexity during the inference. Moreover, there has
been folklore that quantization acts as an implicit regularizer and thus can
improve the generalizability of neural networks, yet no existing work
formalizes this interesting folklore. In this paper, we take the binary weights
in a neural network as random variables under stochastic rounding, and study
the distribution propagation over different layers in the neural network. We
propose a quasi neural network to approximate the distribution propagation,
which is a neural network with continuous parameters and smooth activation
function. We derive the neural tangent kernel (NTK) for this quasi neural
network, and show that the eigenvalue of NTK decays at approximately
exponential rate, which is comparable to that of Gaussian kernel with
randomized scale. This in turn indicates that the Reproducing Kernel Hilbert
Space (RKHS) of a binary weight neural network covers a strict subset of
functions compared with the one with real value weights. We use experiments to
verify that the quasi neural network we proposed can well approximate binary
weight neural network. Furthermore, binary weight neural network gives a lower
generalization gap compared with real value weight neural network, which is
similar to the difference between Gaussian kernel and Laplace kernel.
Related papers
- Novel Kernel Models and Exact Representor Theory for Neural Networks Beyond the Over-Parameterized Regime [52.00917519626559]
This paper presents two models of neural-networks and their training applicable to neural networks of arbitrary width, depth and topology.
We also present an exact novel representor theory for layer-wise neural network training with unregularized gradient descent in terms of a local-extrinsic neural kernel (LeNK)
This representor theory gives insight into the role of higher-order statistics in neural network training and the effect of kernel evolution in neural-network kernel models.
arXiv Detail & Related papers (2024-05-24T06:30:36Z) - Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel which we call textitbias-generalized NTK
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z) - Extrapolation and Spectral Bias of Neural Nets with Hadamard Product: a
Polynomial Net Study [55.12108376616355]
The study on NTK has been devoted to typical neural network architectures, but is incomplete for neural networks with Hadamard products (NNs-Hp)
In this work, we derive the finite-width-K formulation for a special class of NNs-Hp, i.e., neural networks.
We prove their equivalence to the kernel regression predictor with the associated NTK, which expands the application scope of NTK.
arXiv Detail & Related papers (2022-09-16T06:36:06Z) - Consistency of Neural Networks with Regularization [0.0]
This paper proposes the general framework of neural networks with regularization and prove its consistency.
Two types of activation functions: hyperbolic function(Tanh) and rectified linear unit(ReLU) have been taken into consideration.
arXiv Detail & Related papers (2022-06-22T23:33:39Z) - On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural kernel (NTK)
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z) - Stochastic Neural Networks with Infinite Width are Deterministic [7.07065078444922]
We study neural networks, a main type of neural network in use.
We prove that as the width of an optimized neural network tends to infinity, its predictive variance on the training set decreases to zero.
arXiv Detail & Related papers (2022-01-30T04:52:31Z) - Fourier Neural Networks for Function Approximation [2.840363325289377]
It is proved extensively that neural networks are universal approximators.
It is specifically proved that for a narrow neural network to approximate a function which is otherwise implemented by a deep Neural network, the network take exponentially large number of neurons.
arXiv Detail & Related papers (2021-10-21T09:30:26Z) - Deep Kronecker neural networks: A general framework for neural networks
with adaptive activation functions [4.932130498861987]
We propose a new type of neural networks, Kronecker neural networks (KNNs), that form a general framework for neural networks with adaptive activation functions.
Under suitable conditions, KNNs induce a faster decay of the loss than that by the feed-forward networks.
arXiv Detail & Related papers (2021-05-20T04:54:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.