VC dimension of partially quantized neural networks in the
overparametrized regime
- URL: http://arxiv.org/abs/2110.02456v1
- Date: Wed, 6 Oct 2021 02:02:35 GMT
- Title: VC dimension of partially quantized neural networks in the
overparametrized regime
- Authors: Yutong Wang, Clayton D. Scott
- Abstract summary: We focus on a class of partially quantized networks that we refer to as hyperplane arrangement neural networks (HANNs).
We show that HANNs can have VC dimension significantly smaller than the number of weights, while being highly expressive.
On a panel of 121 UCI datasets, overparametrized HANNs match the performance of state-of-the-art full-precision models.
- Score: 8.854725253233333
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vapnik-Chervonenkis (VC) theory has so far been unable to explain the small
generalization error of overparametrized neural networks. Indeed, existing
applications of VC theory to large networks obtain upper bounds on VC dimension
that are proportional to the number of weights, and for a large class of
networks, these upper bounds are known to be tight. In this work, we focus on a
class of partially quantized networks that we refer to as hyperplane
arrangement neural networks (HANNs). Using a sample compression analysis, we
show that HANNs can have VC dimension significantly smaller than the number of
weights, while being highly expressive. In particular, empirical risk
minimization over HANNs in the overparametrized regime achieves the minimax
rate for classification with Lipschitz posterior class probability. We further
demonstrate the expressivity of HANNs empirically. On a panel of 121 UCI
datasets, overparametrized HANNs match the performance of state-of-the-art
full-precision models.
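
As a rough illustration of the kind of model the abstract describes (a sketch under assumptions: the layer sizes, the ternary quantizer, and the sign-pattern readout below are illustrative choices, not the authors' exact construction), one can picture a classifier whose full-precision first layer defines a set of hyperplanes and whose remaining weights are quantized:

import numpy as np

# Hypothetical sketch of a "hyperplane arrangement" style classifier:
# a full-precision first layer defines k hyperplanes, inputs are mapped to the
# sign pattern of that layer, and a second stage with weights quantized to
# {-1, 0, +1} maps the pattern to a label. Only the first layer carries
# full-precision parameters, hence "partially quantized".
rng = np.random.default_rng(0)
d, k = 2, 16                          # input dimension, number of hyperplanes
W = rng.normal(size=(k, d))           # full-precision hyperplane normals
b = rng.normal(size=k)                # full-precision offsets

v_full = rng.normal(size=k)
v_quant = np.clip(np.round(v_full), -1, 1)   # crude ternary quantization of the readout

def predict(X):
    """Classify rows of X by the cell of the hyperplane arrangement they fall into."""
    pattern = np.sign(X @ W.T + b)    # side of each hyperplane, in {-1, 0, +1}
    return (pattern @ v_quant > 0).astype(int)

print(predict(rng.normal(size=(5, d))))

Even in this toy version, only the k*(d+1) first-layer parameters are full precision; the readout carries a few bits per weight, which is the sense in which such networks are only partially quantized.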
Related papers
- Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
arXiv Detail & Related papers (2024-06-14T13:12:07Z)
- Generalization and Estimation Error Bounds for Model-based Neural Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks.
We derive practical design rules that allow one to construct model-based networks with guaranteed high generalization.
arXiv Detail & Related papers (2023-04-19T16:39:44Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z)
- Dense Hebbian neural networks: a replica symmetric picture of supervised learning [4.133728123207142]
We consider dense associative neural networks trained with supervision by a teacher.
We investigate their computational capabilities analytically, via the statistical mechanics of spin glasses, and numerically, via Monte Carlo simulations.
arXiv Detail & Related papers (2022-11-25T13:37:47Z)
- Lipschitz Bound Analysis of Neural Networks [0.0]
Lipschitz Bound Estimation is an effective method of regularizing deep neural networks to make them robust against adversarial attacks.
In this paper, we highlight the significant gap in obtaining a non-trivial Lipschitz bound certificate for Convolutional Neural Networks (CNNs).
We also show that unrolling convolutional layers, or equivalently expressing them as Toeplitz matrices, can be used to convert a CNN into a fully connected network; a sketch of this idea follows.
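
A minimal one-dimensional illustration of the unrolling (an assumed setup, not the paper's construction): a convolution is a linear map, so it can be written as a Toeplitz matrix T, and the layer's Lipschitz constant with respect to the Euclidean norm is then bounded by the largest singular value of T.

import numpy as np

kernel = np.array([0.5, -1.0, 0.25])   # example 1-D convolution kernel
n = 8                                  # input length
m = n - len(kernel) + 1                # output length of a "valid" convolution

# Unroll the convolution into a Toeplitz matrix: row i holds the flipped
# kernel starting at offset i, so T @ x equals np.convolve(x, kernel, "valid").
T = np.zeros((m, n))
for i in range(m):
    T[i, i:i + len(kernel)] = kernel[::-1]

x = np.random.default_rng(0).normal(size=n)
assert np.allclose(T @ x, np.convolve(x, kernel, mode="valid"))

lipschitz_bound = np.linalg.norm(T, 2)   # spectral norm = largest singular value
print(f"Lipschitz bound of this convolutional layer: {lipschitz_bound:.4f}")

For real CNNs the same construction yields doubly block-Toeplitz matrices, which is what makes a fully connected-style analysis applicable.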
arXiv Detail & Related papers (2022-07-14T23:40:22Z)
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitively as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
- Tensor-Train Networks for Learning Predictive Modeling of Multidimensional Data [0.0]
A promising strategy is based on tensor networks, which have been very successful in physical and chemical applications.
We show that the weights of a multidimensional regression model can be learned by means of tensor networks, yielding a powerful yet compact representation.
An algorithm based on alternating least squares has been proposed for approximating the weights in TT-format at reduced computational cost; a toy illustration of the compactness of this format follows.
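
A toy, assumption-based illustration of why the TT format is compact (this shows only the representation, not the alternating-least-squares fitting mentioned above; the shapes and ranks are arbitrary):

import numpy as np

# Represent an order-3 tensor of shape (n, n, n) by three TT cores of rank r,
# then contract the cores to recover the full tensor.
rng = np.random.default_rng(0)
n, r = 10, 3

G1 = rng.normal(size=(1, n, r))   # cores have shape (rank_in, mode_size, rank_out)
G2 = rng.normal(size=(r, n, r))
G3 = rng.normal(size=(r, n, 1))

W = np.einsum("aib,bjc,ckd->ijk", G1, G2, G3)   # boundary ranks a = d = 1 contract away

print("full tensor parameters:", n ** 3)                       # 1000
print("TT-format parameters:  ", G1.size + G2.size + G3.size)  # 150

At fixed rank, the parameter count of the TT cores grows linearly in the tensor order rather than exponentially, which is the source of the compression.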
arXiv Detail & Related papers (2021-01-22T16:14:38Z)
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
- Mixed-Precision Quantized Neural Network with Progressively Decreasing Bitwidth For Image Classification and Object Detection [21.48875255723581]
A mixed-precision quantized neural network with progressively decreasing bitwidth is proposed to improve the trade-off between accuracy and compression.
Experiments on typical network architectures and benchmark datasets demonstrate that the proposed method achieves better or comparable results; a toy illustration of per-layer bitwidth reduction follows.
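
The general idea of assigning progressively lower bitwidths to deeper layers can be illustrated with plain uniform quantization (the quantizer and the 8/6/4/2-bit schedule below are assumptions for illustration, not the method proposed in that paper):

import numpy as np

def uniform_quantize(w, bits):
    """Symmetric uniform quantization of an array to the given bitwidth."""
    levels = 2 ** (bits - 1) - 1            # e.g. 127 representable magnitudes at 8 bits
    scale = np.max(np.abs(w)) / levels
    return np.round(w / scale) * scale      # quantize, then map back to floats

rng = np.random.default_rng(0)
layer_weights = [rng.normal(size=(64, 64)) for _ in range(4)]
bitwidths = [8, 6, 4, 2]                    # progressively decreasing per layer

for depth, (w, bits) in enumerate(zip(layer_weights, bitwidths)):
    q = uniform_quantize(w, bits)
    err = np.linalg.norm(w - q) / np.linalg.norm(w)
    print(f"layer {depth}: {bits}-bit weights, relative quantization error {err:.3f}")

Lower bitwidths shrink storage roughly in proportion to the bit count but increase the quantization error, which is exactly the accuracy/compression trade-off the summary refers to.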
arXiv Detail & Related papers (2019-12-29T14:11:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.