On the expressivity of deep Heaviside networks
- URL: http://arxiv.org/abs/2505.00110v1
- Date: Wed, 30 Apr 2025 18:25:05 GMT
- Title: On the expressivity of deep Heaviside networks
- Authors: Insung Kong, Juntong Chen, Sophie Langer, Johannes Schmidt-Hieber
- Abstract summary: We show that deep Heaviside networks (DHNs) have limited expressiveness but that this can be overcome by including either skip connections or neurons with linear activation. We provide lower and upper bounds for the Vapnik-Chervonenkis (VC) dimensions and approximation rates of these network classes.
- Score: 7.374726900469744
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We show that deep Heaviside networks (DHNs) have limited expressiveness but that this can be overcome by including either skip connections or neurons with linear activation. We provide lower and upper bounds for the Vapnik-Chervonenkis (VC) dimensions and approximation rates of these network classes. As an application, we derive statistical convergence rates for DHN fits in the nonparametric regression model.
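To make the abstract's objects concrete, below is a minimal NumPy sketch (not taken from the paper) of a deep Heaviside network forward pass with an optional residual-style skip connection; the skip wiring, layer widths, and random weights are illustrative assumptions only. Without the skip, the output is piecewise constant in the input, whereas carrying the layer input forward makes the output piecewise affine, which loosely illustrates the expressiveness gain the abstract alludes to.

```python
import numpy as np

def heaviside(z):
    # Heaviside (threshold) activation: 1 where the pre-activation is positive, else 0.
    return (z > 0).astype(float)

def dhn_forward(x, weights, biases, use_skip=True):
    # Toy forward pass of a deep Heaviside network (DHN).
    # use_skip=True adds a residual-style skip that carries the layer input
    # forward; this is an illustrative choice, not necessarily the exact
    # skip architecture analyzed in the paper.
    h = x
    for W, b in zip(weights, biases):
        z = W @ h + b
        h = heaviside(z) + (h if use_skip else 0.0)
    return h

# Illustrative width-3, depth-4 network with random weights.
rng = np.random.default_rng(0)
d, depth = 3, 4
weights = [rng.standard_normal((d, d)) for _ in range(depth)]
biases = [rng.standard_normal(d) for _ in range(depth)]
x = rng.standard_normal(d)
print(dhn_forward(x, weights, biases, use_skip=False))  # piecewise constant in x
print(dhn_forward(x, weights, biases, use_skip=True))   # piecewise affine in x
```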
Related papers
- Lorentzian Residual Neural Networks [15.257990326035694]
We introduce LResNet, a novel Lorentzian residual neural network based on the weighted Lorentzian centroid in the Lorentz model of hyperbolic geometry. Our method enables the efficient integration of residual connections in hyperbolic neural networks while preserving their hierarchical representation capabilities. Our findings highlight the potential of LResNet for building more expressive neural networks in hyperbolic embedding space.
arXiv Detail & Related papers (2024-12-19T09:56:01Z) - NEPENTHE: Entropy-Based Pruning as a Neural Network Depth's Reducer [5.373015313199385]
We propose an eNtropy-basEd Pruning as a nEural Network depTH's rEducer to alleviate deep neural networks' computational burden.
We validate our approach on popular architectures such as MobileNet and Swin-T.
arXiv Detail & Related papers (2024-04-24T09:12:04Z) - Meta-Principled Family of Hyperparameter Scaling Strategies [9.89901717499058]
We calculate the scalings of dynamical observables -- network outputs, neural tangent kernels, and differentials of neural tangent kernels -- for wide and deep neural networks.
We observe that various infinite-width limits examined in the literature correspond to the distinct corners of the interconnected web.
arXiv Detail & Related papers (2022-10-10T18:00:01Z) - Deep Architecture Connectivity Matters for Its Convergence: A
Fine-Grained Analysis [94.64007376939735]
We theoretically characterize the impact of connectivity patterns on the convergence of deep neural networks (DNNs) under gradient descent training.
We show that by a simple filtration on "unpromising" connectivity patterns, we can trim down the number of models to evaluate.
arXiv Detail & Related papers (2022-05-11T17:43:54Z) - Phase diagram of Stochastic Gradient Descent in high-dimensional
two-layer neural networks [22.823904789355495]
We investigate the connection between the mean-field hydrodynamic regime and the seminal approach of Saad & Solla.
Our work builds on a deterministic description of learning rates in high dimensions from statistical physics.
arXiv Detail & Related papers (2022-02-01T09:45:07Z) - Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [83.58049517083138]
We consider a two-layer ReLU network trained via gradient descent.
We show that SGD is biased towards a simple solution.
We also provide empirical evidence that knots at locations distinct from the data points might occur.
arXiv Detail & Related papers (2021-11-03T15:14:20Z) - VC dimension of partially quantized neural networks in the overparametrized regime [8.854725253233333]
We focus on a class of partially quantized networks that we refer to as hyperplane arrangement neural networks (HANNs).
We show that HANNs can have VC dimension significantly smaller than the number of weights, while being highly expressive.
On a panel of 121 UCI datasets, overparametrized HANNs match the performance of state-of-the-art full-precision models.
arXiv Detail & Related papers (2021-10-06T02:02:35Z) - Towards Evaluating and Training Verifiably Robust Neural Networks [81.39994285743555]
We study the relationship between IBP and CROWN, and prove that CROWN is always tighter than IBP when choosing appropriate bounding lines.
We propose a relaxed version of CROWN, linear bound propagation (LBP), that can be used to verify large networks to obtain lower verified errors.
arXiv Detail & Related papers (2021-04-01T13:03:48Z) - A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z) - Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning [97.28695683236981]
More gradient updates decrease the expressivity of the current value network.
We demonstrate this phenomenon on Atari and Gym benchmarks, in both offline and online RL settings.
arXiv Detail & Related papers (2020-10-27T17:55:16Z) - An Ode to an ODE [78.97367880223254]
We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where time-dependent parameters of the main flow evolve according to a matrix flow on the orthogonal group O(d).
This nested system of two flows provides stability and effectiveness of training and provably solves the gradient vanishing-explosion problem.
arXiv Detail & Related papers (2020-06-19T22:05:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.