Nonconvex regularization for sparse neural networks
- URL: http://arxiv.org/abs/2004.11515v2
- Date: Tue, 31 May 2022 16:18:39 GMT
- Title: Nonconvex regularization for sparse neural networks
- Authors: Konstantin Pieper and Armenak Petrosyan
- Abstract summary: A nonconvex regularization method is investigated in the context of shallow ReLU networks.
We show that approximation guarantees and existing bounds on the network size for finite data are maintained.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convex $\ell_1$ regularization using an infinite dictionary of neurons has
been suggested for constructing neural networks with desired approximation
guarantees, but can be affected by an arbitrary amount of over-parametrization.
This can lead to a loss of sparsity and result in networks with too many active
neurons for the given data, in particular if the number of data samples is
large. As a remedy, in this paper, a nonconvex regularization method is
investigated in the context of shallow ReLU networks: We prove that in contrast
to the convex approach, any resulting (locally optimal) network is finite even
in the presence of infinite data (i.e., if the data distribution is known and
the limiting case of infinite samples is considered). Moreover, we show that
approximation guarantees and existing bounds on the network size for finite
data are maintained.
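To make the contrast concrete, the following is a minimal sketch (an assumed setup, not the authors' code): it fits only the outer weights of a shallow ReLU network over a fixed dictionary of random neurons, swapping the convex $\ell_1$ penalty for a nonconvex log-type penalty. The penalty family, hyperparameters, and all identifiers are illustrative assumptions.

```python
import torch

torch.manual_seed(0)

n, width = 200, 500                          # samples, candidate neurons
x = torch.linspace(-1, 1, n).unsqueeze(1)    # inputs in [-1, 1]
y = torch.sin(3 * x).squeeze(1)              # toy regression target

# Fixed random inner weights and biases: a finite "dictionary" of ReLU neurons.
w = torch.randn(1, width)
b = torch.randn(width)
features = torch.relu(x @ w + b)             # shape (n, width)

c = torch.zeros(width, requires_grad=True)   # trainable outer weights
alpha, gamma = 1e-2, 10.0                    # penalty weight, nonconvexity

def nonconvex_penalty(t):
    # Log-type penalty: behaves like |t| near zero but flattens for large |t|,
    # so many small coefficients cost more than a few large ones.
    return torch.log1p(gamma * t.abs()).sum() / gamma

opt = torch.optim.Adam([c], lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    resid = features @ c - y
    loss = 0.5 * resid.pow(2).mean() + alpha * nonconvex_penalty(c)
    loss.backward()
    opt.step()

print(f"active neurons: {(c.abs() > 1e-3).sum().item()} / {width}")
```

Replacing `nonconvex_penalty(c)` with `c.abs().sum()` recovers the convex $\ell_1$ baseline; in a toy run of this kind the nonconvex penalty typically leaves fewer coefficients above the threshold, which is the qualitative sparsity effect the paper establishes rigorously for its penalty class.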
Related papers
- Residual Random Neural Networks [0.0]
A single-layer feedforward neural network with random weights is a recurring motif in the neural network literature.
We show that one can obtain good classification results even if the number of hidden neurons has the same order of magnitude as the dimensionality of the data samples.
arXiv Detail & Related papers (2024-10-25T22:00:11Z)
- On the Convergence of Locally Adaptive and Scalable Diffusion-Based Sampling Methods for Deep Bayesian Neural Network Posteriors [2.3265565167163906]
Bayesian neural networks are a promising approach for modeling uncertainties in deep neural networks.
However, generating samples from the posterior distribution of neural networks is a major challenge.
One advance in that direction would be the incorporation of adaptive step sizes into Monte Carlo Markov chain sampling algorithms.
In this paper, we demonstrate that these methods can have a substantial bias in the distribution they sample, even in the limit of vanishing step sizes and at full batch size.
arXiv Detail & Related papers (2024-03-13T15:21:14Z)
- Mind the spikes: Benign overfitting of kernels and neural networks in fixed dimension [17.96183484063563]
We show that the smoothness of the estimators, and not the dimension, is the key to benign overfitting.
We translate our results to wide neural networks.
Our experiments verify that such neural networks, while overfitting, can indeed generalize well even on low-dimensional data sets.
arXiv Detail & Related papers (2023-05-23T13:56:29Z)
- Semantic Strengthening of Neuro-Symbolic Learning [85.6195120593625]
Neuro-symbolic approaches typically resort to fuzzy approximations of a probabilistic objective.
We show how to compute this efficiently for tractable circuits.
We test our approach on three tasks: predicting a minimum-cost path in Warcraft, predicting a minimum-cost perfect matching, and solving Sudoku puzzles.
arXiv Detail & Related papers (2023-02-28T00:04:22Z)
- On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z)
- On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z)
- Near-Minimax Optimal Estimation With Shallow ReLU Neural Networks [19.216784367141972]
We study the problem of estimating an unknown function from noisy data using shallow (single-hidden layer) ReLU neural networks.
We quantify the performance of these neural network estimators when the data-generating function belongs to the space of functions of second-order bounded variation in the Radon domain.
arXiv Detail & Related papers (2021-09-18T05:56:06Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can make two well-separated data classes linearly separable with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- Finite Versus Infinite Neural Networks: an Empirical Study [69.07049353209463]
Kernel methods outperform fully-connected finite-width networks.
Centered and ensembled finite networks have reduced posterior variance.
Weight decay and the use of a large learning rate break the correspondence between finite and infinite networks.
arXiv Detail & Related papers (2020-07-31T01:57:47Z)
- ESPN: Extremely Sparse Pruned Networks [50.436905934791035]
We show that a simple iterative mask discovery method can achieve state-of-the-art compression of very deep networks.
Our algorithm represents a hybrid approach between single-shot network pruning methods and Lottery-Ticket-type approaches; a minimal sketch of the iterative-masking idea appears after this list.
arXiv Detail & Related papers (2020-06-28T23:09:27Z)
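As a rough illustration of the iterative mask-discovery idea in the ESPN entry above, here is a minimal sketch under assumed details (the schedule, loss, and architecture are placeholders, not the authors' algorithm): short training phases alternate with magnitude pruning, and a binary mask keeps pruned weights frozen at zero.

```python
import torch
import torch.nn as nn

def magnitude_mask(weight, keep_fraction):
    # Binary mask that keeps the largest-magnitude entries of `weight`.
    k = max(1, int(keep_fraction * weight.numel()))
    threshold = weight.abs().flatten().topk(k).values.min()
    return (weight.abs() >= threshold).float()

def train_with_masks(model, masks, data, target, steps=100, lr=1e-2):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(data), target).backward()
        opt.step()
        with torch.no_grad():                # keep pruned weights at zero
            for layer, mask in masks:
                layer.weight.mul_(mask)

torch.manual_seed(0)
data, target = torch.randn(256, 20), torch.randn(256, 1)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
linears = [m for m in model if isinstance(m, nn.Linear)]
masks = [(m, torch.ones_like(m.weight)) for m in linears]

keep = 1.0
for _ in range(5):                           # halve the surviving weights per round
    train_with_masks(model, masks, data, target)
    keep *= 0.5
    masks = [(m, magnitude_mask(m.weight, keep)) for m in linears]
    with torch.no_grad():                    # apply the new, tighter masks
        for layer, mask in masks:
            layer.weight.mul_(mask)

nonzero = sum((m.weight != 0).sum().item() for m in linears)
total = sum(m.weight.numel() for m in linears)
print(f"nonzero weights after pruning: {nonzero}/{total}")
```

Actual single-shot and Lottery-Ticket methods differ in when and how the mask is found (e.g., from scores at initialization, or with weight rewinding between rounds); this sketch only shows the mask-then-retrain skeleton they share.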
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.