Convexifying Sparse Interpolation with Infinitely Wide Neural Networks:
An Atomic Norm Approach
- URL: http://arxiv.org/abs/2007.08009v1
- Date: Wed, 15 Jul 2020 21:40:51 GMT
- Title: Convexifying Sparse Interpolation with Infinitely Wide Neural Networks:
An Atomic Norm Approach
- Authors: Akshay Kumar and Jarvis Haupt
- Abstract summary: This work examines the problem of exact data via sparse (neuron count), infinitely wide, single hidden layer neural networks with leaky rectified linear unit activations.
We derive simple characterizations of the convex hulls of the corresponding atomic sets for this problem under several different constraints on the weights and biases of the network.
A modest extension of our proposed framework to a binary classification problem is also presented.
- Score: 4.380224449592902
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work examines the problem of exact data interpolation via sparse (neuron
count), infinitely wide, single hidden layer neural networks with leaky
rectified linear unit activations. Using the atomic norm framework of
[Chandrasekaran et al., 2012], we derive simple characterizations of the convex
hulls of the corresponding atomic sets for this problem under several different
constraints on the weights and biases of the network, thus obtaining equivalent
convex formulations for these problems. A modest extension of our proposed
framework to a binary classification problem is also presented. We explore the
efficacy of the resulting formulations experimentally, and compare with
networks trained via gradient descent.
Related papers
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Norm-based Generalization Bounds for Compositionally Sparse Neural
Networks [11.987589603961622]
We prove generalization bounds for multilayered sparse ReLU neural networks, including convolutional neural networks.
Taken together, these results suggest that compositional sparsity of the underlying target function is critical to the success of deep neural networks.
arXiv Detail & Related papers (2023-01-28T00:06:22Z) - The Sample Complexity of One-Hidden-Layer Neural Networks [57.6421258363243]
We study a class of scalar-valued one-hidden-layer networks, and inputs bounded in Euclidean norm.
We prove that controlling the spectral norm of the hidden layer weight matrix is insufficient to get uniform convergence guarantees.
We analyze two important settings where a mere spectral norm control turns out to be sufficient.
arXiv Detail & Related papers (2022-02-13T07:12:02Z) - Non-Vacuous Generalisation Bounds for Shallow Neural Networks [5.799808780731661]
We focus on a specific class of shallow neural networks with a single hidden layer.
We derive new generalisation bounds through the PAC-Bayesian theory.
Our bounds are empirically non-vacuous when the network is trained with vanilla gradient descent on MNIST and Fashion-MNIST.
arXiv Detail & Related papers (2022-02-03T14:59:51Z) - Linear approximability of two-layer neural networks: A comprehensive
analysis based on spectral decay [4.042159113348107]
We first consider the case of single neuron and show that the linear approximability, quantified by the Kolmogorov width, is controlled by the eigenvalue decay of an associate kernel.
We show that similar results also hold for two-layer neural networks.
arXiv Detail & Related papers (2021-08-10T23:30:29Z) - Deep Networks Provably Classify Data on Curves [12.309532551321334]
We study a model problem that uses a deep fully-connected neural network to classify data drawn from two disjoint smooth curves on the unit sphere.
We prove that when (i) the network depth is large to certain properties that set the difficulty of the problem and (ii) the network width and number of samples is intrinsic in the relative depth, randomly-d gradient descent quickly learns to correctly classify all points on the two curves with high probability.
arXiv Detail & Related papers (2021-07-29T20:40:04Z) - Eliminating Multicollinearity Issues in Neural Network Ensembles:
Incremental, Negatively Correlated, Optimal Convex Blending [0.2294014185517203]
We introduce an incremental algorithm that constructs an aggregate regressor, using an ensemble of neural networks.
We optimally blend the aggregate regressor with a newly trained neural network under a convexity constraint.
Under this framework, collinearity issues do not arise at all, rendering so the method both accurate and robust.
arXiv Detail & Related papers (2021-04-30T01:32:08Z) - Multipole Graph Neural Operator for Parametric Partial Differential
Equations [57.90284928158383]
One of the main challenges in using deep learning-based methods for simulating physical systems is formulating physics-based data.
We propose a novel multi-level graph neural network framework that captures interaction at all ranges with only linear complexity.
Experiments confirm our multi-graph network learns discretization-invariant solution operators to PDEs and can be evaluated in linear time.
arXiv Detail & Related papers (2020-06-16T21:56:22Z) - Convex Geometry and Duality of Over-parameterized Neural Networks [70.15611146583068]
We develop a convex analytic approach to analyze finite width two-layer ReLU networks.
We show that an optimal solution to the regularized training problem can be characterized as extreme points of a convex set.
In higher dimensions, we show that the training problem can be cast as a finite dimensional convex problem with infinitely many constraints.
arXiv Detail & Related papers (2020-02-25T23:05:33Z) - Beyond Dropout: Feature Map Distortion to Regularize Deep Neural
Networks [107.77595511218429]
In this paper, we investigate the empirical Rademacher complexity related to intermediate layers of deep neural networks.
We propose a feature distortion method (Disout) for addressing the aforementioned problem.
The superiority of the proposed feature map distortion for producing deep neural network with higher testing performance is analyzed and demonstrated.
arXiv Detail & Related papers (2020-02-23T13:59:13Z) - Revealing the Structure of Deep Neural Networks via Convex Duality [70.15611146583068]
We study regularized deep neural networks (DNNs) and introduce a convex analytic framework to characterize the structure of hidden layers.
We show that a set of optimal hidden layer weights for a norm regularized training problem can be explicitly found as the extreme points of a convex set.
We apply the same characterization to deep ReLU networks with whitened data and prove the same weight alignment holds.
arXiv Detail & Related papers (2020-02-22T21:13:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.