Provable Identifiability of Two-Layer ReLU Neural Networks via LASSO
Regularization
- URL: http://arxiv.org/abs/2305.04267v1
- Date: Sun, 7 May 2023 13:05:09 GMT
- Title: Provable Identifiability of Two-Layer ReLU Neural Networks via LASSO
Regularization
- Authors: Gen Li, Ganghua Wang, Jie Ding
- Abstract summary: The territory of LASSO is extended to two-layer ReLU neural networks, a fashionable and powerful nonlinear regression model.
We show that the LASSO estimator can stably reconstruct the neural network and identify $\mathcal{S}^{\star}$ when the number of samples scales logarithmically with the input dimension.
Our theory builds on an extended Restricted Isometry Property (RIP)-based analysis framework for two-layer ReLU neural networks.
- Score: 15.517787031620864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: LASSO regularization is a popular regression tool to enhance the prediction
accuracy of statistical models by performing variable selection through the
$\ell_1$ penalty, initially formulated for the linear model and its variants.
In this paper, the territory of LASSO is extended to two-layer ReLU neural
networks, a fashionable and powerful nonlinear regression model. Specifically,
given a neural network whose output $y$ depends only on a small subset of input
$\boldsymbol{x}$, denoted by $\mathcal{S}^{\star}$, we prove that the LASSO
estimator can stably reconstruct the neural network and identify
$\mathcal{S}^{\star}$ when the number of samples scales logarithmically with
the input dimension. This challenging regime has been well understood for
linear models but has barely been studied for neural networks. Our theory builds on
an extended Restricted Isometry Property (RIP)-based analysis framework for
two-layer ReLU neural networks, which may be of independent interest in other
LASSO or neural network settings. Based on this result, we advocate a neural
network-based variable selection method. Experiments on simulated and
real-world datasets show promising performance of the variable selection
approach compared with existing techniques.
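To make the advocated variable-selection recipe concrete, below is a minimal sketch (not the authors' implementation; the hidden width, penalty strength, optimizer, synthetic data, and selection threshold are all illustrative assumptions): a two-layer ReLU network $\hat{y} = \boldsymbol{v}^{\top}\mathrm{ReLU}(\boldsymbol{W}\boldsymbol{x} + \boldsymbol{b})$ is fit with an $\ell_1$ (LASSO) penalty on the weight matrices added to the squared loss, and the estimated $\mathcal{S}^{\star}$ is read off from the inputs whose first-layer weight columns remain non-negligible after training.

```python
# Minimal sketch of LASSO-regularized two-layer ReLU regression for variable
# selection (illustrative; not the authors' code). Hyperparameters are assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic data: y depends only on the first 3 of d = 50 inputs, i.e. S* = {0, 1, 2}.
n, d = 200, 50
X = torch.randn(n, d)
y = torch.relu(X[:, 0] + 2 * X[:, 1]) - torch.relu(X[:, 2]) + 0.1 * torch.randn(n)

# Two-layer ReLU network: y_hat = v^T ReLU(W x + b).
model = nn.Sequential(nn.Linear(d, 16), nn.ReLU(), nn.Linear(16, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
lam = 1e-2  # LASSO (l1) penalty strength, illustrative

for _ in range(2000):
    opt.zero_grad()
    pred = model(X).squeeze(-1)
    # l1 penalty on both weight matrices (biases left unpenalized).
    l1 = model[0].weight.abs().sum() + model[2].weight.abs().sum()
    loss = ((pred - y) ** 2).mean() + lam * l1
    loss.backward()
    opt.step()

# Variable selection: keep the inputs whose column of the first-layer weight
# matrix has non-negligible magnitude after training.
W = model[0].weight.detach()        # shape (hidden, d)
scores = W.abs().sum(dim=0)         # per-input importance
selected = (scores > 1e-2 * scores.max()).nonzero().squeeze(-1)
print("estimated support:", selected.tolist())
```

If the fit succeeds, the printed support should match the informative inputs {0, 1, 2}; in practice the penalty strength and the selection threshold would be tuned, for example by cross-validation.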
Related papers
- Sparse-Input Neural Network using Group Concave Regularization [10.103025766129006]
Simultaneous feature selection and non-linear function estimation are challenging in neural networks.
We propose a framework of sparse-input neural networks using group concave regularization for feature selection in both low-dimensional and high-dimensional settings.
arXiv Detail & Related papers (2023-07-01T13:47:09Z)
- The Contextual Lasso: Sparse Linear Models via Deep Neural Networks [5.607237982617641]
We develop a new statistical estimator that fits a sparse linear model to the explanatory features such that the sparsity pattern and coefficients vary as a function of the contextual features.
An extensive suite of experiments on real and synthetic data suggests that the learned models, which remain highly transparent, can be sparser than the regular lasso.
arXiv Detail & Related papers (2023-02-02T05:00:29Z)
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
- Nonparametric regression with modified ReLU networks [77.34726150561087]
We consider regression estimation with modified ReLU neural networks in which network weight matrices are first modified by a function $\alpha$ before being multiplied by input vectors.
arXiv Detail & Related papers (2022-07-17T21:46:06Z)
- Hierarchical autoregressive neural networks for statistical systems [0.05156484100374058]
We propose a hierarchical association of physical degrees of freedom, for instance spins, to neurons, which replaces the scaling with the total number of spins by a scaling with the linear extent $L$ of the system.
We demonstrate our approach on the two-dimensional Ising model by simulating lattices of various sizes up to $128 \times 128$ spins, with time benchmarks reaching lattices of size $512 \times 512$.
arXiv Detail & Related papers (2022-03-21T13:55:53Z)
- Neural Capacitance: A New Perspective of Neural Network Selection via Edge Dynamics [85.31710759801705]
Current practice incurs expensive computational costs in model training for performance prediction.
We propose a novel framework for neural network selection by analyzing the governing dynamics over synaptic connections (edges) during training.
Our framework is built on the fact that back-propagation during neural network training is equivalent to the dynamical evolution of synaptic connections.
arXiv Detail & Related papers (2022-01-11T20:53:15Z)
- A Bayesian Perspective on Training Speed and Model Selection [51.15664724311443]
We show that a measure of a model's training speed can be used to estimate its marginal likelihood.
We verify our results in model selection tasks for linear models and for the infinite-width limit of deep neural networks.
Our results suggest a promising new direction towards explaining why neural networks trained with gradient descent are biased towards functions that generalize well.
arXiv Detail & Related papers (2020-10-27T17:56:14Z)
- Measurement error models: from nonparametric methods to deep neural networks [3.1798318618973362]
We propose an efficient neural network design for estimating measurement error models.
We use a fully connected feed-forward neural network to approximate the regression function $f(x)$.
We conduct an extensive numerical study to compare the neural network approach with classical nonparametric methods.
arXiv Detail & Related papers (2020-07-15T06:05:37Z)
- Regularization Matters: A Nonparametric Perspective on Overparametrized Neural Network [20.132432350255087]
Overparametrized neural networks trained by gradient descent (GD) can provably overfit any training data.
This paper studies how well overparametrized neural networks can recover the true target function in the presence of random noises.
arXiv Detail & Related papers (2020-07-06T01:02:23Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z) - Measuring Model Complexity of Neural Networks with Curve Activation
Functions [100.98319505253797]
We propose the linear approximation neural network (LANN) to approximate a given deep model with curve activation functions.
We experimentally explore the training process of neural networks and detect overfitting.
We find that the $L_1$ and $L_2$ regularizations suppress the increase of model complexity.
arXiv Detail & Related papers (2020-06-16T07:38:06Z)