What needles do sparse neural networks find in nonlinear haystacks
- URL: http://arxiv.org/abs/2006.04041v1
- Date: Sun, 7 Jun 2020 04:46:55 GMT
- Title: What needles do sparse neural networks find in nonlinear haystacks
- Authors: Sylvain Sardy, Nicolas W Hengartner, Nikolai Bonenko, Yen Ting Lin
- Abstract summary: A sparsity inducing penalty in artificial neural networks (ANNs) avoids over-fitting, especially in situations where noise is high and the training set is small.
For linear models, such an approach provably also recovers the important features with high probability in regimes for a well-chosen penalty parameter.
We perform a set of comprehensive Monte Carlo simulations on a simple model, and the numerical results show the effectiveness of the proposed approach.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Using a sparsity inducing penalty in artificial neural networks (ANNs) avoids
over-fitting, especially in situations where noise is high and the training set
is small in comparison to the number of features. For linear models, such an
approach provably also recovers the important features with high probability in
regimes for a well-chosen penalty parameter. The typical way of setting the
penalty parameter is by splitting the data set and performing cross-validation,
which is (1) computationally expensive and (2) not desirable when the data set
is already too small to be split further (for example,
whole-genome sequence data). In this study, we establish the theoretical
foundation to select the penalty parameter without cross-validation based on
bounding, with high probability, the infinity norm of the gradient of the loss
function at zero under the zero-feature assumption. Our approach is a
generalization of the universal threshold of Donoho and Johnstone (1994) to
nonlinear ANN learning. We perform a set of comprehensive Monte Carlo
simulations on a simple model, and the numerical results show the effectiveness
of the proposed approach.
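To make the selection rule concrete in its simplest setting: for a linear model with squared loss, the gradient at zero is -X^T y / n, and the zero-feature assumption means y is pure noise, so the penalty can be set as a high quantile of the sup-norm of that gradient. The sketch below (Python with numpy, hypothetical names) estimates such a quantile by Monte Carlo; the generalization described in the abstract replaces this linear gradient with the gradient of the ANN loss, which is not reproduced here.

```python
import numpy as np

def universal_lambda(X, sigma=1.0, alpha=0.05, n_mc=1000, seed=0):
    """Monte Carlo estimate of a penalty level lambda such that, with
    probability about 1 - alpha, the sup-norm of the gradient of the loss
    at zero stays below lambda when no feature matters (zero-feature null).

    Linear special case: with L(beta) = ||y - X beta||^2 / (2 n), the
    gradient at beta = 0 is -X^T y / n, and under the null y is pure
    Gaussian noise with standard deviation sigma.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    sup_norms = np.empty(n_mc)
    for m in range(n_mc):
        eps = rng.normal(scale=sigma, size=n)   # data generated under the null
        grad_at_zero = X.T @ eps / n            # gradient of the loss at beta = 0
        sup_norms[m] = np.max(np.abs(grad_at_zero))
    return np.quantile(sup_norms, 1.0 - alpha)

# toy check: for i.i.d. N(0, 1) columns the value is near sigma * sqrt(2 log(p) / n),
# i.e. a universal-threshold-type quantity
X = np.random.default_rng(1).normal(size=(200, 50))
print(universal_lambda(X))
```

The quantile is estimated by simulation rather than in closed form so that the same recipe carries over to nonlinear losses, where no analytic bound is readily available.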
Related papers
- Training a neural network for data reduction and better generalization [7.545668088790516]
The motivation for sparse learners is to compress the inputs (features) by selecting only the ones needed for good generalization.
We show a remarkable phase transition between ignoring the irrelevant features and retrieving the relevant ones well, thanks to the choice of artificial features.
This approach can be seen as a form of compressed sensing, distilling high-dimensional data into a compact, interpretable subset of meaningful features.
arXiv Detail & Related papers (2024-11-26T07:41:15Z)
- The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) models are used in a wide range of applications.
In this paper we examine the use of convex neural recovery models.
We show that all stationary points of the non-convex training objective can be characterized as the global optimum of a subsampled convex program.
arXiv Detail & Related papers (2023-12-19T23:04:56Z)
- A Pseudo-Semantic Loss for Autoregressive Models with Logical Constraints [87.08677547257733]
Neuro-symbolic AI bridges the gap between purely symbolic and neural approaches to learning.
We show how to maximize the likelihood of a symbolic constraint w.r.t. the neural network's output distribution.
We also evaluate our approach on Sudoku and shortest-path prediction cast as autoregressive generation (a constraint-likelihood sketch appears after this list).
arXiv Detail & Related papers (2023-12-06T20:58:07Z)
- A Metalearned Neural Circuit for Nonparametric Bayesian Inference [4.767884267554628]
Most applications of machine learning to classification assume a closed set of balanced classes.
This is at odds with the real world, where class occurrence statistics often follow a long-tailed power-law distribution.
We present a method for extracting the inductive bias from a nonparametric Bayesian model and transferring it to an artificial neural network.
arXiv Detail & Related papers (2023-11-24T16:43:17Z)
- Sparse-Input Neural Network using Group Concave Regularization [10.103025766129006]
Simultaneous feature selection and non-linear function estimation are challenging in neural networks.
We propose a framework of sparse-input neural networks using group concave regularization for feature selection in both low-dimensional and high-dimensional settings (a group-penalty sketch appears after this list).
arXiv Detail & Related papers (2023-07-01T13:47:09Z)
- A phase transition for finding needles in nonlinear haystacks with LASSO artificial neural networks [1.5381930379183162]
An ANN learner exhibits a phase transition in the probability of retrieving the needles.
We propose a warm-start sparsity inducing algorithm to solve the high-dimensional, non-convex and non-differentiable optimization problem (a warm-start sketch appears after this list).
arXiv Detail & Related papers (2022-01-21T11:39:04Z)
- Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and significant reduction in memory consumption.
They can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks (a fixed-point sketch appears after this list).
arXiv Detail & Related papers (2021-06-06T18:05:02Z)
- Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable (a standard Laplace-approximation identity is recalled after this list).
arXiv Detail & Related papers (2021-04-11T09:50:24Z)
- Convergence rates for gradient descent in the training of overparameterized artificial neural networks with biases [3.198144010381572]
In recent years, artificial neural networks have developed into a powerful tool for dealing with a multitude of problems for which classical solution approaches reach their limits.
However, it is still unclear why randomly initialized gradient descent algorithms succeed in training such networks.
arXiv Detail & Related papers (2021-02-23T18:17:47Z)
- Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature [61.22680308681648]
We show that global convergence is statistically intractable even for one-layer neural net bandit with a deterministic reward.
For both nonlinear bandit and RL, the paper presents a model-based algorithm, Virtual Ascent with Online Model Learner (ViOL).
arXiv Detail & Related papers (2021-02-08T12:41:56Z)
- Slice Sampling for General Completely Random Measures [74.24975039689893]
We present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables.
The efficacy of the proposed algorithm is evaluated on several popular nonparametric models.
arXiv Detail & Related papers (2020-06-24T17:53:53Z)
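Re the pseudo-semantic-loss entry above: the stated idea of maximizing the likelihood of a symbolic constraint under the network's output distribution can be illustrated with a fully factorized (independent-Bernoulli) distribution and a simple "exactly one output is on" constraint. This is a generic constraint-likelihood sketch with hypothetical names, not the paper's pseudo-semantic approximation.

```python
import torch

def exactly_one_loss(logits):
    """Negative log-probability that exactly one of k binary outputs is on,
    under an independent-Bernoulli output distribution p = sigmoid(logits).

    P(exactly one) = sum_j p_j * prod_{i != j} (1 - p_i); the loss is -log of it.
    """
    p = torch.sigmoid(logits)                       # (batch, k) success probabilities
    log_p = torch.log(p + 1e-12)
    log_q = torch.log1p(-p + 1e-12)                 # log(1 - p), numerically safe
    # log prob that only unit j is on: log p_j + sum_{i != j} log(1 - p_i)
    per_j = log_p + (log_q.sum(dim=-1, keepdim=True) - log_q)
    constraint_logprob = torch.logsumexp(per_j, dim=-1)
    return -constraint_logprob.mean()

loss = exactly_one_loss(torch.randn(4, 10, requires_grad=True))
loss.backward()  # differentiable, so it can be added to an ordinary training objective
```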
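Re the group-concave-regularization entry: feature selection can be induced by penalizing, for each input feature, the 2-norm of the group of first-layer weights attached to it; the concave MCP penalty is one common choice. A minimal sketch under these assumptions (hypothetical names), not that paper's exact estimator or optimizer.

```python
import torch
import torch.nn as nn

def group_mcp(first_layer_weight, lam=0.1, gamma=3.0):
    """Group MCP penalty on the 2-norm of each input feature's weight group.

    first_layer_weight has shape (hidden, n_features); column j holds all
    weights leaving input j, so zeroing a whole column drops that feature.
    MCP(t) = lam*t - t^2/(2*gamma) for t <= gamma*lam, else gamma*lam^2/2.
    """
    t = first_layer_weight.norm(dim=0)              # one group norm per input feature
    inner = lam * t - t.pow(2) / (2.0 * gamma)
    flat = torch.full_like(t, gamma * lam ** 2 / 2.0)
    return torch.where(t <= gamma * lam, inner, flat).sum()

net = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 1))
x, y = torch.randn(64, 20), torch.randn(64, 1)
loss = nn.functional.mse_loss(net(x), y) + group_mcp(net[0].weight, lam=0.05)
loss.backward()
```

Concave penalties such as MCP are usually minimized with thresholding or proximal steps rather than plain backpropagation; the block only shows how the penalty term is formed.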
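Re the LASSO-ANN phase-transition entry: a warm-start strategy typically fits the model along a decreasing grid of penalty levels, reusing each solution to initialize the next fit, while a proximal (soft-thresholding) step handles the non-differentiable l1 term. A generic sketch with hypothetical names, not the authors' algorithm.

```python
import torch
import torch.nn as nn

def soft_threshold(w, tau):
    """Proximal operator of tau * ||w||_1 (entrywise soft-thresholding)."""
    return torch.sign(w) * torch.clamp(w.abs() - tau, min=0.0)

def warm_start_path(x, y, lambdas, steps=200, lr=1e-2):
    """Fit a small lasso ANN along a decreasing penalty path, reusing the
    previous weights as the starting point for the next penalty (warm start)."""
    net = nn.Sequential(nn.Linear(x.shape[1], 8), nn.Tanh(), nn.Linear(8, 1))
    solutions = []
    for lam in lambdas:                      # e.g. a geometric grid, largest penalty first
        opt = torch.optim.SGD(net.parameters(), lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            nn.functional.mse_loss(net(x), y).backward()
            opt.step()                       # gradient step on the smooth part
            with torch.no_grad():            # proximal step on the penalized first layer
                net[0].weight.copy_(soft_threshold(net[0].weight, lr * lam))
        solutions.append(net[0].weight.detach().clone())
    return solutions

x, y = torch.randn(128, 10), torch.randn(128, 1)
path = warm_start_path(x, y, lambdas=[1.0, 0.3, 0.1, 0.03])
```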
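Re the robust-implicit-networks entry: well-posedness of an implicit layer z = phi(W z + U x + b) follows whenever the fixed-point map is a contraction; for a 1-Lipschitz phi such as tanh, the induced infinity norm ||W||_inf < 1 (maximum absolute row sum) is a sufficient condition. A numpy sketch of this check and iteration (hypothetical sizes), not the paper's full non-Euclidean framework.

```python
import numpy as np

def implicit_forward(W, U, b, x, tol=1e-8, max_iter=1000):
    """Solve the implicit equation z = tanh(W z + U x + b) by fixed-point iteration.

    Because tanh is 1-Lipschitz, the iteration contracts in the infinity norm
    whenever ||W||_inf < 1, which guarantees a unique, well-posed equilibrium.
    """
    if np.abs(W).sum(axis=1).max() >= 1.0:
        raise ValueError("||W||_inf >= 1: contraction in the infinity norm not guaranteed")
    z = np.zeros(W.shape[0])
    for _ in range(max_iter):
        z_next = np.tanh(W @ z + U @ x + b)
        if np.max(np.abs(z_next - z)) < tol:
            return z_next
        z = z_next
    return z

rng = np.random.default_rng(0)
W = 0.9 * rng.uniform(-1, 1, size=(16, 16)) / 16   # every row sum of |W| stays below 0.9
U, b, x = rng.normal(size=(16, 4)), rng.normal(size=16), rng.normal(size=4)
print(implicit_forward(W, U, b, x)[:4])
```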
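Re the marginal-likelihood entry: one textbook route to a marginal-likelihood estimate when no validation split is available is the Laplace approximation around a trained parameter vector, recalled here for context (with d parameters and H the Hessian of the negative log joint); this is not necessarily the scalable estimator that paper proposes.

```latex
\log p(\mathcal{D} \mid \mathcal{M})
  \;\approx\; \log p(\mathcal{D} \mid \hat{\theta}, \mathcal{M})
  + \log p(\hat{\theta} \mid \mathcal{M})
  + \frac{d}{2}\log(2\pi)
  - \frac{1}{2}\log\det H,
\quad
H = -\nabla^2_{\theta}\Big[\log p(\mathcal{D}\mid\theta,\mathcal{M})
  + \log p(\theta\mid\mathcal{M})\Big]\Big|_{\theta=\hat{\theta}} .
```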