A phase transition for finding needles in nonlinear haystacks with LASSO
artificial neural networks
- URL: http://arxiv.org/abs/2201.08652v1
- Date: Fri, 21 Jan 2022 11:39:04 GMT
- Title: A phase transition for finding needles in nonlinear haystacks with LASSO
artificial neural networks
- Authors: Xiaoyu Ma, Sylvain Sardy, Nick Hengartner, Nikolai Bobenko, Yen Ting
Lin
- Abstract summary: An ANN learner exhibits a phase transition in the probability of retrieving the needles.
We propose a warm-start sparsity inducing algorithm to solve the high-dimensional, non-convex and non-differentiable optimization problem.
- Score: 1.5381930379183162
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: To fit sparse linear associations, a LASSO sparsity inducing penalty with a
single hyperparameter provably allows recovery of the important features
(needles) with high probability in certain regimes, even if the sample size is
smaller than the dimension of the input vector (haystack). More recently,
learners known as artificial neural networks (ANNs) have shown great success
in many machine learning tasks, in particular fitting nonlinear associations.
A small learning rate, the stochastic gradient descent algorithm and a large
training set help to cope with the explosion in the number of parameters
present in deep neural networks. Yet few ANN learners have been developed and
studied to find needles in nonlinear haystacks. Driven by a single
hyperparameter, our ANN learner, as in the sparse linear case, exhibits a phase
transition in the probability of retrieving the needles, which we do not
observe with other ANN learners. To select our penalty parameter, we generalize
the universal threshold of Donoho and Johnstone (1994), which is a better rule
than the conservative (too many false detections) and expensive
cross-validation. In the
spirit of simulated annealing, we propose a warm-start sparsity inducing
algorithm to solve the high-dimensional, non-convex and non-differentiable
optimization problem. We perform precise Monte Carlo simulations to show the
effectiveness of our approach.
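As a concrete illustration of the approach in the abstract, the sketch below trains a one-hidden-layer network with an L1 penalty on its input-layer weights by proximal gradient descent, warm-started along a decreasing penalty path. It is a minimal sketch, not the authors' implementation: the architecture, the `soft_threshold` helper, the penalty path and all hyperparameters are illustrative assumptions. (For reference, the classical Donoho-Johnstone universal threshold in the Gaussian sequence model is sigma * sqrt(2 log n); the paper generalizes this idea to choose the penalty.)

```python
# Minimal sketch (not the authors' code): one-hidden-layer network whose
# first-layer weights carry an L1 penalty, trained by proximal gradient
# descent with a warm start over a decreasing penalty path.
import torch
import torch.nn as nn

def soft_threshold(w, tau):
    # Proximal operator of tau * ||w||_1 (illustrative helper).
    return torch.sign(w) * torch.clamp(w.abs() - tau, min=0.0)

def fit_sparse_ann(X, y, lam_path, hidden=32, lr=1e-2, epochs=200):
    n, p = X.shape
    net = nn.Sequential(nn.Linear(p, hidden), nn.ReLU(), nn.Linear(hidden, 1))
    loss_fn = nn.MSELoss()
    for lam in lam_path:                     # warm start: weights carry over across lambdas
        opt = torch.optim.SGD(net.parameters(), lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            loss = loss_fn(net(X).squeeze(-1), y)
            loss.backward()
            opt.step()
            with torch.no_grad():            # proximal step on the input-layer weights only
                W1 = net[0].weight
                W1.copy_(soft_threshold(W1, lr * lam))
    # an input feature is "selected" if at least one weight leaving it is nonzero
    return (net[0].weight.abs().sum(dim=0) > 0).nonzero(as_tuple=True)[0]

# Toy usage: 3 needles among p = 50 inputs, n = 100 samples.
torch.manual_seed(0)
X = torch.randn(100, 50)
y = torch.sin(X[:, 0]) + X[:, 1] * X[:, 2] + 0.1 * torch.randn(100)
selected = fit_sparse_ann(X, y, lam_path=[1.0, 0.3, 0.1])
print("selected input features:", selected.tolist())
```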
Related papers
- Dendrites endow artificial neural networks with accurate, robust and parameter-efficient learning [0.0]
We introduce a new ANN architecture that incorporates the structured connectivity and restricted sampling properties of biological dendrites.
We find that dendritic ANNs are more robust to overfitting and outperform traditional ANNs on several image classification tasks.
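For intuition only, the sketch below implements "restricted sampling" with a fixed binary mask so that each hidden unit connects to a small random subset of inputs. It is a generic illustration, not the dendritic architecture of the cited paper; `MaskedLinear` and its `fan_in` parameter are hypothetical names.

```python
# Generic sketch of restricted sampling: each hidden unit is wired to a small
# random subset of inputs via a fixed binary mask (illustrative only).
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    def __init__(self, in_features, out_features, fan_in=16):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        mask = torch.zeros(out_features, in_features)
        for i in range(out_features):
            idx = torch.randperm(in_features)[:fan_in]   # restricted receptive field
            mask[i, idx] = 1.0
        self.register_buffer("mask", mask)

    def forward(self, x):
        return nn.functional.linear(x, self.linear.weight * self.mask, self.linear.bias)

layer = MaskedLinear(784, 64, fan_in=16)
print(layer(torch.randn(8, 784)).shape)   # torch.Size([8, 64])
```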
arXiv Detail & Related papers (2024-04-04T11:22:58Z) - Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and
Luck [35.6883212537938]
We consider offline sparse parity learning, a supervised classification problem which admits a statistical query lower bound for gradient-based training of a multilayer perceptron.
We show, theoretically and experimentally, that sparse initialization and increasing network width yield significant improvements in sample efficiency in this setting.
We also show that the synthetic sparse parity task can be useful as a proxy for real problems requiring axis-aligned feature learning.
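A hedged sketch of the setup this summary refers to: generating k-sparse parity data (the label is the parity of k hidden coordinates of a random sign vector) and a sparse first-layer initialization in which each unit starts with only a few nonzero input weights. The generator and initialization scheme are assumptions for illustration, not the paper's exact protocol.

```python
# Sketch of the k-sparse parity task and a sparse first-layer initialization
# (illustrative; hyperparameters and init scheme are assumptions).
import numpy as np

def sparse_parity_data(n, d, k, rng):
    X = rng.choice([-1.0, 1.0], size=(n, d))
    support = rng.choice(d, size=k, replace=False)     # the hidden relevant coordinates
    y = np.prod(X[:, support], axis=1)                 # parity = product of +/-1 entries
    return X, y, support

def sparse_init(width, d, nonzero_per_unit, rng):
    W = np.zeros((width, d))
    for i in range(width):
        idx = rng.choice(d, size=nonzero_per_unit, replace=False)
        W[i, idx] = rng.normal(scale=1.0, size=nonzero_per_unit)
    return W

rng = np.random.default_rng(0)
X, y, support = sparse_parity_data(n=1000, d=100, k=3, rng=rng)
W1 = sparse_init(width=512, d=100, nonzero_per_unit=3, rng=rng)
print("relevant coordinates:", sorted(support.tolist()),
      "| init density:", (W1 != 0).mean())
```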
arXiv Detail & Related papers (2023-09-07T15:52:48Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Slimmable Networks for Contrastive Self-supervised Learning [69.9454691873866]
Self-supervised learning has made significant progress in pre-training large models, but it struggles with small models.
We introduce another one-stage solution to obtain pre-trained small models without the need for extra teachers.
A slimmable network consists of a full network and several weight-sharing sub-networks, which can be pre-trained once to obtain various networks.
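A minimal sketch of the weight-sharing idea: sub-networks of different widths reuse leading slices of one full-width weight matrix, so a single pre-training run yields several deployable widths. `SlimmableLinear` and its `width_ratio` argument are illustrative names, not the paper's API.

```python
# Minimal sketch of weight sharing in a slimmable layer: each width ratio
# reuses a leading slice of the same full-width weight matrix (illustrative only).
import torch
import torch.nn as nn

class SlimmableLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.full = nn.Linear(in_features, out_features)

    def forward(self, x, width_ratio=1.0):
        k = max(1, int(self.full.out_features * width_ratio))
        return nn.functional.linear(x, self.full.weight[:k], self.full.bias[:k])

layer = SlimmableLinear(128, 64)
x = torch.randn(4, 128)
for r in (0.25, 0.5, 1.0):            # sub-networks of different widths share weights
    print(r, layer(x, width_ratio=r).shape)
```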
arXiv Detail & Related papers (2022-09-30T15:15:05Z) - Revisiting Gaussian Neurons for Online Clustering with Unknown Number of
Clusters [0.0]
A novel local learning rule is presented that performs online clustering with a maximum limit on the number of clusters to be found.
The experimental results demonstrate stability in the learned parameters across a large number of training samples.
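For illustration only, here is a generic online clustering loop with an upper limit on the number of clusters: a new center is opened only while the limit allows, otherwise the nearest center is nudged toward the sample. This is not the local Gaussian-neuron learning rule of the cited paper; all names and thresholds are assumptions.

```python
# Generic online clustering sketch with an upper limit on the number of
# clusters (not the Gaussian-neuron rule from the cited paper).
import numpy as np

def online_cluster(stream, max_clusters, radius, lr=0.05):
    centers = []
    for x in stream:
        if centers:
            d = [np.linalg.norm(x - c) for c in centers]
            j = int(np.argmin(d))
            if d[j] < radius or len(centers) >= max_clusters:
                centers[j] += lr * (x - centers[j])    # move the winning center
                continue
        centers.append(x.copy())                       # open a new cluster
    return centers

rng = np.random.default_rng(1)
data = np.vstack([rng.normal(loc=m, scale=0.1, size=(200, 2)) for m in (-2, 0, 2)])
rng.shuffle(data)
print("clusters found:", len(online_cluster(data, max_clusters=10, radius=0.5)))
```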
arXiv Detail & Related papers (2022-05-02T14:01:40Z) - Exploring Linear Feature Disentanglement For Neural Networks [63.20827189693117]
Non-linear activation functions, e.g., Sigmoid, ReLU, and Tanh, have achieved great success in neural networks (NNs).
Due to the complex non-linear characteristic of samples, the objective of those activation functions is to project samples from their original feature space to a linear separable feature space.
This phenomenon ignites our interest in exploring whether all features need to be transformed by all non-linear functions in current typical NNs.
arXiv Detail & Related papers (2022-03-22T13:09:17Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
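A simplified sketch of last-layer gradient embeddings, one common ingredient of Fisher-style active learning: pool points are scored by the norm of the gradient of the loss at the model's own predicted label with respect to the last-layer weights. This is not the BAIT selection rule itself; all names below are illustrative.

```python
# Sketch of gradient embeddings for active learning (illustrative; not BAIT).
import torch
import torch.nn as nn

def gradient_embedding_scores(penultimate, head, X_pool):
    # Closed-form gradient of cross-entropy at the predicted label w.r.t. the
    # last-layer weights: (probabilities - one-hot prediction) outer features.
    with torch.no_grad():
        feats = penultimate(X_pool)                      # (n, h) penultimate features
        probs = torch.softmax(head(feats), dim=1)        # (n, C) class probabilities
        onehot = torch.zeros_like(probs).scatter_(1, probs.argmax(dim=1, keepdim=True), 1.0)
        g = (probs - onehot).unsqueeze(2) * feats.unsqueeze(1)   # (n, C, h) embeddings
    return g.flatten(1).norm(dim=1)                      # larger norm = more informative

torch.manual_seed(0)
penultimate = nn.Sequential(nn.Linear(20, 64), nn.ReLU())
head = nn.Linear(64, 3)
X_pool = torch.randn(500, 20)
scores = gradient_embedding_scores(penultimate, head, X_pool)
query = scores.topk(10).indices                          # the 10 points to label next
print(query.tolist())
```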
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Convergence rates for gradient descent in the training of
overparameterized artificial neural networks with biases [3.198144010381572]
In recent years, artificial neural networks have developed into a powerful tool for dealing with a multitude of problems for which classical solution approaches reach their limits.
It is still unclear why randomly initialized gradient descent algorithms succeed in training such networks.
arXiv Detail & Related papers (2021-02-23T18:17:47Z) - Measuring Model Complexity of Neural Networks with Curve Activation
Functions [100.98319505253797]
We propose the linear approximation neural network (LANN) to approximate a given deep model with curve activation function.
We experimentally explore the training process of neural networks and detect overfitting.
We find that the $L_1$ and $L_2$ regularizations suppress the increase of model complexity.
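A short sketch of the mechanism mentioned in the last sentence: adding $L_1$ and $L_2$ penalties on the network weights to the training loss, which is how such regularization can restrain the growth of model complexity. The architecture and penalty coefficients below are placeholder assumptions, not values from the paper.

```python
# Sketch of adding L1 / L2 penalties to a network's training loss (illustrative).
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
X, y = torch.randn(256, 10), torch.randn(256, 1)

l1_coef, l2_coef = 1e-4, 1e-4
for _ in range(100):
    opt.zero_grad()
    pred = net(X)
    l1 = sum(p.abs().sum() for p in net.parameters())    # L1 penalty term
    l2 = sum(p.pow(2).sum() for p in net.parameters())   # L2 penalty term
    loss = nn.functional.mse_loss(pred, y) + l1_coef * l1 + l2_coef * l2
    loss.backward()
    opt.step()
print(float(loss))
```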
arXiv Detail & Related papers (2020-06-16T07:38:06Z) - What needles do sparse neural networks find in nonlinear haystacks [0.0]
A sparsity inducing penalty in artificial neural networks (ANNs) avoids over-fitting, especially in situations where noise is high and the training set is small.
For linear models, such an approach provably also recovers the important features with high probability in certain regimes, for a well-chosen penalty parameter.
We perform a set of comprehensive Monte Carlo simulations on a simple model, and the numerical results show the effectiveness of the proposed approach.
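A hedged sketch of the kind of Monte Carlo support-recovery experiment the summary mentions, using a linear LASSO as a simple stand-in for the sparse ANN learner: repeatedly generate data, fit, and record how often exactly the true needles are selected. The penalty value and problem sizes are arbitrary illustrations.

```python
# Sketch of a Monte Carlo exact-support-recovery experiment (a linear LASSO
# stand-in for the sparse ANN learner; purely illustrative).
import numpy as np
from sklearn.linear_model import Lasso

def recovery_probability(n, p, s, lam, reps=50, seed=0):
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        X = rng.normal(size=(n, p))
        beta = np.zeros(p)
        support = rng.choice(p, size=s, replace=False)
        beta[support] = 1.0                                  # the "needles"
        y = X @ beta + 0.5 * rng.normal(size=n)
        coef = Lasso(alpha=lam).fit(X, y).coef_
        hits += set(np.flatnonzero(coef)) == set(support)    # exact support recovery
    return hits / reps

for n in (50, 100, 200, 400):
    print(n, recovery_probability(n=n, p=200, s=3, lam=0.1))
```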
arXiv Detail & Related papers (2020-06-07T04:46:55Z) - Towards Efficient Processing and Learning with Spikes: New Approaches
for Multi-Spike Learning [59.249322621035056]
We propose two new multi-spike learning rules which demonstrate better performance over other baselines on various tasks.
In the feature detection task, we re-examine the ability of unsupervised STDP with its limitations being presented.
Our proposed learning rules can reliably solve the task over a wide range of conditions without specific constraints being applied.
arXiv Detail & Related papers (2020-05-02T06:41:20Z)