A study of local optima for learning feature interactions using neural
networks
- URL: http://arxiv.org/abs/2002.04322v1
- Date: Tue, 11 Feb 2020 11:38:45 GMT
- Title: A study of local optima for learning feature interactions using neural
networks
- Authors: Yangzi Guo, Adrian Barbu
- Abstract summary: We study the data-starved regime where a NN is trained on a relatively small amount of training data.
We experimentally observed that the cross-entropy loss function on XOR-like data has many non-equivalent local optima.
We show that the performance of a NN on real datasets can be improved using pruning.
- Score: 4.94950858749529
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many fields such as bioinformatics, high energy physics, power
distribution, etc., it is desirable to learn non-linear models where a small
number of variables are selected and the interaction between them is explicitly
modeled to predict the response. In principle, neural networks (NNs) could
accomplish this task since they can model non-linear feature interactions very
well. However, NNs require large amounts of training data for good
generalization. In this paper we study the data-starved regime where a NN is
trained on a relatively small amount of training data. For that purpose we
study feature selection for NNs, which is known to improve generalization for
linear models. As an extreme case of data with feature selection and feature
interactions we study the XOR-like data with irrelevant variables. We
experimentally observed that the cross-entropy loss function on XOR-like data
has many non-equivalent local optima, and the number of local optima grows
exponentially with the number of irrelevant variables. To deal with these local
minima and to perform feature selection, we propose a node pruning and feature
selection algorithm that improves the capability of NNs to find better local
minima even when there are irrelevant variables. Finally, we show that the
performance of a NN on real datasets can be improved using pruning, obtaining
compact networks on a small number of features, with good prediction and
interpretability.
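To make the setting concrete, below is a minimal NumPy sketch (not the authors' code; the function names and hyperparameters are illustrative) that builds XOR-like data with irrelevant variables and trains a small one-hidden-layer NN on the cross-entropy loss from several random initializations. The spread of final losses is the symptom of non-equivalent local optima the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_xor_data(n=200, n_irrelevant=8):
    """XOR-like data: the label depends only on the sign pattern of the
    first two features; the remaining features are irrelevant noise."""
    X = rng.uniform(-1, 1, size=(n, 2 + n_irrelevant))
    y = (X[:, 0] * X[:, 1] > 0).astype(float)   # XOR of the two signs
    return X, y

def train_small_nn(X, y, hidden=4, lr=0.5, epochs=2000, seed=0):
    """One-hidden-layer NN, full-batch gradient descent on cross-entropy."""
    r = np.random.default_rng(seed)
    n, d = X.shape
    W1 = r.normal(0, 0.5, (d, hidden)); b1 = np.zeros(hidden)
    w2 = r.normal(0, 0.5, hidden);      b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                # hidden activations
        p = 1 / (1 + np.exp(-(H @ w2 + b2)))    # predicted probabilities
        g = (p - y) / n                         # dL/dlogits for cross-entropy
        gH = np.outer(g, w2) * (1 - H ** 2)     # backprop through tanh
        W1 -= lr * (X.T @ gH); b1 -= lr * gH.sum(0)
        w2 -= lr * (H.T @ g);  b2 -= lr * g.sum()
    H = np.tanh(X @ W1 + b1)
    p = 1 / (1 + np.exp(-(H @ w2 + b2)))
    return -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

X, y = make_xor_data()
# Different initializations tend to land at clearly different final losses,
# i.e. non-equivalent local optima.
print(sorted(round(train_small_nn(X, y, seed=s), 3) for s in range(8)))
```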
Related papers
- Unveiling the Power of Sparse Neural Networks for Feature Selection [60.50319755984697]
Sparse Neural Networks (SNNs) have emerged as powerful tools for efficient feature selection.
We show that feature selection with SNNs trained with dynamic sparse training (DST) algorithms can achieve, on average, more than 50% memory and 55% FLOPs reduction.
arXiv Detail & Related papers (2024-08-08T16:48:33Z)
- Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find solutions reachable by our training procedure, including its optimizer and regularizers, which limits this flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z)
- Towards a Phenomenological Understanding of Neural Networks: Data [1.2985510601654955]
A theory of neural networks (NNs) built upon collective variables would provide scientists with the tools to better understand the learning process at every stage.
In this work, we introduce two such variables, the entropy and the trace of the empirical neural tangent kernel (NTK) built on the training data passed to the model.
We find a correlation between the starting entropy, the trace of the NTK, and the generalization of the model computed after training is complete.
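As a rough illustration of these collective variables, the following NumPy sketch computes the empirical NTK of a toy one-hidden-layer model on a small batch, then its trace and a spectral entropy of its normalized eigenvalues; the paper's exact entropy definition may differ, and all shapes here are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, h = 32, 5, 16
X = rng.normal(size=(n, d))               # toy training inputs
W1 = rng.normal(0, 1 / np.sqrt(d), (h, d))
w2 = rng.normal(0, 1 / np.sqrt(h), h)

def per_example_grad(x):
    """Gradient of the scalar output f(x) = w2 . tanh(W1 x)
    with respect to all parameters, flattened."""
    a = np.tanh(W1 @ x)
    gw2 = a
    gW1 = np.outer(w2 * (1 - a ** 2), x)
    return np.concatenate([gW1.ravel(), gw2])

J = np.stack([per_example_grad(x) for x in X])  # n x P Jacobian
K = J @ J.T                                     # empirical NTK on the data

trace = np.trace(K)
lam = np.linalg.eigvalsh(K)
p = np.clip(lam, 1e-12, None) / lam.sum()
entropy = -(p * np.log(p)).sum()                # spectral entropy of the NTK
print(f"trace={trace:.3f}  entropy={entropy:.3f}")
```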
arXiv Detail & Related papers (2023-05-01T18:00:01Z)
- Supervised Feature Selection with Neuron Evolution in Sparse Neural Networks [17.12834153477201]
We propose NeuroFS, a novel resource-efficient supervised feature selection method using sparse neural networks.
By gradually pruning the uninformative features from the input layer of a sparse neural network trained from scratch, NeuroFS efficiently derives an informative subset of features.
NeuroFS achieves the highest ranking-based score among the considered state-of-the-art supervised feature selection models.
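A heavily simplified sketch of the input-pruning idea (NeuroFS itself relies on dynamic sparse training with neuron removal *and* regrowth; the scoring rule and schedule below are assumptions for illustration):

```python
import numpy as np

def prune_input_features(W_in, keep, drop_per_round=2):
    """Toy stand-in for gradual input-feature pruning: repeatedly drop
    the input neurons whose outgoing weights have the smallest total
    magnitude until `keep` features remain."""
    active = np.arange(W_in.shape[0])
    while active.size > keep:
        scores = np.abs(W_in[active]).sum(axis=1)   # importance per active feature
        n_drop = min(drop_per_round, active.size - keep)
        active = active[np.argsort(scores)[n_drop:]]  # keep the strongest
        active.sort()
    return active

rng = np.random.default_rng(2)
W_in = rng.normal(size=(20, 32))   # input-to-hidden weights: 20 features, 32 hidden units
W_in[5:] *= 0.05                   # features 5..19 are barely used by the network
print(prune_input_features(W_in, keep=5))   # likely recovers features 0..4
```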
arXiv Detail & Related papers (2023-03-10T17:09:55Z)
- Functional Neural Networks: Shift invariant models for functional data with applications to EEG classification [0.0]
We introduce a new class of neural networks that are shift invariant and preserve the smoothness of the data: functional neural networks (FNNs).
For this, we use methods from functional data analysis (FDA) to extend multi-layer perceptrons and convolutional neural networks to functional data.
We show that the models outperform a benchmark model from FDA in terms of accuracy and successfully use FNNs to classify electroencephalography (EEG) data.
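A minimal sketch of one functional neuron under these assumptions: its weight is a smooth function built from a few sine basis functions, and its activation is a quadrature approximation of the inner product of w(t) and x(t) over the grid (the basis choice and grid are illustrative, not the paper's).

```python
import numpy as np

rng = np.random.default_rng(7)
T = np.linspace(0, 1, 64)          # common evaluation grid for the curves
dt = T[1] - T[0]

# A "functional neuron": its weight is a smooth function w(t) built from
# a small sine basis; smoothness of the weight is enforced by the basis.
n_basis = 5
basis = np.stack([np.sin(np.pi * (k + 1) * T) for k in range(n_basis)])
coef = rng.normal(0, 0.5, n_basis)
w = coef @ basis                    # smooth weight function on the grid

def functional_neuron(x_curve, bias=0.0):
    # Riemann-sum approximation of the integral of w(t) * x(t) dt
    return np.tanh(np.sum(w * x_curve) * dt + bias)

x_curve = np.sin(2 * np.pi * T) + 0.1 * rng.normal(size=T.size)  # e.g. one EEG channel
print(functional_neuron(x_curve))
```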
arXiv Detail & Related papers (2023-01-14T09:41:21Z)
- Contextual HyperNetworks for Novel Feature Adaptation [43.49619456740745]
A Contextual HyperNetwork (CHN) generates parameters for extending the base model to a new feature.
At prediction time, the CHN requires only a single forward pass through a neural network, yielding a significant speed-up.
We show that this system obtains improved few-shot learning performance for novel features over existing imputation and meta-learning baselines.
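A toy sketch of the hypernetwork idea (all shapes, and the notion of a "context" vector summarizing observations of the new feature, are assumptions; this is not the paper's architecture): a small network maps the context to the weights that tie the new feature into a base model's hidden layer in one forward pass, with no retraining of the base model.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical shapes: the base model has a hidden layer of width 8, so a
# new input feature needs one weight per hidden unit. The CHN maps a small
# context vector to those weights.
H, CTX = 8, 4
A = rng.normal(0, 0.3, (16, CTX)); b = np.zeros(16)   # CHN layer 1
B = rng.normal(0, 0.3, (H, 16));   c = np.zeros(H)    # CHN layer 2

def chn_generate_weights(context):
    """One forward pass: context -> parameters for the new feature."""
    code = np.tanh(A @ context + b)
    return B @ code + c      # weights connecting the new feature to the hidden layer

context = rng.normal(size=CTX)     # e.g., summary statistics of the new column
w_new = chn_generate_weights(context)
print(w_new.shape)                 # (8,) -> plug into the base model's input layer
```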
arXiv Detail & Related papers (2021-04-12T23:19:49Z)
- Rank-R FNN: A Tensor-Based Learning Model for High-Order Data Classification [69.26747803963907]
Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes Canonical/Polyadic decomposition on its parameters.
It handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension.
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
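A minimal sketch of a single rank-R unit on a matrix input, following the CP-decomposition idea described above (the dimensions and the placement of the nonlinearity are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
d1, d2, R = 6, 7, 3              # input is a d1 x d2 array, weights have CP rank R
U = rng.normal(size=(R, d1))     # factor vectors of the CP-decomposed weight
V = rng.normal(size=(R, d2))     # tensor: one (u_r, v_r) pair per rank component
a = rng.normal(size=R)

def rank_r_unit(X):
    """Rank-R response to a matrix input, without vectorizing X:
    sum_r a_r * u_r^T X v_r, followed by a nonlinearity."""
    z = sum(a[r] * (U[r] @ X @ V[r]) for r in range(R))
    return np.tanh(z)

X = rng.normal(size=(d1, d2))    # e.g., a spatial x spectral hyperspectral patch
print(rank_r_unit(X))
```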
arXiv Detail & Related papers (2021-04-11T16:37:32Z)
- NL-CNN: A Resources-Constrained Deep Learning Model based on Nonlinear Convolution [0.0]
A novel convolutional neural network model, abbreviated NL-CNN, is proposed, where nonlinear convolution is emulated by a cascade of convolution + nonlinearity layers.
A performance evaluation on several widely known datasets is provided, showing several relevant properties of the model.
arXiv Detail & Related papers (2021-01-30T13:38:42Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
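A toy sketch of the min-max formulation, with both players collapsed to single linear units for brevity (the paper parameterizes both with NNs; the data-generating process and the quadratic regularizer below are assumptions): gradient descent for the learner and gradient ascent for the adversarial test function recover the structural coefficient despite confounding.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2000
z = rng.normal(size=n)                   # instrument
u = rng.normal(size=n)                   # unobserved confounder
x = z + u + 0.1 * rng.normal(size=n)     # endogenous regressor
y = 2.0 * x + u                          # true structural effect: 2.0

theta, phi = 0.0, 0.0                    # learner and adversary parameters
lr = 0.05
for _ in range(3000):
    resid = y - theta * x
    test = phi * z                        # adversary's test function f(z) = phi * z
    # objective: E[resid * test] - 0.5 * E[test^2]  (regularized min-max game)
    grad_theta = -np.mean(x * test)                         # descent for the learner
    grad_phi = np.mean(resid * z) - phi * np.mean(z * z)    # ascent for the adversary
    theta -= lr * grad_theta
    phi += lr * grad_phi
print(theta)   # close to 2.0; plain OLS of y on x would be biased by u
```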
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
- Neural Additive Models: Interpretable Machine Learning with Neural Nets [77.66871378302774]
Deep neural networks (DNNs) are powerful black-box predictors that have achieved impressive performance on a wide variety of tasks.
We propose Neural Additive Models (NAMs) which combine some of the expressivity of DNNs with the inherent intelligibility of generalized additive models.
NAMs learn a linear combination of neural networks that each attend to a single input feature.
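A minimal NumPy sketch of the NAM structure (untrained, with made-up sizes): one tiny network per input feature, with predictions formed as the sum of the per-feature outputs, which is what makes each feature's learned shape function individually plottable.

```python
import numpy as np

rng = np.random.default_rng(6)

class FeatureNet:
    """One tiny MLP per input feature, as in the NAM idea: each network
    sees a single scalar feature and outputs a scalar contribution."""
    def __init__(self, hidden=16):
        self.w1 = rng.normal(0, 1.0, hidden); self.b1 = rng.normal(0, 1.0, hidden)
        self.w2 = rng.normal(0, 0.3, hidden)
    def __call__(self, x):                 # x: (n,) one feature column
        return np.maximum(0, np.outer(x, self.w1) + self.b1) @ self.w2

def nam_forward(X, nets, bias=0.0):
    """NAM prediction: bias + sum of per-feature shape functions.
    Interpretability comes from plotting each nets[j] on a grid."""
    return bias + sum(net(X[:, j]) for j, net in enumerate(nets))

X = rng.normal(size=(100, 3))
nets = [FeatureNet() for _ in range(X.shape[1])]
print(nam_forward(X, nets)[:5])   # untrained; training would fit all nets jointly
```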
arXiv Detail & Related papers (2020-04-29T01:28:32Z)
- Approximation and Non-parametric Estimation of ResNet-type Convolutional Neural Networks [52.972605601174955]
We show that a ResNet-type CNN can attain the minimax optimal error rates in important function classes.
We derive approximation and estimation error rates of the aforementioned type of CNNs for the Barron and Hölder classes.
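For context, the classical minimax rate for β-smooth Hölder regression in d dimensions, which results of this kind typically match, is shown below (the paper's Barron-class rate is different and not reproduced here):

```latex
% Classical minimax rate for estimating f in the Hölder class
% \mathcal{H}^{\beta}([0,1]^d) from n samples, in squared L2 risk:
\[
  \inf_{\hat f}\,\sup_{f \in \mathcal{H}^{\beta}}
  \mathbb{E}\,\lVert \hat f - f \rVert_{L^2}^2
  \;\asymp\; n^{-\frac{2\beta}{2\beta + d}}
\]
```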
arXiv Detail & Related papers (2019-03-24T19:42:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.