Training a neural network for data reduction and better generalization
- URL: http://arxiv.org/abs/2411.17180v1
- Date: Tue, 26 Nov 2024 07:41:15 GMT
- Title: Training a neural network for data reduction and better generalization
- Authors: Sylvain Sardy, Maxime van Cutsem, Xiaoyu Ma
- Abstract summary: The motivation for sparse learners is to compress the inputs (features) by selecting only the ones needed for good generalization.
We empirically show a remarkable phase transition in the probability of retrieving the relevant features, as well as good generalization, thanks to the choice of $\lambda$, the non-convex penalty and the optimization scheme developed.
This approach can be seen as a form of compressed sensing for complex models, allowing us to distill high-dimensional data into a compact, interpretable subset of meaningful features.
- Score: 7.545668088790516
- Abstract: The motivation for sparse learners is to compress the inputs (features) by selecting only the ones needed for good generalization. Linear models with LASSO-type regularization achieve this by setting the weights of irrelevant features to zero, effectively identifying and ignoring them. In artificial neural networks, this selective focus can be achieved by pruning the input layer. Given a cost function enhanced with a sparsity-promoting penalty, our proposal selects a regularization term $\lambda$ (without the use of cross-validation or a validation set) that creates a local minimum in the cost function at the origin where no features are selected. This local minimum acts as a baseline, meaning that if there is no strong enough signal to justify a feature inclusion, the local minimum remains at zero with a high prescribed probability. The method is flexible, applying to complex models ranging from shallow to deep artificial neural networks and supporting various cost functions and sparsity-promoting penalties. We empirically show a remarkable phase transition in the probability of retrieving the relevant features, as well as good generalization thanks to the choice of $\lambda$, the non-convex penalty and the optimization scheme developed. This approach can be seen as a form of compressed sensing for complex models, allowing us to distill high-dimensional data into a compact, interpretable subset of meaningful features.
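To make the proposal concrete, here is a minimal sketch (an illustration only, not the authors' implementation: the architecture, the group-norm penalty used as a convex stand-in for the non-convex penalty, and the $\lambda$-calibration heuristic are all assumptions) of penalizing the input-layer weights of a small network and calibrating $\lambda$ from pure-noise responses so that, with a high prescribed probability, no feature is selected when none is relevant.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseInputNet(nn.Module):
    """Small feed-forward net whose first layer is to be pruned by a sparsity penalty."""
    def __init__(self, p, hidden=32):
        super().__init__()
        self.input_layer = nn.Linear(p, hidden)
        self.body = nn.Sequential(nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x):
        return self.body(self.input_layer(x))

def group_penalty(model):
    # One group per input feature: the column of first-layer weights fed by it.
    # Convex group-lasso surrogate; the paper uses a non-convex penalty instead.
    return model.input_layer.weight.norm(dim=0).sum()

def calibrate_lambda(X, n_draws=100, q=0.95):
    """Heuristic calibration without cross-validation (illustrative, not the paper's exact rule):
    take a high quantile of the largest per-feature score under pure-noise responses, so that
    with probability ~q no feature is selected when none is relevant."""
    n, _ = X.shape
    stats = []
    for _ in range(n_draws):
        eps = torch.randn(n)                      # pure-noise response
        stats.append((X.t() @ eps).abs().max() / n)
    return torch.quantile(torch.stack(stats), q)

# Usage sketch: penalized training loop on placeholder data.
n, p = 200, 50
X, y = torch.randn(n, p), torch.randn(n)
model, lam = SparseInputNet(p), calibrate_lambda(X)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    loss = F.mse_loss(model(X).squeeze(-1), y) + lam * group_penalty(model)
    loss.backward()
    opt.step()
# Features whose first-layer weight column stays (near) zero are treated as pruned.
selected = (model.input_layer.weight.norm(dim=0) > 1e-3).nonzero(as_tuple=True)[0]
```

Features whose first-layer weight column is driven to (near) zero are treated as pruned; the paper's non-convex penalty and dedicated optimization scheme aim at a sharper zero/non-zero separation than this convex surrogate.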
Related papers
- Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find solutions via our training procedure, including the gradient-based optimizer and the regularizers, which limits flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z)
- Approximation with Random Shallow ReLU Networks with Applications to Model Reference Adaptive Control [0.0]
We show that ReLU networks with randomly generated weights and biases achieve $L_\infty$ error of $O(m^{-1/2})$ with high probability.
We show how the result can be used to obtain approximations of the required accuracy in a model reference adaptive control application (a toy random-features illustration is sketched after this list).
arXiv Detail & Related papers (2024-03-25T19:39:17Z)
- A Pseudo-Semantic Loss for Autoregressive Models with Logical Constraints [87.08677547257733]
Neuro-symbolic AI bridges the gap between purely symbolic and neural approaches to learning.
We show how to maximize the likelihood of a symbolic constraint w.r.t. the neural network's output distribution.
We also evaluate our approach on Sudoku and shortest-path prediction cast as autoregressive generation.
arXiv Detail & Related papers (2023-12-06T20:58:07Z)
- Sparse-Input Neural Network using Group Concave Regularization [10.103025766129006]
Simultaneous feature selection and non-linear function estimation are challenging in neural networks.
We propose a framework of sparse-input neural networks using group concave regularization for feature selection in both low-dimensional and high-dimensional settings.
arXiv Detail & Related papers (2023-07-01T13:47:09Z)
- Neural Greedy Pursuit for Feature Selection [72.4121881681861]
We propose a greedy algorithm to select $N$ important features among $P$ input features for a non-linear prediction problem.
We use neural networks as predictors in the algorithm to compute the loss (a rough forward-selection sketch appears after this list).
arXiv Detail & Related papers (2022-07-19T16:39:16Z)
- Variable Bitrate Neural Fields [75.24672452527795]
We present a dictionary method for compressing feature grids, reducing their memory consumption by up to 100x.
We formulate the dictionary optimization as a vector-quantized auto-decoder problem which lets us learn end-to-end discrete neural representations in a space where no direct supervision is available.
arXiv Detail & Related papers (2022-06-15T17:58:34Z)
- Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and a significant reduction in memory consumption.
However, they can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z)
- Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting [60.98700344526674]
Low-complexity models such as linear function representation play a pivotal role in enabling sample-efficient reinforcement learning.
In this paper, we investigate a new sampling protocol, which draws samples in an online/exploratory fashion but allows one to backtrack and revisit previous states in a controlled and infrequent manner.
We develop an algorithm tailored to this setting, achieving a sample complexity that scales polynomially with the feature dimension, the horizon, and the inverse sub-optimality gap, but not the size of the state/action space.
arXiv Detail & Related papers (2021-05-17T17:22:07Z)
- Embedded methods for feature selection in neural networks [0.0]
High-dimensional, potentially noisy features combined with black-box models like neural networks negatively affect the interpretability, generalizability, and training time of these models.
I propose two integrated approaches for feature selection that can be incorporated directly into the parameter learning.
I benchmarked both methods against Permutation Feature Importance (PFI), a general-purpose feature-ranking method, and a random baseline.
arXiv Detail & Related papers (2020-10-12T16:33:46Z)
- Binary Stochastic Filtering: feature selection and beyond [0.0]
This work aims to extend neural networks with the ability to automatically select features by rethinking how sparsity regularization can be used.
The proposed method demonstrated superior efficiency compared to a few classical methods, with minimal or no computational overhead.
arXiv Detail & Related papers (2020-07-08T06:57:10Z)
- What needles do sparse neural networks find in nonlinear haystacks [0.0]
A sparsity inducing penalty in artificial neural networks (ANNs) avoids over-fitting, especially in situations where noise is high and the training set is small.
For linear models, such an approach provably also recovers the important features with high probability, in certain regimes and for a well-chosen penalty parameter.
We perform a set of comprehensive Monte Carlo simulations on a simple model, and the numerical results show the effectiveness of the proposed approach.
arXiv Detail & Related papers (2020-06-07T04:46:55Z)
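For the "Approximation with Random Shallow ReLU Networks" entry above, a rough illustration of the random-feature idea (the widths, weight distributions, and toy target here are arbitrary choices, not taken from that paper) is to freeze randomly drawn hidden weights and biases and fit only the output layer by least squares:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_relu_features(x, W, b):
    """Hidden activations of a shallow ReLU net with frozen random weights W and biases b."""
    return np.maximum(x @ W + b, 0.0)

def fit_output_layer(x, y, m=512, scale=1.0):
    """Draw m random hidden units, then solve a least-squares problem for the output weights only."""
    d = x.shape[1]
    W = rng.normal(scale=scale, size=(d, m))
    b = rng.uniform(-1.0, 1.0, size=m)
    H = random_relu_features(x, W, b)
    coef, *_ = np.linalg.lstsq(H, y, rcond=None)
    return W, b, coef

# Toy check: approximation error of a smooth 1-D target as the number of units grows.
x = np.linspace(-1, 1, 400).reshape(-1, 1)
y = np.sin(3 * x).ravel()
for m in (16, 64, 256, 1024):
    W, b, coef = fit_output_layer(x, y, m=m)
    err = np.max(np.abs(random_relu_features(x, W, b) @ coef - y))  # sup-norm error on the grid
    print(f"m={m:5d}  sup-norm error ~ {err:.3f}")
```

The printed sup-norm errors on the grid give a rough sense of how the approximation improves as the number of random units $m$ grows.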
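Similarly, the "Neural Greedy Pursuit for Feature Selection" entry can be mimicked, only loosely (the paper's actual algorithm, scoring rule, and guarantees are not reproduced here), by a forward-selection loop that at each step adds the feature whose inclusion most reduces a small neural predictor's validation loss; scikit-learn's MLPRegressor serves as a stand-in predictor:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

def greedy_select(X, y, n_select=5, seed=0):
    """Forward-select n_select features, scoring each candidate with a small neural predictor."""
    X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=seed)
    selected = []
    for _ in range(n_select):
        best_j, best_loss = None, np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            cols = selected + [j]
            net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=500, random_state=seed)
            net.fit(X_tr[:, cols], y_tr)
            loss = mean_squared_error(y_va, net.predict(X_va[:, cols]))
            if loss < best_loss:
                best_j, best_loss = j, loss
        selected.append(best_j)
    return selected

# Usage: try to recover 2 relevant features out of 20 in a toy nonlinear problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = np.sin(X[:, 3]) + X[:, 7] ** 2 + 0.1 * rng.normal(size=300)
print(greedy_select(X, y, n_select=2))
```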