Training a neural netwok for data reduction and better generalization
- URL: http://arxiv.org/abs/2411.17180v2
- Date: Sun, 01 Dec 2024 11:10:31 GMT
- Title: Training a neural netwok for data reduction and better generalization
- Authors: Sylvain Sardy, Maxime van Cutsem, Xiaoyu Ma
- Abstract summary: The motivation for sparse learners is to compress the inputs (features) by selecting only the ones needed for good generalization. We show a remarkable phase transition in the probability of retrieving the relevant features, as well as good generalization, thanks to the choice of $\lambda$, the non-convex penalty and the optimization scheme. This approach can be seen as a form of compressed sensing for complex models, distilling high-dimensional data into a compact, interpretable subset of meaningful features.
- Score: 7.545668088790516
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The motivation for sparse learners is to compress the inputs (features) by selecting only the ones needed for good generalization. Linear models with LASSO-type regularization achieve this by setting the weights of irrelevant features to zero, effectively identifying and ignoring them. In artificial neural networks, this selective focus can be achieved by pruning the input layer. Given a cost function enhanced with a sparsity-promoting penalty, our proposal selects a regularization term $\lambda$ (without the use of cross-validation or a validation set) that creates a local minimum in the cost function at the origin where no features are selected. This local minimum acts as a baseline, meaning that if there is no strong enough signal to justify a feature inclusion, the local minimum remains at zero with a high prescribed probability. The method is flexible, applying to complex models ranging from shallow to deep artificial neural networks and supporting various cost functions and sparsity-promoting penalties. We empirically show a remarkable phase transition in the probability of retrieving the relevant features, as well as good generalization thanks to the choice of $\lambda$, the non-convex penalty and the optimization scheme developed. This approach can be seen as a form of compressed sensing for complex models, allowing us to distill high-dimensional data into a compact, interpretable subset of meaningful features.
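To make the idea of a zero baseline concrete, here is a minimal sketch for the linear LASSO case only, not the authors' neural-network implementation. For the cost $\frac{1}{2}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$, the all-zero vector is the minimizer exactly when $\lambda \ge \|X^\top y\|_\infty$, so a $\lambda$ can be chosen as a high quantile of that threshold under pure noise. The function names, the Gaussian noise model with known `sigma`, and the plain proximal-gradient (ISTA) solver are all assumptions made for illustration.

```python
import numpy as np


def lasso_zero_threshold(X, y):
    """Smallest lambda for which beta = 0 minimizes
    0.5 * ||y - X beta||^2 + lambda * ||beta||_1."""
    return float(np.max(np.abs(X.T @ y)))


def null_quantile_lambda(X, sigma, alpha=0.05, n_draws=1000, seed=0):
    """Monte Carlo (1 - alpha) quantile of the zero threshold when the
    response is pure Gaussian noise (assumed model: y = sigma * eps)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    draws = [
        lasso_zero_threshold(X, sigma * rng.standard_normal(n))
        for _ in range(n_draws)
    ]
    return float(np.quantile(draws, 1.0 - alpha))


def lasso_ista(X, y, lam, n_iter=500):
    """Proximal gradient descent (ISTA) for the LASSO cost above."""
    # Step size 1/L, where L is the squared largest singular value of X.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # Gradient step on the quadratic term.
        z = beta - step * (X.T @ (X @ beta - y))
        # Soft-thresholding: proximal operator of step * lam * ||.||_1.
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((100, 20))
    y = 5.0 * X[:, 0] + rng.standard_normal(100)  # only feature 0 is relevant
    lam = null_quantile_lambda(X, sigma=1.0)
    beta = lasso_ista(X, y, lam)
    print("selected features:", np.flatnonzero(beta))
```

With this choice of `lam`, a response containing no signal leaves all weights at zero with probability about `1 - alpha`, while a sufficiently strong feature still enters the model.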
Related papers
- Beyond Softmax: A Natural Parameterization for Categorical Random Variables [61.709831225296305]
We introduce the $\textit{catnat}$ function, a function composed of a sequence of hierarchical binary splits.
A rich set of experiments shows that the proposed function improves the learning efficiency and yields models characterized by consistently higher test performance.
arXiv Detail & Related papers (2025-09-29T12:55:50Z) - SAND: One-Shot Feature Selection with Additive Noise Distortion [3.5976830118932583]
We introduce a novel, non-intrusive feature selection layer that automatically identifies and selects the $k$ most informative features during neural network training.
Our method is uniquely simple, requiring no alterations to the loss function, network architecture, or post-selection retraining.
Our work demonstrates that simplicity and performance are not mutually exclusive, offering a powerful yet straightforward tool for feature selection in machine learning.
arXiv Detail & Related papers (2025-05-06T18:59:35Z) - Neural Parameter Estimation with Incomplete Data [0.0]
It is not straightforward to use neural networks with data that for various reasons are incomplete.
A recently proposed approach to remedy this issue inputs an appropriately padded data vector and a vector that encodes the missingness pattern to a neural network.
Here, we propose an alternative approach that is based on the Monte Carlo expectation-maximization (EM) algorithm.
arXiv Detail & Related papers (2025-01-08T08:05:17Z) - Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z) - MUSO: Achieving Exact Machine Unlearning in Over-Parameterized Regimes [19.664090734076712]
Machine unlearning (MU) makes a well-trained model behave as if it had never been trained on specific data.
We propose an alternating optimization algorithm that unifies the tasks of unlearning and relabeling.
The algorithm's effectiveness, confirmed through numerical experiments, highlights its superior performance in unlearning across various scenarios.
arXiv Detail & Related papers (2024-10-11T06:17:17Z) - Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find solutions via our training procedure, including the gradient and regularizers, limiting flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z) - Approximation with Random Shallow ReLU Networks with Applications to Model Reference Adaptive Control [0.0]
We show that ReLU networks with randomly generated weights and biases achieve $L_\infty$ error of $O(m^{-1/2})$ with high probability.
We show how the result can be used to get approximations of required accuracy in a model reference adaptive control application.
arXiv Detail & Related papers (2024-03-25T19:39:17Z) - A Pseudo-Semantic Loss for Autoregressive Models with Logical Constraints [87.08677547257733]
Neuro-symbolic AI bridges the gap between purely symbolic and neural approaches to learning.
We show how to maximize the likelihood of a symbolic constraint w.r.t. the neural network's output distribution.
We also evaluate our approach on Sudoku and shortest-path prediction cast as autoregressive generation.
arXiv Detail & Related papers (2023-12-06T20:58:07Z) - Sparse-Input Neural Network using Group Concave Regularization [10.103025766129006]
Simultaneous feature selection and non-linear function estimation are challenging in neural networks.
We propose a framework of sparse-input neural networks using group concave regularization for feature selection in both low-dimensional and high-dimensional settings.
arXiv Detail & Related papers (2023-07-01T13:47:09Z) - Supervised Feature Selection with Neuron Evolution in Sparse Neural Networks [17.12834153477201]
We propose a novel resource-efficient supervised feature selection method using sparse neural networks.
By gradually pruning the uninformative features from the input layer of a sparse neural network trained from scratch, NeuroFS derives an informative subset of features efficiently.
NeuroFS achieves the highest ranking-based score among the considered state-of-the-art supervised feature selection models.
arXiv Detail & Related papers (2023-03-10T17:09:55Z) - The Contextual Lasso: Sparse Linear Models via Deep Neural Networks [5.607237982617641]
We develop a new statistical estimator that fits a sparse linear model to the explanatory features such that the sparsity pattern and coefficients vary as a function of the contextual features.
An extensive suite of experiments on real and synthetic data suggests that the learned models, which remain highly transparent, can be sparser than the regular lasso.
arXiv Detail & Related papers (2023-02-02T05:00:29Z) - Cross-Model Comparative Loss for Enhancing Neuronal Utility in Language Understanding [82.46024259137823]
We propose a cross-model comparative loss for a broad range of tasks.
We demonstrate the universal effectiveness of comparative loss through extensive experiments on 14 datasets from 3 distinct NLU tasks.
arXiv Detail & Related papers (2023-01-10T03:04:27Z) - ORFit: One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares [5.430441358049335]
We investigate the problem of one-pass learning, in which a model is trained on sequentially arriving data without retraining on previous datapoints.
We propose Orthogonal Recursive Fitting (ORFit), an algorithm for one-pass learning which seeks to perfectly fit each new datapoint while minimally altering the predictions on previous datapoints.
arXiv Detail & Related papers (2022-07-28T02:01:31Z) - Neural Greedy Pursuit for Feature Selection [72.4121881681861]
We propose a greedy algorithm to select $N$ important features among $P$ input features for a non-linear prediction problem.
We use neural networks as predictors in the algorithm to compute the loss.
arXiv Detail & Related papers (2022-07-19T16:39:16Z) - Variable Bitrate Neural Fields [75.24672452527795]
We present a dictionary method for compressing feature grids, reducing their memory consumption by up to 100x.
We formulate the dictionary optimization as a vector-quantized auto-decoder problem which lets us learn end-to-end discrete neural representations in a space where no direct supervision is available.
arXiv Detail & Related papers (2022-06-15T17:58:34Z) - Implicit Parameter-free Online Learning with Truncated Linear Models [51.71216912089413]
Parameter-free algorithms are online learning algorithms that do not require setting learning rates.
We propose new parameter-free algorithms that can take advantage of truncated linear models through a new update that has an "implicit" flavor.
Based on a novel decomposition of the regret, the new update is efficient, requires only one gradient at each step, never overshoots the minimum of the truncated model, and retains the favorable parameter-free properties.
arXiv Detail & Related papers (2022-03-19T13:39:49Z) - Accelerating Understanding of Scientific Experiments with End to End Symbolic Regression [12.008215939224382]
We develop a deep neural network to address the problem of learning free-form symbolic expressions from raw data.
We train our neural network on a synthetic dataset consisting of data tables of varying length and varying levels of noise.
We validate our technique by running on a public dataset from behavioral science.
arXiv Detail & Related papers (2021-12-07T22:28:53Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and significant reduction in memory consumption.
They can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z) - Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting [60.98700344526674]
Low-complexity models such as linear function representation play a pivotal role in enabling sample-efficient reinforcement learning.
In this paper, we investigate a new sampling protocol, which draws samples in an online/exploratory fashion but allows one to backtrack and revisit previous states in a controlled and infrequent manner.
We develop an algorithm tailored to this setting, achieving a sample complexity that scales polynomially with the feature dimension, the horizon, and the inverse sub-optimality gap, but not the size of the state/action space.
arXiv Detail & Related papers (2021-05-17T17:22:07Z) - Meta-Solver for Neural Ordinary Differential Equations [77.8918415523446]
We investigate how variability in the space of solvers can improve the performance of neural ODEs.
We show that the right choice of solver parameterization can significantly affect neural ODE models in terms of robustness to adversarial attacks.
arXiv Detail & Related papers (2021-03-15T17:26:34Z) - Embedded methods for feature selection in neural networks [0.0]
High-dimensional and potentially noisy features, combined with black box models like neural networks, negatively affect the interpretability, generalizability, and training time of these models.
I propose two integrated approaches for feature selection that can be incorporated directly into the parameter learning.
I benchmarked both the methods against Permutation Feature Importance (PFI) - a general-purpose feature ranking method and a random baseline.
arXiv Detail & Related papers (2020-10-12T16:33:46Z) - Category-Learning with Context-Augmented Autoencoder [63.05016513788047]
Finding an interpretable non-redundant representation of real-world data is one of the key problems in Machine Learning.
We propose a novel method of using data augmentations when training autoencoders.
We train a Variational Autoencoder in such a way that it makes the transformation outcome predictable by an auxiliary network.
arXiv Detail & Related papers (2020-10-10T14:04:44Z) - Binary Stochastic Filtering: feature selection and beyond [0.0]
This work aims at extending the neural network with the ability to automatically select features by rethinking how sparsity regularization can be used.
The proposed method has demonstrated superior efficiency when compared to a few classical methods, achieved with minimal or no computational overhead.
arXiv Detail & Related papers (2020-07-08T06:57:10Z) - Towards Understanding Hierarchical Learning: Benefits of Neural Representations [160.33479656108926]
In this work, we demonstrate that intermediate neural representations add more flexibility to neural networks.
We show that neural representation can achieve improved sample complexities compared with the raw input.
Our results characterize when neural representations are beneficial, and may provide a new perspective on why depth is important in deep learning.
arXiv Detail & Related papers (2020-06-24T02:44:54Z) - What needles do sparse neural networks find in nonlinear haystacks [0.0]
A sparsity inducing penalty in artificial neural networks (ANNs) avoids over-fitting, especially in situations where noise is high and the training set is small.
For linear models, such an approach provably also recovers the important features with high probability in regimes for a well-chosen penalty parameter.
We perform a set of comprehensive Monte Carlo simulations on a simple model, and the numerical results show the effectiveness of the proposed approach.
arXiv Detail & Related papers (2020-06-07T04:46:55Z) - The data-driven physical-based equations discovery using evolutionary approach [77.34726150561087]
We describe an algorithm for discovering mathematical equations from given observational data.
The algorithm combines genetic programming with sparse regression.
It could be used for governing analytical equation discovery as well as for partial differential equations (PDE) discovery.
arXiv Detail & Related papers (2020-04-03T17:21:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.