Sparse tree-based initialization for neural networks
- URL: http://arxiv.org/abs/2209.15283v1
- Date: Fri, 30 Sep 2022 07:44:03 GMT
- Title: Sparse tree-based initialization for neural networks
- Authors: Patrick Lutz (BU), Ludovic Arnould (LPSM (UMR_8001)), Claire Boyer
(LPSM (UMR_8001)), Erwan Scornet (CMAP)
- Abstract summary: Dedicated neural network (NN) architectures, such as CNNs for images or RNNs for text, are designed to handle specific data types.
In this work, we propose a new sparse initialization technique for (potentially deep) multilayer perceptrons (MLPs).
We show that our new initializer induces an implicit regularization during NN training and that the first layers act as a sparse feature extractor.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dedicated neural network (NN) architectures have been designed to handle
specific data types (such as CNN for images or RNN for text), which ranks them
among state-of-the-art methods for dealing with these data. Unfortunately, no
architecture has been found for dealing with tabular data yet, for which tree
ensemble methods (tree boosting, random forests) usually show the best
predictive performances. In this work, we propose a new sparse initialization
technique for (potentially deep) multilayer perceptrons (MLP): we first train a
tree-based procedure to detect feature interactions and use the resulting
information to initialize the network, which is subsequently trained via
standard stochastic gradient strategies. Numerical experiments on several
tabular data sets show that this new, simple and easy-to-use method is a solid
competitor, both in terms of generalization capacity and computation time, to
default MLP initialization and even to existing complex deep learning
solutions. In fact, this wise MLP initialization raises the resulting NN
methods to the level of a valid competitor to gradient boosting when dealing
with tabular data. Besides, such initializations are able to preserve the
sparsity of weights introduced in the first layers of the network through
training. This fact suggests that this new initializer induces an implicit
regularization during NN training and that the first layers act
as a sparse feature extractor (as for convolutional layers in CNN).
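Below is a minimal sketch of the initialization idea described in the abstract, assuming a scikit-learn random forest as the tree-based procedure and a one-hidden-unit-per-tree wiring; the forest type, the weight scaling, and the helper name `tree_based_first_layer` are illustrative assumptions, not the authors' exact recipe.

```python
# Hypothetical sketch: sparse first-layer initialization from a tree ensemble.
# Each hidden unit is wired only to the features its tree actually splits on;
# the remaining layers and the SGD training loop are standard.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def tree_based_first_layer(X, y, n_hidden=64, seed=0):
    """Return sparse (W, b) for the first dense layer: one unit per tree."""
    rng = np.random.default_rng(seed)
    forest = RandomForestRegressor(
        n_estimators=n_hidden, max_depth=4, random_state=seed
    ).fit(X, y)
    W = np.zeros((n_hidden, X.shape[1]))
    for h, tree in enumerate(forest.estimators_):
        used = np.unique(tree.tree_.feature)   # feature index of every split node
        used = used[used >= 0]                 # drop the -2 markers used for leaves
        if used.size:                          # a single-leaf tree stays all-zero
            W[h, used] = rng.normal(0.0, 1.0 / np.sqrt(used.size), size=used.size)
    return W, np.zeros(n_hidden)

# Usage: W1, b1 = tree_based_first_layer(X_train, y_train, n_hidden=128)
# Plug (W1, b1) into the first MLP layer, initialize the deeper layers as usual,
# and train the whole network with standard stochastic gradient strategies.
```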
Related papers
- An Initialization Schema for Neuronal Networks on Tabular Data [0.9155684383461983]
We show that a binomial neural network can be used effectively on tabular data.
The proposed method is a simple but effective approach for initializing the first hidden layer of neural networks.
We evaluate our approach on multiple public datasets and showcase the improved performance compared to other neural network-based approaches.
arXiv Detail & Related papers (2023-11-07T13:52:35Z) - Neural networks trained with SGD learn distributions of increasing
complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
Higher-order statistics are exploited only later in training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z) - Pushing the Efficiency Limit Using Structured Sparse Convolutions [82.31130122200578]
We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter.
We show that SSC is a generalization of commonly used layers (depthwise, groupwise and pointwise convolution) in efficient architectures (see the illustrative masking sketch after this list).
Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
arXiv Detail & Related papers (2022-10-23T18:37:22Z) - Subquadratic Overparameterization for Shallow Neural Networks [60.721751363271146]
We provide an analytical framework that allows us to adopt standard neural training strategies.
We achieve the desiderata via Polyak-Lojasiewicz, smoothness, and standard assumptions.
arXiv Detail & Related papers (2021-11-02T20:24:01Z) - Tensor-based framework for training flexible neural networks [9.176056742068813]
We propose a new learning algorithm which solves a constrained coupled matrix-tensor factorization (CMTF) problem.
The proposed algorithm can handle decompositions over different bases.
The goal of this method is to compress large pretrained NN models by replacing sub-networks, i.e., one or multiple layers of the original network, with a new flexible layer.
arXiv Detail & Related papers (2021-06-25T10:26:48Z) - Dense for the Price of Sparse: Improved Performance of Sparsely
Initialized Networks via a Subspace Offset [0.0]
We introduce a new 'DCT plus Sparse' layer architecture, which maintains information propagation and trainability even with as little as 0.01% trainable kernel parameters remaining.
Switching from standard sparse layers to DCT plus Sparse layers does not increase the storage footprint of a network and incurs only a small additional computational overhead (see the sketch after this list).
arXiv Detail & Related papers (2021-02-12T00:05:02Z) - An Effective and Efficient Initialization Scheme for Training
Multi-layer Feedforward Neural Networks [5.161531917413708]
We propose a novel network initialization scheme based on the celebrated Stein's identity.
The proposed SteinGLM method is shown, through extensive numerical results, to be much faster and more accurate than other popular methods commonly used for training neural networks.
arXiv Detail & Related papers (2020-05-16T16:17:37Z) - Robust Pruning at Initialization [61.30574156442608]
There is a growing need for smaller, energy-efficient neural networks that bring machine learning applications to devices with limited computational resources.
For deep NNs, such pruning procedures remain unsatisfactory, as the resulting pruned networks can be difficult to train and, for instance, they do not prevent one layer from being fully pruned.
arXiv Detail & Related papers (2020-02-19T17:09:50Z) - Large-Scale Gradient-Free Deep Learning with Recursive Local
Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z) - MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient descent combined with the nonconvexity of the underlying optimization problem renders learning susceptible to initialization.
We propose fusing neighboring layers of deeper networks that are trained from a random initialization.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
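For the Structured Sparse Convolution entry above, the following is a small illustrative sketch (in PyTorch) of the claim that fixed sparsity patterns on an ordinary convolution kernel recover pointwise and depthwise convolutions as special cases; the `MaskedConv2d` class and the particular masks are assumptions made for illustration, not the SSC filters themselves.

```python
# Sketch: a convolution whose kernel is multiplied by a fixed binary mask.
# Different masks recover familiar efficient layers as special cases.
import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    def __init__(self, in_ch, out_ch, k, mask, **kw):
        super().__init__(in_ch, out_ch, k, **kw)
        self.register_buffer("mask", mask)      # fixed 0/1 sparsity pattern

    def forward(self, x):
        return self._conv_forward(x, self.weight * self.mask, self.bias)

C, K = 8, 3
# Pointwise-like (1x1) convolution: keep only the centre spatial tap.
pw_mask = torch.zeros(C, C, K, K)
pw_mask[:, :, K // 2, K // 2] = 1
# Depthwise-like convolution: keep only the diagonal channel connections.
dw_mask = torch.zeros(C, C, K, K)
for c in range(C):
    dw_mask[c, c] = 1

x = torch.randn(1, C, 16, 16)
print(MaskedConv2d(C, C, K, pw_mask, padding=1)(x).shape)  # (1, 8, 16, 16)
print(MaskedConv2d(C, C, K, dw_mask, padding=1)(x).shape)  # (1, 8, 16, 16)
```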
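For the 'Dense for the Price of Sparse' entry above, here is a rough sketch of the stated idea: a layer whose weight matrix is a fixed dense DCT basis plus a very sparse trainable component, so information propagation is preserved even at extreme sparsity. The square linear form, the 1% density, and the class name `DCTPlusSparseLinear` are illustrative assumptions rather than the paper's exact construction.

```python
# Sketch: weight = fixed orthonormal DCT-II matrix + sparse trainable offset.
# Only the masked entries of the sparse part receive nonzero gradients.
import math
import torch
import torch.nn as nn

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix."""
    k = torch.arange(n).float()
    D = torch.cos(math.pi / n * (k[None, :] + 0.5) * k[:, None])
    D[0] *= 1.0 / math.sqrt(2.0)
    return D * math.sqrt(2.0 / n)

class DCTPlusSparseLinear(nn.Module):
    def __init__(self, n, density=0.01, seed=0):
        super().__init__()
        g = torch.Generator().manual_seed(seed)
        self.register_buffer("dct", dct_matrix(n))                  # fixed, dense
        self.register_buffer("mask", (torch.rand(n, n, generator=g) < density).float())
        self.sparse = nn.Parameter(torch.zeros(n, n))               # trainable, masked
        self.bias = nn.Parameter(torch.zeros(n))

    def forward(self, x):
        W = self.dct + self.sparse * self.mask
        return x @ W.t() + self.bias

layer = DCTPlusSparseLinear(64, density=0.01)
print(layer(torch.randn(5, 64)).shape)   # torch.Size([5, 64])
```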
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.