An Initialization Schema for Neuronal Networks on Tabular Data
- URL: http://arxiv.org/abs/2311.03996v2
- Date: Fri, 24 Nov 2023 13:28:49 GMT
- Title: An Initialization Schema for Neuronal Networks on Tabular Data
- Authors: Wolfgang Fuhl
- Abstract summary: We show that a binomial initialized neural network can be used effectively on tabular data.
The proposed approach is a simple but effective way to initialize the first hidden layer of a neural network.
We evaluate our approach on multiple public datasets and showcase the improved performance compared to other neural network-based approaches.
- Score: 0.9155684383461983
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many modern applications rely on heterogeneous tabular data, for which
regression and classification remain challenging. Many approaches have been
proposed to adapt neural networks to such data, but boosting and bagging of
decision trees are still the best-performing methods. In this paper, we show
that a binomial initialized neural network can be used effectively on tabular
data. The proposed approach is a simple but effective way to initialize the
first hidden layer in neural networks. We also show that this initialization
schema can be used to jointly
train ensembles by adding gradient masking to batch entries and using the
binomial initialization for the last layer in a neural network. For this
purpose, we modified the hinge binary loss and the softmax loss to make them
applicable for joint ensemble training. We evaluate our approach on multiple
public datasets and showcase the improved performance compared to other neural
network-based approaches. In addition, we discuss the limitations and possible
further research of our approach for improving the applicability of neural
networks to tabular data.
Link:
https://es-cloud.cs.uni-tuebingen.de/d/8e2ab8c3fdd444e1a135/?p=%2FInitializationNeuronalNetworksTabularData&mode=list
Related papers
- Residual Random Neural Networks [0.0]
Single-layer feedforward neural network with random weights is a recurring motif in the neural networks literature.
We show that one can obtain good classification results even if the number of hidden neurons has the same order of magnitude as the dimensionality of the data samples.
arXiv Detail & Related papers (2024-10-25T22:00:11Z)
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
- Benign Overfitting for Two-layer ReLU Convolutional Neural Networks [60.19739010031304]
We establish algorithm-dependent risk bounds for learning two-layer ReLU convolutional neural networks with label-flipping noise.
We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.
arXiv Detail & Related papers (2023-03-07T18:59:38Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Improved Convergence Guarantees for Shallow Neural Networks [91.3755431537592]
We prove convergence of depth 2 neural networks, trained via gradient descent, to a global minimum.
Our model has the following features: regression with quadratic loss function, fully connected feedforward architecture, ReLU activations, Gaussian data instances, adversarial labels.
The results strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the "NTK regime".
arXiv Detail & Related papers (2022-12-05T14:47:52Z)
- Sparse tree-based initialization for neural networks [0.0]
We show that dedicated neural network (NN) architectures can handle specific data types such as CNN for images or RNN for text.
In this work, we propose a new initialization technique for (potentially deep) multilayer perceptrons (MLP).
We show that our new initializer performs an implicit regularization during the NN training, and emphasize that the first layers act as a sparse feature extractor.
arXiv Detail & Related papers (2022-09-30T07:44:03Z)
- Training Graph Neural Networks by Graphon Estimation [2.5997274006052544]
We propose to train a graph neural network via resampling from a graphon estimate obtained from the underlying network data.
We show that our approach is competitive with, and in many cases outperforms, other GNN training methods that reduce over-smoothing.
arXiv Detail & Related papers (2021-09-04T19:21:48Z)
- Local Critic Training for Model-Parallel Learning of Deep Neural Networks [94.69202357137452]
We propose a novel model-parallel learning method, called local critic training.
We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
We also show that networks trained by the proposed method can be used for structural optimization.
arXiv Detail & Related papers (2021-02-03T09:30:45Z)
- An Effective and Efficient Initialization Scheme for Training Multi-layer Feedforward Neural Networks [5.161531917413708]
We propose a novel network initialization scheme based on the celebrated Stein's identity.
A proposed SteinGLM method is shown through extensive numerical results to be much faster and more accurate than other popular methods commonly used for training neural networks.
arXiv Detail & Related papers (2020-05-16T16:17:37Z)
- MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient descent combined with nonconvexity renders learning susceptible to novel problems.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.