Using linear initialisation to improve speed of convergence and
fully-trained error in Autoencoders
- URL: http://arxiv.org/abs/2311.10699v1
- Date: Fri, 17 Nov 2023 18:43:32 GMT
- Title: Using linear initialisation to improve speed of convergence and
fully-trained error in Autoencoders
- Authors: Marcel Marais, Mate Hartstein, George Cevora
- Abstract summary: We introduce a novel weight initialisation technique called the Straddled Matrix Initialiser.
The combination of the Straddled Matrix and the ReLU activation function initialises a Neural Network as a de facto linear model.
In all our experiments the Straddled Matrix Initialiser clearly outperforms all other methods.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Good weight initialisation is an important step in successful training of
Artificial Neural Networks. Over time a number of improvements have been
proposed to this process. In this paper we introduce a novel weight
initialisation technique called the Straddled Matrix Initialiser. This
initialisation technique is motivated by our assumption that major,
global-scale relationships in data are linear with only smaller effects
requiring complex non-linearities. The combination of the Straddled Matrix and the
ReLU activation function initialises a Neural Network as a de facto linear model,
which we postulate should be a better starting point for optimisation given our
assumptions. We test this by training autoencoders on three datasets using
Straddled Matrix and seven other state-of-the-art weight initialisation
techniques. In all our experiments the Straddled Matrix Initialiser clearly
outperforms all other methods.
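
The abstract does not spell out the Straddled Matrix construction itself, only that the weights combined with ReLU make the freshly initialised network compute a purely linear map. The sketch below is an illustration of that general idea under our own assumptions, not the authors' code: it uses the identity x = ReLU(x) - ReLU(-x), so the first layer stores the positive and negative parts of the input and the next layer recombines them, giving an autoencoder that starts as a linear (here, identity) model.

```python
import torch
import torch.nn as nn

def linear_relu_autoencoder(d_in: int) -> nn.Sequential:
    """One-hidden-layer ReLU autoencoder initialised as a de facto linear (identity) map."""
    enc = nn.Linear(d_in, 2 * d_in, bias=False)
    dec = nn.Linear(2 * d_in, d_in, bias=False)
    eye = torch.eye(d_in)
    with torch.no_grad():
        # Hidden units hold the positive and negative parts of the input ...
        enc.weight.copy_(torch.cat([eye, -eye], dim=0))
        # ... and the decoder recombines them: ReLU(x) - ReLU(-x) = x.
        dec.weight.copy_(torch.cat([eye, -eye], dim=1))
    return nn.Sequential(enc, nn.ReLU(), dec)

x = torch.randn(8, 5)
model = linear_relu_autoencoder(5)
assert torch.allclose(model(x), x, atol=1e-6)  # behaves as a linear model at initialisation
```

From such a starting point gradient descent can first fit the large, linear structure in the data and only later bend the weights away from the identity-like pattern to capture non-linear residuals, which matches the intuition stated in the abstract.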
Related papers
- Initialization Matters for Adversarial Transfer Learning [61.89451332757625]
We discover the necessity of an adversarially robust pretrained model.
We propose Robust Linear Initialization (RoLI) for adversarial finetuning, which initializes the linear head with the weights obtained by adversarial linear probing.
Across five different image classification datasets, we demonstrate the effectiveness of RoLI and achieve new state-of-the-art results.
arXiv Detail & Related papers (2023-12-10T00:51:05Z) - From Pointwise to Powerhouse: Initialising Neural Networks with Generative Models [1.1807848705528714]
In this paper, we introduce two groups of new initialisation methods.
First, we locally initialise weight groups by employing variational autoencoders.
Secondly, we globally initialise full weight sets by employing graph hypernetworks.
arXiv Detail & Related papers (2023-10-25T15:06:32Z) - Large-Scale OD Matrix Estimation with A Deep Learning Method [70.78575952309023]
The proposed method integrates deep learning and numerical optimization algorithms to infer matrix structure and guide numerical optimization.
We conducted tests to demonstrate the good generalization performance of our method on a large-scale synthetic dataset.
arXiv Detail & Related papers (2023-10-09T14:30:06Z) - Subquadratic Overparameterization for Shallow Neural Networks [60.721751363271146]
We provide an analytical framework that allows us to adopt standard neural training strategies.
We achieve the desiderata via a Polyak-Lojasiewicz condition, smoothness, and standard assumptions.
arXiv Detail & Related papers (2021-11-02T20:24:01Z) - Implicit Greedy Rank Learning in Autoencoders via Overparameterized Linear Networks [7.412225511828064]
Deep linear networks trained with gradient descent yield low rank solutions.
We show greedy learning of low-rank latent codes induced by a linear sub-network at the autoencoder bottleneck.
arXiv Detail & Related papers (2021-07-02T23:17:50Z) - Data-driven Weight Initialization with Sylvester Solvers [72.11163104763071]
We propose a data-driven scheme to initialize the parameters of a deep neural network.
We show that our proposed method is especially effective in few-shot and fine-tuning settings.
arXiv Detail & Related papers (2021-05-02T07:33:16Z) - Improved Initialization of State-Space Artificial Neural Networks [0.0]
The identification of black-box nonlinear state-space models requires a flexible representation of the state and output equation.
This paper introduces an improved approach for nonlinear state-space models represented as a recurrent artificial neural network.
arXiv Detail & Related papers (2021-03-26T15:16:08Z) - An Effective and Efficient Initialization Scheme for Training Multi-layer Feedforward Neural Networks [5.161531917413708]
We propose a novel network initialization scheme based on the celebrated Stein's identity.
The proposed SteinGLM method is shown through extensive numerical results to be much faster and more accurate than other popular methods commonly used for training neural networks.
arXiv Detail & Related papers (2020-05-16T16:17:37Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z) - MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient descent combined with non-convexity renders learning susceptible to novel problems.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)