Data-driven Weight Initialization with Sylvester Solvers
- URL: http://arxiv.org/abs/2105.10335v1
- Date: Sun, 2 May 2021 07:33:16 GMT
- Title: Data-driven Weight Initialization with Sylvester Solvers
- Authors: Debasmit Das, Yash Bhalgat and Fatih Porikli
- Abstract summary: We propose a data-driven scheme to initialize the parameters of a deep neural network.
We show that our proposed method is especially effective in few-shot and fine-tuning settings.
- Score: 72.11163104763071
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we propose a data-driven scheme to initialize the parameters of
a deep neural network. This is in contrast to traditional approaches which
randomly initialize parameters by sampling from transformed standard
distributions. Such methods do not use the training data to produce a more
informed initialization. Our method uses a sequential layer-wise approach where
each layer is initialized using its input activations. The initialization is
cast as an optimization problem where we minimize a combination of encoding and
decoding losses of the input activations, which is further constrained by a
user-defined latent code. The optimization problem is then restructured into
the well-known Sylvester equation, which has fast and efficient gradient-free
solutions. Our data-driven method achieves a boost in performance compared to
random initialization methods, both before the start of training and after training
is over. We show that our proposed method is especially effective in few-shot
and fine-tuning settings. We conclude this paper with analyses on time
complexity and the effect of different latent codes on the recognition
performance.
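A minimal sketch of the layer-wise idea described above, assuming an encoding loss ||Z - WX||^2 plus a decoding loss ||X - W^T Z||^2 over the layer weight W, where X holds the layer's input activations and Z is a user-defined latent code; setting the gradient to zero then gives the Sylvester equation (Z Z^T) W + W (X X^T) = 2 Z X^T. The exact loss weighting, latent-code construction, and solver in the paper may differ; the function name init_layer_weights, the ridge term, and the random latent codes below are illustrative assumptions, not the authors' code.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def init_layer_weights(X, Z, reg=1e-3):
    """Initialize one layer's weights by solving A W + W B = C (Sylvester equation).

    X   : (d_in, n) input activations for this layer
    Z   : (d_out, n) user-defined latent code (here, random targets for illustration)
    reg : small ridge term (an added assumption) to keep A and B well conditioned
    """
    d_out, d_in = Z.shape[0], X.shape[0]
    A = Z @ Z.T + reg * np.eye(d_out)   # left coefficient, (d_out, d_out)
    B = X @ X.T + reg * np.eye(d_in)    # right coefficient, (d_in, d_in)
    C = 2.0 * Z @ X.T                   # right-hand side, (d_out, d_in)
    return solve_sylvester(A, B, C)     # gradient-free solve; W is (d_out, d_in)

# Sequential layer-wise use: initialize layer 1 from the raw inputs, forward-propagate
# to obtain the next layer's input activations, and repeat for deeper layers.
rng = np.random.default_rng(0)
X0 = rng.standard_normal((784, 256))    # e.g. flattened inputs, 256 samples
Z0 = rng.standard_normal((128, 256))    # hypothetical latent code for layer 1
W1 = init_layer_weights(X0, Z0)
X1 = np.maximum(W1 @ X0, 0.0)           # ReLU activations become layer 2's input
```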
Related papers
- Sparser, Better, Deeper, Stronger: Improving Sparse Training with Exact Orthogonal Initialization [49.06421851486415]
Static sparse training aims to train sparse models from scratch, achieving remarkable results in recent years.
We propose Exact Orthogonal Initialization (EOI), a novel sparse Orthogonal Initialization scheme based on random Givens rotations.
Our method enables training highly sparse 1000-layer MLP and CNN networks without residual connections or normalization techniques (a generic Givens-rotation sketch follows the related-papers list below).
arXiv Detail & Related papers (2024-06-03T19:44:47Z) - Using linear initialisation to improve speed of convergence and fully-trained error in Autoencoders [0.0]
We introduce a novel weight initialisation technique called the Straddled Matrix Initialiser.
The combination of the Straddled Matrix and the ReLU activation function initialises a neural network as a de facto linear model.
In all our experiments the Straddled Matrix Initialiser clearly outperforms all other methods.
arXiv Detail & Related papers (2023-11-17T18:43:32Z) - Taking the human out of decomposition-based optimization via artificial intelligence: Part II. Learning to initialize [0.0]
Active and supervised learning is used to learn a surrogate model that predicts the computational performance.
The results show that the proposed approach can lead to a significant reduction in solution time.
arXiv Detail & Related papers (2023-10-10T23:49:26Z) - Unsupervised Learning of Initialization in Deep Neural Networks via Maximum Mean Discrepancy [74.34895342081407]
We propose an unsupervised algorithm to find a good initialization for given input data.
We first notice that each parameter configuration in the parameter space corresponds to one particular downstream task of d-way classification.
We then conjecture that the success of learning is directly related to how diverse downstream tasks are in the vicinity of the initial parameters.
arXiv Detail & Related papers (2023-02-08T23:23:28Z) - Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline with no computational increment.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
arXiv Detail & Related papers (2022-03-23T06:24:31Z) - Boosting Fast Adversarial Training with Learnable Adversarial Initialization [79.90495058040537]
Adversarial training (AT) has been demonstrated to be effective in improving model robustness by leveraging adversarial examples for training.
To boost training efficiency, the fast gradient sign method (FGSM) is adopted in fast AT methods by computing the gradient only once.
arXiv Detail & Related papers (2021-10-11T05:37:00Z) - A novel initialisation based on hospital-resident assignment for the k-modes algorithm [0.0]
This paper presents a new way of selecting an initial solution for the k-modes algorithm.
It allows for a notion of mathematical fairness and leverages the data in a way that the common initialisations from the literature do not.
arXiv Detail & Related papers (2020-02-07T10:20:49Z) - MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient-based optimization combined with nonconvexity renders learning susceptible to initialization problems.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
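As referenced in the first related-papers entry above, composing random Givens rotations is a standard way to obtain an exactly orthogonal matrix that stays sparse when few rotations are used. The sketch below illustrates only that generic building block, not the EOI scheme itself; the function name random_givens_orthogonal and the parameter choices are assumptions for illustration.

```python
import numpy as np

def random_givens_orthogonal(n, num_rotations, seed=0):
    """Build an n x n orthogonal matrix as a product of random Givens rotations."""
    rng = np.random.default_rng(seed)
    W = np.eye(n)
    for _ in range(num_rotations):
        i, j = rng.choice(n, size=2, replace=False)     # pick two coordinates
        theta = rng.uniform(0.0, 2.0 * np.pi)
        c, s = np.cos(theta), np.sin(theta)
        # Each rotation mixes only rows i and j, so the product of orthogonal
        # rotations remains exactly orthogonal and fills in slowly (stays sparse).
        Wi, Wj = W[i].copy(), W[j].copy()
        W[i] = c * Wi - s * Wj
        W[j] = s * Wi + c * Wj
    return W

W = random_givens_orthogonal(256, num_rotations=512)
print(np.allclose(W @ W.T, np.eye(256)))   # True: exactly orthogonal
print(np.mean(W != 0))                     # density of non-zero entries
```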