Data-driven Weight Initialization with Sylvester Solvers
- URL: http://arxiv.org/abs/2105.10335v1
- Date: Sun, 2 May 2021 07:33:16 GMT
- Title: Data-driven Weight Initialization with Sylvester Solvers
- Authors: Debasmit Das, Yash Bhalgat and Fatih Porikli
- Abstract summary: We propose a data-driven scheme to initialize the parameters of a deep neural network.
We show that our proposed method is especially effective in few-shot and fine-tuning settings.
- Score: 72.11163104763071
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we propose a data-driven scheme to initialize the parameters of
a deep neural network. This is in contrast to traditional approaches which
randomly initialize parameters by sampling from transformed standard
distributions. Such methods do not use the training data to produce a more
informed initialization. Our method uses a sequential layer-wise approach where
each layer is initialized using its input activations. The initialization is
cast as an optimization problem where we minimize a combination of encoding and
decoding losses of the input activations, which is further constrained by a
user-defined latent code. The optimization problem is then restructured into
the well-known Sylvester equation, which has fast and efficient gradient-free
solutions. Our data-driven method achieves a boost in performance compared to
random initialization methods, both before the start of training and after training
is over. We show that our proposed method is especially effective in few-shot
and fine-tuning settings. We conclude this paper with analyses on time
complexity and the effect of different latent codes on the recognition
performance.
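A minimal sketch of the layer-wise idea described above, assuming an encoding loss ||Z - WX||^2 plus a decoding loss ||X - W^T Z||^2 over the layer weight W, where X holds the layer's input activations and Z is a user-defined latent code; setting the gradient to zero then gives the Sylvester equation (Z Z^T) W + W (X X^T) = 2 Z X^T. The exact loss weighting, latent-code construction, and solver in the paper may differ; the function name init_layer_weights, the ridge term, and the random latent codes below are illustrative assumptions, not the authors' code.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def init_layer_weights(X, Z, reg=1e-3):
    """Initialize one layer's weights by solving A W + W B = C (Sylvester equation).

    X   : (d_in, n) input activations for this layer
    Z   : (d_out, n) user-defined latent code (here, random targets for illustration)
    reg : small ridge term (an added assumption) to keep A and B well conditioned
    """
    d_out, d_in = Z.shape[0], X.shape[0]
    A = Z @ Z.T + reg * np.eye(d_out)   # left coefficient, (d_out, d_out)
    B = X @ X.T + reg * np.eye(d_in)    # right coefficient, (d_in, d_in)
    C = 2.0 * Z @ X.T                   # right-hand side, (d_out, d_in)
    return solve_sylvester(A, B, C)     # gradient-free solve; W is (d_out, d_in)

# Sequential layer-wise use: initialize layer 1 from the raw inputs, forward-propagate
# to obtain the next layer's input activations, and repeat for deeper layers.
rng = np.random.default_rng(0)
X0 = rng.standard_normal((784, 256))    # e.g. flattened inputs, 256 samples
Z0 = rng.standard_normal((128, 256))    # hypothetical latent code for layer 1
W1 = init_layer_weights(X0, Z0)
X1 = np.maximum(W1 @ X0, 0.0)           # ReLU activations become layer 2's input
```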
Related papers
- Sparser, Better, Deeper, Stronger: Improving Sparse Training with Exact Orthogonal Initialization [49.06421851486415]
Static sparse training aims to train sparse models from scratch, achieving remarkable results in recent years.
We propose Exact Orthogonal Initialization (EOI), a novel sparse Orthogonal Initialization scheme based on random Givens rotations.
Our method enables training highly sparse 1000-layer MLP and CNN networks without residual connections or normalization techniques (a generic Givens-rotation sketch follows the related-papers list below).
arXiv Detail & Related papers (2024-06-03T19:44:47Z) - Using linear initialisation to improve speed of convergence and fully-trained error in Autoencoders [0.0]
We introduce a novel weight initialisation technique called the Straddled Matrix Initialiser.
The combination of the Straddled Matrix and the ReLU activation function initialises a neural network as a de facto linear model.
In all our experiments the Straddled Matrix Initialiser clearly outperforms all other methods.
arXiv Detail & Related papers (2023-11-17T18:43:32Z) - Taking the human out of decomposition-based optimization via artificial intelligence: Part II. Learning to initialize [0.0]
Active and supervised learning is used to learn a surrogate model that predicts the computational performance.
The results show that the proposed approach can lead to a significant reduction in solution time.
arXiv Detail & Related papers (2023-10-10T23:49:26Z) - Unsupervised Learning of Initialization in Deep Neural Networks via Maximum Mean Discrepancy [74.34895342081407]
We propose an unsupervised algorithm to find a good initialization for given input data.
We first notice that each parameter configuration in the parameter space corresponds to one particular downstream task of d-way classification.
We then conjecture that the success of learning is directly related to how diverse downstream tasks are in the vicinity of the initial parameters.
arXiv Detail & Related papers (2023-02-08T23:23:28Z) - Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline with no computational increment.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
arXiv Detail & Related papers (2022-03-23T06:24:31Z) - Boosting Fast Adversarial Training with Learnable Adversarial Initialization [79.90495058040537]
Adversarial training (AT) has been demonstrated to be effective in improving model robustness by leveraging adversarial examples for training.
To boost training efficiency, the fast gradient sign method (FGSM) is adopted in fast AT methods by computing the gradient only once.
arXiv Detail & Related papers (2021-10-11T05:37:00Z) - A novel initialisation based on hospital-resident assignment for the k-modes algorithm [0.0]
This paper presents a new way of selecting an initial solution for the k-modes algorithm.
It allows for a notion of mathematical fairness and leverages the data in a way that the common initialisations from the literature do not.
arXiv Detail & Related papers (2020-02-07T10:20:49Z) - MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks.
The use of gradient-based optimization combined with nonconvexity renders learning susceptible to initialization problems.
We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)
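As referenced in the first related-papers entry above, composing random Givens rotations is a standard way to obtain an exactly orthogonal matrix that stays sparse when few rotations are used. The sketch below illustrates only that generic building block, not the EOI scheme itself; the function name random_givens_orthogonal and the parameter choices are assumptions for illustration.

```python
import numpy as np

def random_givens_orthogonal(n, num_rotations, seed=0):
    """Build an n x n orthogonal matrix as a product of random Givens rotations."""
    rng = np.random.default_rng(seed)
    W = np.eye(n)
    for _ in range(num_rotations):
        i, j = rng.choice(n, size=2, replace=False)     # pick two coordinates
        theta = rng.uniform(0.0, 2.0 * np.pi)
        c, s = np.cos(theta), np.sin(theta)
        # Each rotation mixes only rows i and j, so the product of orthogonal
        # rotations remains exactly orthogonal and fills in slowly (stays sparse).
        Wi, Wj = W[i].copy(), W[j].copy()
        W[i] = c * Wi - s * Wj
        W[j] = s * Wi + c * Wj
    return W

W = random_givens_orthogonal(256, num_rotations=512)
print(np.allclose(W @ W.T, np.eye(256)))   # True: exactly orthogonal
print(np.mean(W != 0))                     # density of non-zero entries
```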