Transfer Learning Between Different Architectures Via Weights Injection
- URL: http://arxiv.org/abs/2101.02757v1
- Date: Thu, 7 Jan 2021 20:42:35 GMT
- Title: Transfer Learning Between Different Architectures Via Weights Injection
- Authors: Maciej A. Czyzewski
- Abstract summary: This work presents a naive algorithm for parameter transfer between different architectures with a computationally cheap injection technique.
The primary objective is to speed up the training of neural networks from scratch.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work presents a naive algorithm for parameter transfer between different
architectures with a computationally cheap injection technique (which does not
require data). The primary objective is to speed up the training of neural
networks from scratch. The study found that transferring knowledge from any
pre-trained architecture was superior to Kaiming and Xavier initialization. In
conclusion, the presented method converges faster, which makes it a drop-in
replacement for classical initialization methods. The method involves two steps:
1) matching: the layers of the pre-trained model are paired with the layers of
the target model; 2) injection: each source tensor is transformed into the shape
required by the corresponding target layer. This work also provides a similarity
comparison between current SOTA ImageNet architectures using the TLI
(Transfer Learning by Injection) score.
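As a rough illustration of the two steps above, the following is a minimal PyTorch-style sketch of shape-based parameter injection. It is not the author's exact TLI algorithm: the order-based pairing of convolutional/linear layers and the crop-and-zero-pad resize rule are illustrative assumptions only.

```python
import torch
import torch.nn as nn

def matchable_layers(model):
    # Collect layers that carry weight tensors, in module-traversal order.
    return [m for m in model.modules() if isinstance(m, (nn.Conv2d, nn.Linear))]

def inject(src, dst_shape):
    # Transform the source tensor into the desired shape: copy the overlapping
    # region and zero-fill the remainder (an illustrative resize rule only).
    out = torch.zeros(dst_shape, dtype=src.dtype)
    idx = tuple(slice(0, min(s, d)) for s, d in zip(src.shape, dst_shape))
    out[idx] = src[idx]
    return out

@torch.no_grad()
def transfer(pretrained, target):
    # 1) matching: pair layers of the pre-trained model with the target model.
    for src, dst in zip(matchable_layers(pretrained), matchable_layers(target)):
        if src.weight.dim() != dst.weight.dim():
            continue  # this naive sketch skips structurally incompatible pairs
        # 2) injection: reshape each source tensor into the target layer's shape.
        dst.weight.copy_(inject(src.weight, dst.weight.shape))
        if src.bias is not None and dst.bias is not None:
            dst.bias.copy_(inject(src.bias, dst.bias.shape))
    return target
```

A call such as `transfer(torchvision.models.resnet34(weights="IMAGENET1K_V1"), my_model)` would then seed a hypothetical `my_model` before ordinary training; the paper's matching step is more careful than a plain zip over layers.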
Related papers
- Neural Fine-Tuning Search for Few-Shot Learning [10.194808064624771]
In few-shot recognition, a classifier is required to rapidly adapt and generalize to a disjoint, novel set of classes.
Recent studies have shown the efficacy of fine-tuning with carefully crafted adaptation architectures.
This raises the question of how best to design the adaptation strategy, which we study through the lens of neural architecture search (NAS).
arXiv Detail & Related papers (2023-06-15T17:20:35Z) - Breaking the Architecture Barrier: A Method for Efficient Knowledge Transfer Across Networks [0.0]
We present a method for transferring parameters between neural networks with different architectures.
Our method, called DPIAT, uses dynamic programming to match blocks and layers between architectures and transfer parameters efficiently.
In experiments on ImageNet, our method improved validation accuracy by an average of 1.6 times after 50 epochs of training.
arXiv Detail & Related papers (2022-12-28T17:35:41Z) - CoV-TI-Net: Transferred Initialization with Modified End Layer for COVID-19 Diagnosis [5.546855806629448]
Transfer learning is a relatively new learning method that has been employed in many sectors to achieve good performance with fewer computations.
In this research, the PyTorch pre-trained models (VGG19_bn and WideResNet-101) are applied to the MNIST dataset.
The proposed model is developed and verified in a Kaggle notebook, and it reached an accuracy of 99.77% without requiring extensive computation time.
arXiv Detail & Related papers (2022-09-20T08:52:52Z) - TCT: Convexifying Federated Learning using Bootstrapped Neural Tangent Kernels [141.29156234353133]
State-of-the-art federated learning methods can perform far worse than their centralized counterparts when clients have dissimilar data distributions.
We show this disparity can largely be attributed to optimization challenges presented by nonconvexity.
We propose a Train-Convexify-Train (TCT) procedure to sidestep this issue.
arXiv Detail & Related papers (2022-07-13T16:58:22Z) - Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline that adds no extra computational cost.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
arXiv Detail & Related papers (2022-03-23T06:24:31Z) - GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture-agnostic method for initializing neural networks.
It is based on a simple heuristic: the variance of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value.
It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
arXiv Detail & Related papers (2021-02-16T11:45:35Z) - Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers [54.47911829539919]
We develop a novel top-down training method which can be viewed as an algorithm for searching for high-quality classifiers.
We tested this method on automatic speech recognition (ASR) tasks and language modelling tasks.
The proposed method consistently improves recurrent neural network ASR models on Wall Street Journal, self-attention ASR models on Switchboard, and AWD-LSTM language models on WikiText-2.
arXiv Detail & Related papers (2021-02-09T08:19:49Z) - Partial Is Better Than All: Revisiting Fine-tuning Strategy for Few-shot Learning [76.98364915566292]
A common practice is to train a model on the base set first and then transfer to novel classes through fine-tuning.
We propose to transfer partial knowledge by freezing or fine-tuning particular layer(s) in the base model (a minimal sketch of this idea appears after this list).
We conduct extensive experiments on CUB and mini-ImageNet to demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2021-02-08T03:27:05Z) - Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search [60.965024145243596]
One-shot weight sharing methods have recently drawn great attention in neural architecture search due to high efficiency and competitive performance.
However, weight sharing can leave the candidate subnetworks insufficiently trained; to alleviate this problem, we present a simple yet effective architecture distillation method.
We introduce the concept of prioritized path, which refers to the architecture candidates exhibiting superior performance during training.
Since the prioritized paths are changed on the fly depending on their performance and complexity, the final obtained paths are the cream of the crop.
arXiv Detail & Related papers (2020-10-29T17:55:05Z) - Generalized Zero and Few-Shot Transfer for Facial Forgery Detection [3.8073142980733]
We propose a new transfer learning approach to address the problem of zero and few-shot transfer in the context of forgery detection.
We find this learning strategy to be surprisingly effective at domain transfer compared to traditional classification or even state-of-the-art domain adaptation/few-shot learning methods.
arXiv Detail & Related papers (2020-06-21T18:10:52Z)
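As a concrete illustration of the partial-knowledge-transfer idea from the "Partial Is Better Than All" entry above (freezing or fine-tuning particular layers of a base model), here is a minimal PyTorch-style sketch. The base model (torchvision's ResNet-18) and the choice to leave only `layer4` and the classifier head trainable are assumptions for illustration, not the layer-selection procedure from that paper.

```python
import torch.nn as nn
from torchvision.models import resnet18

def freeze_except(model, trainable_prefixes=("layer4", "fc")):
    # Freeze every parameter except those whose name starts with one of the
    # given prefixes; only the unfrozen layers are fine-tuned on novel classes.
    for name, param in model.named_parameters():
        param.requires_grad = any(name.startswith(p) for p in trainable_prefixes)
    return model

# Assumes the torchvision >= 0.13 weights API; the 5-class head is illustrative.
base = resnet18(weights="IMAGENET1K_V1")      # model pre-trained on the base set
base.fc = nn.Linear(base.fc.in_features, 5)   # new head for 5 novel classes
base = freeze_except(base)                    # transfer partial knowledge
finetune_params = [p for p in base.parameters() if p.requires_grad]
```

Passing only `finetune_params` to the optimizer (e.g. `torch.optim.SGD(finetune_params, lr=1e-3)`) leaves the frozen base layers untouched while the remaining layers adapt to the novel classes.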
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.