Breaking the Architecture Barrier: A Method for Efficient Knowledge
Transfer Across Networks
- URL: http://arxiv.org/abs/2212.13970v1
- Date: Wed, 28 Dec 2022 17:35:41 GMT
- Title: Breaking the Architecture Barrier: A Method for Efficient Knowledge
Transfer Across Networks
- Authors: Maciej A. Czyzewski, Daniel Nowak, Kamil Piechowiak
- Abstract summary: We present a method for transferring parameters between neural networks with different architectures.
Our method, called DPIAT, uses dynamic programming to match blocks and layers between architectures and transfer parameters efficiently.
In experiments on ImageNet, our method improved validation accuracy by a factor of 1.6 on average after 50 epochs of training.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transfer learning is a popular technique for improving the performance of
neural networks. However, existing methods are limited to transferring
parameters between networks with the same architecture. We present a method for
transferring parameters between neural networks with different architectures.
Our method, called DPIAT, uses dynamic programming to match blocks and layers
between architectures and transfer parameters efficiently. Compared to existing
parameter prediction and random initialization methods, it significantly
improves training efficiency and validation accuracy. In experiments on
ImageNet, our method improved validation accuracy by a factor of 1.6 on average
after 50 epochs of training. DPIAT allows both researchers and neural
architecture search systems to modify trained networks and reuse knowledge,
avoiding the need for retraining from scratch. We also introduce a network
architecture similarity measure, enabling users to choose the best source
network without any training.
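The abstract names the ingredients (dynamic-programming matching of blocks and layers, followed by parameter transfer) but not their exact form, so the sketch below is a hypothetical illustration rather than the authors' DPIAT implementation: the compatibility score, the slice-copy rule, and all function names are assumptions.

```python
# Hypothetical sketch of dynamic-programming layer matching plus parameter
# transfer between two architectures. NOT the authors' DPIAT code; the
# compatibility score and the slice-copy rule are assumptions.
import numpy as np

def compatibility(shape_a, shape_b):
    """Score how well two parameter tensors line up (assumed heuristic:
    product of overlap ratios along each dimension; 0 if ranks differ)."""
    if len(shape_a) != len(shape_b):
        return 0.0
    return float(np.prod([min(a, b) / max(a, b) for a, b in zip(shape_a, shape_b)]))

def match_layers(src_shapes, dst_shapes):
    """Order-preserving matching of source to destination layers that
    maximizes total compatibility, via dynamic-programming alignment."""
    n, m = len(src_shapes), len(dst_shapes)
    score = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = score[i - 1, j - 1] + compatibility(src_shapes[i - 1], dst_shapes[j - 1])
            score[i, j] = max(score[i - 1, j], score[i, j - 1], pair)
    # Backtrack to recover the matched (source, destination) index pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        pair = score[i - 1, j - 1] + compatibility(src_shapes[i - 1], dst_shapes[j - 1])
        if np.isclose(score[i, j], pair):
            pairs.append((i - 1, j - 1)); i, j = i - 1, j - 1
        elif np.isclose(score[i, j], score[i - 1, j]):
            i -= 1
        else:
            j -= 1
    return list(reversed(pairs))

def transfer(src_params, dst_params, pairs):
    """Copy the overlapping slice of each matched source tensor into the
    destination tensor; the remainder keeps its original initialization."""
    for s, d in pairs:
        if src_params[s].ndim != dst_params[d].ndim:
            continue
        cut = tuple(slice(0, min(a, b)) for a, b in zip(src_params[s].shape, dst_params[d].shape))
        dst_params[d][cut] = src_params[s][cut]
    return dst_params

# Toy example: a 3-layer source MLP transferred into a wider 4-layer target.
src = [np.random.randn(64, 32), np.random.randn(128, 64), np.random.randn(10, 128)]
dst = [np.zeros((96, 32)), np.zeros((96, 96)), np.zeros((128, 96)), np.zeros((10, 128))]
pairs = match_layers([p.shape for p in src], [p.shape for p in dst])
dst = transfer(src, dst, pairs)
```

Under this reading, the total alignment score would also be a natural (if crude) proxy for the architecture similarity measure mentioned in the abstract, though the paper's actual measure is not specified here.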
Related papers
- Learning Morphisms with Gauss-Newton Approximation for Growing Networks [43.998746572276076]
A popular approach to Neural Architecture Search (NAS) grows networks via small local changes to the architecture, called network morphisms.
Here we propose a NAS method for growing a network by using a Gauss-Newton approximation of the loss function to efficiently learn and evaluate candidate network morphisms.
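The summary names a Gauss-Newton approximation of the loss but not its exact role; for reference, the textbook Gauss-Newton approximation of the loss change under a small parameter perturbation (such as one introduced by a candidate morphism) is:

```latex
\Delta \mathcal{L} \;\approx\; \nabla_\theta \mathcal{L}^{\top} \Delta\theta
  \;+\; \tfrac{1}{2}\, \Delta\theta^{\top} G \, \Delta\theta,
\qquad
G \;=\; \mathbb{E}_{x}\!\left[ J_f(x)^{\top} H_{\mathcal{L}}(x)\, J_f(x) \right],
\quad
J_f(x) = \frac{\partial f_\theta(x)}{\partial \theta},
\quad
H_{\mathcal{L}}(x) = \frac{\partial^2 \mathcal{L}}{\partial f \, \partial f^{\top}}.
```

The matrix G is positive semi-definite and avoids second derivatives of the network itself, which is what makes such approximations cheap enough to score many candidate morphisms; how the paper uses it to learn and rank morphisms is not recoverable from the summary above.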
arXiv Detail & Related papers (2024-11-07T01:12:42Z)
- Principled Architecture-aware Scaling of Hyperparameters [69.98414153320894]
Training a high-quality deep neural network requires choosing suitable hyperparameters, which is a non-trivial and expensive process.
In this work, we precisely characterize the dependence of initializations and maximal learning rates on the network architecture.
We demonstrate that network rankings in benchmarks can easily change when the networks are trained with better, architecture-aware hyperparameters.
arXiv Detail & Related papers (2024-02-27T11:52:49Z)
- Speeding Up EfficientNet: Selecting Update Blocks of Convolutional Neural Networks using Genetic Algorithm in Transfer Learning [0.0]
We devise a genetic algorithm to select blocks of layers for updating the parameters.
We show that our algorithm yields accuracy similar to or better than the baseline.
We also devise a metric, called block importance, to measure the efficacy of each block as an update block.
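The summary states that a genetic algorithm selects which blocks to update but gives no encoding or operators; the sketch below is a generic binary-mask GA under assumed choices (truncation selection, single-point crossover, bit-flip mutation), with the fitness function left as a user-supplied stub such as validation accuracy after briefly fine-tuning only the selected blocks.

```python
# Generic genetic algorithm over a binary mask of "which blocks to update";
# a sketch of the idea in the summary, not the paper's exact operators.
import random

def evolve(num_blocks, fitness, pop_size=12, generations=10,
           mutation_rate=0.1, elite=4, seed=0):
    """fitness(mask) -> float, e.g. validation accuracy after briefly
    fine-tuning only the blocks where mask[i] == 1 (user-supplied)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(num_blocks)] for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(pop, key=fitness, reverse=True)[:elite]   # truncation selection
        children = []
        while len(children) < pop_size - elite:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, num_blocks)                     # single-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Toy fitness: pretend later blocks matter more, minus a small cost per updated block.
toy_fitness = lambda mask: sum(i * bit for i, bit in enumerate(mask)) - 0.5 * sum(mask)
best_mask = evolve(num_blocks=7, fitness=toy_fitness)
```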
arXiv Detail & Related papers (2023-03-01T06:35:29Z)
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
- An Experimental Study of the Impact of Pre-training on the Pruning of a Convolutional Neural Network [0.0]
In recent years, deep neural networks have seen wide success in various application domains.
They usually involve a large number of parameters, which correspond to the weights of the network.
Pruning methods attempt to reduce the size of this parameter set by identifying and removing irrelevant weights.
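The summary describes pruning only as identifying and removing irrelevant weights; one common concrete instance is global magnitude pruning, sketched below, with the threshold rule and sparsity level chosen for illustration rather than taken from the study's protocol.

```python
# Global magnitude pruning: zero out the smallest-magnitude weights.
# A generic illustration of "identifying and removing irrelevant weights",
# not the specific protocol studied in the paper.
import torch
import torch.nn as nn

def magnitude_prune(model: nn.Module, sparsity: float = 0.5):
    weights = [m.weight for m in model.modules() if isinstance(m, (nn.Linear, nn.Conv2d))]
    all_vals = torch.cat([w.detach().abs().flatten() for w in weights])
    threshold = torch.quantile(all_vals, sparsity)    # global magnitude cut-off
    with torch.no_grad():
        for w in weights:
            w.mul_((w.abs() > threshold).float())     # zero the "irrelevant" weights
    return model

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Flatten(), nn.Linear(16 * 30 * 30, 10))
magnitude_prune(model, sparsity=0.8)
```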
arXiv Detail & Related papers (2021-12-15T16:02:15Z)
- SIRe-Networks: Skip Connections over Interlaced Multi-Task Learning and Residual Connections for Structure Preserving Object Classification [28.02302915971059]
In this paper, we introduce an interlaced multi-task learning strategy, named SIRe, to reduce the vanishing gradient problem in the object classification task.
The presented methodology directly improves a convolutional neural network (CNN) by enforcing preservation of the input image structure through auto-encoders.
To validate the presented methodology, a simple CNN and various implementations of famous networks are extended via the SIRe strategy and extensively tested on the CIFAR100 dataset.
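The mechanism named in the summary, preserving the input image structure through auto-encoders, can be illustrated by the generic pattern below: a decoder branch reconstructs the input from intermediate features and its loss is added to the classification loss. The architecture and loss weighting are assumptions, not the SIRe design.

```python
# Classification plus an auto-encoder branch that preserves input structure:
# a generic sketch of the idea, not the SIRe architecture itself.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
classifier = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 100))
decoder = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(16, 3, 3, padding=1))          # reconstructs the input

x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 100, (8,))
feats = encoder(x)
loss = nn.functional.cross_entropy(classifier(feats), y) \
     + 0.1 * nn.functional.mse_loss(decoder(feats), x)           # structure-preservation term
loss.backward()
```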
arXiv Detail & Related papers (2021-10-06T13:54:49Z)
- Joint Learning of Neural Transfer and Architecture Adaptation for Image Recognition [77.95361323613147]
Current state-of-the-art visual recognition systems rely on pretraining a neural network on a large-scale dataset and finetuning the network weights on a smaller dataset.
In this work, we prove that dynamically adapting the network architecture to each domain task, along with weight finetuning, benefits both efficiency and effectiveness.
Our method can be easily generalized to an unsupervised paradigm by replacing supernet training with self-supervised learning in the source domain tasks and performing linear evaluation in the downstream tasks.
arXiv Detail & Related papers (2021-03-31T08:15:17Z)
- Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
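A minimal reading of "learning lines of high-accuracy networks" is a layer that holds two endpoint weight sets and trains them through randomly sampled interpolations; the sketch below illustrates that idea only and omits the paper's full recipe (e.g., any regularization that keeps the endpoints diverse).

```python
# Sketch of learning a line of networks: a layer holds two endpoint weight sets,
# samples an interpolation coefficient alpha each forward pass, and trains both
# endpoints jointly. A minimal illustration, not the authors' full training recipe.
import torch
import torch.nn as nn

class LineLinear(nn.Module):
    def __init__(self, in_f, out_f):
        super().__init__()
        self.w1 = nn.Parameter(torch.randn(out_f, in_f) * 0.02)
        self.w2 = nn.Parameter(torch.randn(out_f, in_f) * 0.02)

    def forward(self, x, alpha=None):
        if alpha is None:                          # sample a point on the line
            alpha = torch.rand(())
        w = alpha * self.w1 + (1 - alpha) * self.w2
        return x @ w.t()

layer = LineLinear(20, 10)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
x, y = torch.randn(32, 20), torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(layer(x), y)    # one training step at a random alpha
opt.zero_grad(); loss.backward(); opt.step()
```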
arXiv Detail & Related papers (2021-02-20T23:26:58Z)
- Local Critic Training for Model-Parallel Learning of Deep Neural Networks [94.69202357137452]
We propose a novel model-parallel learning method, called local critic training.
We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs).
We also show that networks trained by the proposed method can be used for structural optimization.
arXiv Detail & Related papers (2021-02-03T09:30:45Z)
- Multi-fidelity Neural Architecture Search with Knowledge Distillation [69.09782590880367]
We propose a Bayesian multi-fidelity method for neural architecture search: MF-KD.
Knowledge distillation adds to the loss function a term that forces the network to mimic a teacher network.
We show that training for a few epochs with such a modified loss function leads to a better selection of neural architectures than training for a few epochs with a logistic loss.
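The distillation term described above is, in its standard form, a temperature-softened KL divergence added to the task loss; the sketch below uses the usual conventions for temperature and weighting, which are not necessarily the MF-KD settings.

```python
# Standard knowledge-distillation loss: task cross-entropy plus a term that
# pushes the student toward the teacher's softened outputs. Temperature and
# weighting follow common convention, not necessarily the paper's settings.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    ce = F.cross_entropy(student_logits, targets)
    kl = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    return (1 - alpha) * ce + alpha * kl

student_logits = torch.randn(16, 10, requires_grad=True)
teacher_logits = torch.randn(16, 10)
targets = torch.randint(0, 10, (16,))
kd_loss(student_logits, teacher_logits, targets).backward()
```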
arXiv Detail & Related papers (2020-06-15T12:32:38Z)
- A Hybrid Method for Training Convolutional Neural Networks [3.172761915061083]
We propose a hybrid method that uses both backpropagation and evolutionary strategies to train Convolutional Neural Networks.
We show that the proposed hybrid method is capable of improving upon regular training in the task of image classification.
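The summary does not say how backpropagation and evolutionary steps are combined; the sketch below shows one plausible hybrid loop that alternates gradient steps with a simple perturbation search over the weights, with the schedule and the evolution-strategy variant being assumptions rather than the paper's algorithm.

```python
# Hypothetical hybrid training loop: alternate backprop steps with a small
# evolutionary search that perturbs the weights and keeps the best candidate.
import copy
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
x, y = torch.randn(64, 20), torch.randint(0, 10, (64,))

for step in range(20):
    # Backprop phase: a standard gradient step.
    loss = loss_fn(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()

    # Evolution phase (every 5 steps): try Gaussian perturbations of the weights
    # and keep the best candidate if it beats the current model.
    if step % 5 == 4:
        with torch.no_grad():
            best_loss = loss_fn(model(x), y)
            best_state = copy.deepcopy(model.state_dict())
            for _ in range(8):
                candidate = copy.deepcopy(best_state)
                for k in candidate:
                    candidate[k] = candidate[k] + 0.01 * torch.randn_like(candidate[k])
                model.load_state_dict(candidate)
                cand_loss = loss_fn(model(x), y)
                if cand_loss < best_loss:
                    best_loss, best_state = cand_loss, candidate
            model.load_state_dict(best_state)
```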
arXiv Detail & Related papers (2020-04-15T17:52:48Z)