Transfer learning for ensembles: reducing computation time and keeping the diversity
- URL: http://arxiv.org/abs/2206.13116v1
- Date: Mon, 27 Jun 2022 08:47:42 GMT
- Title: Transfer learning for ensembles: reducing computation time and keeping the diversity
- Authors: Ilya Shashkov and Nikita Balabin and Evgeny Burnaev and Alexey Zaytsev
- Abstract summary: Transferring a deep neural network trained on one problem to another requires only a small amount of data and little additional computation time.
Transferring an ensemble of deep neural networks, however, demands relatively high computational expense.
Our approach to transfer learning for ensembles consists of two steps: (a) shifting the weights of the encoders of all models in the ensemble by a single shift vector, and (b) briefly fine-tuning each individual model afterwards.
- Score: 12.220069569688714
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Transferring a deep neural network trained on one problem to another requires only a small amount of data and little additional computation time. The same behaviour holds for ensembles of deep learning models, which are typically superior to a single model. However, transferring an ensemble of deep neural networks demands relatively high computational expense, and the probability of overfitting also increases.
Our approach to transfer learning for ensembles consists of two steps: (a) shifting the weights of the encoders of all models in the ensemble by a single shift vector, and (b) briefly fine-tuning each individual model afterwards. This strategy speeds up the training process and makes it possible to add models to an ensemble with significantly reduced training time using the shift vector.
We compare different strategies in terms of computation time, ensemble accuracy, uncertainty estimation and disagreement, and conclude that our approach gives competitive results at the same computational complexity as the traditional approach, while keeping the diversity of the ensemble's models higher.
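The abstract does not spell out how the single shift vector is obtained, so the sketch below is only one plausible reading, not the authors' code: the shift is taken as the encoder-weight difference of one fully fine-tuned reference member, applied to every member's encoder, followed by a short per-model fine-tuning. The names models, finetune and target_loader are placeholders (the pre-trained ensemble, a user-supplied training routine, and the target-task data).

    import copy

    def transfer_ensemble(models, finetune, target_loader):
        # `models`: list of source-task PyTorch nn.Modules, each with an `.encoder` submodule.
        # Assumed step (not stated in the abstract): fully fine-tune one reference member
        # on the target task and take its encoder-weight difference as the shift vector.
        reference = copy.deepcopy(models[0])
        finetune(reference, target_loader, epochs=10)
        shift = {
            name: ref_param - src_param
            for (name, ref_param), (_, src_param) in zip(
                reference.encoder.state_dict().items(),
                models[0].encoder.state_dict().items(),
            )
        }

        transferred = []
        for model in models:
            model = copy.deepcopy(model)
            # Step (a): shift every member's encoder weights by the same shift vector.
            shifted = {
                name: param + shift[name]
                for name, param in model.encoder.state_dict().items()
            }
            model.encoder.load_state_dict(shifted)
            # Step (b): tiny fine-tuning of each individual member.
            finetune(model, target_loader, epochs=1)
            transferred.append(model)
        return transferred

Under this reading, the expensive adaptation is paid once when computing the shift vector; each member, including members added later, only needs the stored shift plus a short fine-tuning pass, which matches the abstract's claim about adding models with significantly reduced training time.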
Related papers
- Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network).
After training this network on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
arXiv Detail & Related papers (2024-10-28T13:48:43Z)
- Transfer Learning with Reconstruction Loss [12.906500431427716]
This paper proposes a novel approach for model training by adding into the model an additional reconstruction stage associated with a new reconstruction loss.
The proposed approach encourages the learned features to be general and transferable, and therefore can be readily used for efficient transfer learning.
For numerical simulations, three applications are studied: transfer learning on classifying MNIST handwritten digits, the device-to-device wireless network power allocation, and the multiple-input-single-output network downlink beamforming and localization.
arXiv Detail & Related papers (2024-03-31T00:22:36Z)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST), a recently proposed and highly effective technique for distributed training of large models.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
- Autoselection of the Ensemble of Convolutional Neural Networks with Second-Order Cone Programming [0.8029049649310213]
This study proposes a mathematical model which prunes an ensemble of Convolutional Neural Networks (CNNs).
The proposed model is tested on CIFAR-10, CIFAR-100 and MNIST data sets.
arXiv Detail & Related papers (2023-02-12T16:18:06Z)
- Deep Negative Correlation Classification [82.45045814842595]
Existing deep ensemble methods naively train many different models and then aggregate their predictions.
We propose deep negative correlation classification (DNCC), which yields a deep classification ensemble where each individual estimator is both accurate and negatively correlated.
arXiv Detail & Related papers (2022-12-14T07:35:20Z)
- Transfer Learning via Test-Time Neural Networks Aggregation [11.42582922543676]
It has been demonstrated that deep neural networks outperform traditional machine learning methods.
However, deep networks lack generalisability: they will not perform as well on a new (testing) set drawn from a different distribution.
arXiv Detail & Related papers (2022-06-27T15:46:05Z)
- Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks [62.48782506095565]
We show that due to the greedy nature of learning in deep neural networks, models tend to rely on just one modality while under-fitting the other modalities.
We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning.
arXiv Detail & Related papers (2022-02-10T20:11:21Z)
- Merging Models with Fisher-Weighted Averaging [24.698591753644077]
We introduce a fundamentally different method for transferring knowledge across models that amounts to "merging" multiple models into one.
Our approach effectively involves computing a weighted average of the models' parameters.
We show that our merging procedure makes it possible to combine models in previously unexplored ways (see the sketch after this list).
arXiv Detail & Related papers (2021-11-18T17:59:35Z)
- Neural Complexity Measures [96.06344259626127]
We propose Neural Complexity (NC), a meta-learning framework for predicting generalization.
Our model learns a scalar complexity measure through interactions with many heterogeneous tasks in a data-driven way.
arXiv Detail & Related papers (2020-08-07T02:12:10Z)
- Continual Learning using a Bayesian Nonparametric Dictionary of Weight Factors [75.58555462743585]
Naively trained neural networks tend to experience catastrophic forgetting in sequential task settings.
We propose a principled nonparametric approach based on the Indian Buffet Process (IBP) prior, letting the data determine how much to expand the model complexity.
We demonstrate the effectiveness of our method on a number of continual learning benchmarks and analyze how weight factors are allocated and reused throughout the training.
arXiv Detail & Related papers (2020-04-21T15:20:19Z)
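Since the Fisher-weighted averaging entry above describes merging by a weighted average of model parameters, here is a minimal, hedged sketch of that idea: parameters of several models sharing an architecture are averaged, each weighted by an estimate of its diagonal Fisher information. How the Fisher estimates are obtained (for example, averaged squared gradients of the log-likelihood on held-out data) is left open here, and models and fishers are placeholder names rather than the paper's code.

    import torch

    def fisher_weighted_average(models, fishers, eps=1e-8):
        # `models`: list of PyTorch nn.Modules with identical architecture, trained separately.
        # `fishers`: list of dicts mapping parameter name -> diagonal Fisher estimate
        #            (same shape as the parameter); uniform weights are used if a name is missing.
        merged = {}
        for name, _ in models[0].named_parameters():
            numerator, denominator = 0.0, 0.0
            for model, fisher in zip(models, fishers):
                theta = dict(model.named_parameters())[name].detach()
                weight = fisher.get(name, torch.ones_like(theta))
                numerator = numerator + weight * theta
                denominator = denominator + weight
            # Per-parameter weighted average; eps guards against zero Fisher mass.
            merged[name] = numerator / (denominator + eps)
        # Load into a fresh model of the same architecture via load_state_dict(merged, strict=False),
        # since only parameters (not buffers) are merged here.
        return merged

Setting every Fisher estimate to ones reduces this to a plain parameter average, which is a convenient sanity check for the routine.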
This list is automatically generated from the titles and abstracts of the papers on this site.