Effective training-time stacking for ensembling of deep neural networks
- URL: http://arxiv.org/abs/2206.13491v1
- Date: Mon, 27 Jun 2022 17:52:53 GMT
- Title: Effective training-time stacking for ensembling of deep neural networks
- Authors: Polina Proscura and Alexey Zaytsev
- Abstract summary: A snapshot ensembling collects models in the ensemble along a single training path.
Our method improves snapshot ensembling by selecting and weighting ensemble members along the training path.
It relies on training-time likelihoods without looking at validation sample errors that standard stacking methods do.
- Score: 1.2667973028134798
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Ensembling is a popular and effective method for improving machine learning
(ML) models. It proves its value not only in classical ML but also for deep
learning. Ensembles enhance the quality and trustworthiness of ML solutions,
and allow uncertainty estimation. However, they come at a price: training
ensembles of deep learning models eat a huge amount of computational resources.
A snapshot ensembling collects models in the ensemble along a single training
path. As it runs training only one time, the computational time is similar to
the training of one model. However, the quality of models along the training
path is different: typically, later models are better if no overfitting occurs.
So, the models are of varying utility.
Our method improves snapshot ensembling by selecting and weighting ensemble
members along the training path. It relies on training-time likelihoods without
looking at validation sample errors that standard stacking methods do.
Experimental evidence for Fashion MNIST, CIFAR-10, and CIFAR-100 datasets
demonstrates the superior quality of the proposed weighted ensembles c.t.
vanilla ensembling of deep learning models.
Related papers
- Attribute-to-Delete: Machine Unlearning via Datamodel Matching [65.13151619119782]
Machine unlearning -- efficiently removing a small "forget set" training data on a pre-divertrained machine learning model -- has recently attracted interest.
Recent research shows that machine unlearning techniques do not hold up in such a challenging setting.
arXiv Detail & Related papers (2024-10-30T17:20:10Z) - Initializing Models with Larger Ones [76.41561758293055]
We introduce weight selection, a method for initializing smaller models by selecting a subset of weights from a pretrained larger model.
Our experiments demonstrate that weight selection can significantly enhance the performance of small models and reduce their training time.
arXiv Detail & Related papers (2023-11-30T18:58:26Z) - Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z) - A Simple Baseline that Questions the Use of Pretrained-Models in
Continual Learning [30.023047201419825]
Some methods design continual learning mechanisms on the pre-trained representations and only allow minimum updates or even no updates of the backbone models during the training of continual learning.
We argue that the pretrained feature extractor itself can be strong enough to achieve a competitive or even better continual learning performance on Split-CIFAR100 and CoRe 50 benchmarks.
This baseline achieved 88.53% on 10-Split-CIFAR-100, surpassing most state-of-the-art continual learning methods that are all using the same pretrained transformer model.
arXiv Detail & Related papers (2022-10-10T04:19:53Z) - Revealing Secrets From Pre-trained Models [2.0249686991196123]
Transfer-learning has been widely adopted in many emerging deep learning algorithms.
We show that pre-trained models and fine-tuned models have significantly high similarities in weight values.
We propose a new model extraction attack that reveals the model architecture and the pre-trained model used by the black-box victim model.
arXiv Detail & Related papers (2022-07-19T20:19:03Z) - MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided
Adaptation [68.30497162547768]
We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed.
We validate the efficiency and effectiveness of MoEBERT on natural language understanding and question answering tasks.
arXiv Detail & Related papers (2022-04-15T23:19:37Z) - Ensembling Off-the-shelf Models for GAN Training [55.34705213104182]
We find that pretrained computer vision models can significantly improve performance when used in an ensemble of discriminators.
We propose an effective selection mechanism, by probing the linear separability between real and fake samples in pretrained model embeddings.
Our method can improve GAN training in both limited data and large-scale settings.
arXiv Detail & Related papers (2021-12-16T18:59:50Z) - Jigsaw Clustering for Unsupervised Visual Representation Learning [68.09280490213399]
We propose a new jigsaw clustering pretext task in this paper.
Our method makes use of information from both intra- and inter-images.
It is even comparable to the contrastive learning methods when only half of training batches are used.
arXiv Detail & Related papers (2021-04-01T08:09:26Z) - Few-Shot Lifelong Learning [35.05196800623617]
Few-Shot Lifelong Learning enables deep learning models to perform lifelong/continual learning on few-shot data.
Our method selects very few parameters from the model for training every new set of classes instead of training the full model.
We experimentally show that our method significantly outperforms existing methods on the miniImageNet, CIFAR-100, and CUB-200 datasets.
arXiv Detail & Related papers (2021-03-01T13:26:57Z) - A Practical Incremental Method to Train Deep CTR Models [37.54660958085938]
We introduce a practical incremental method to train deep CTR models, which consists of three decoupled modules.
Our method can achieve comparable performance to the conventional batch mode training with much better training efficiency.
arXiv Detail & Related papers (2020-09-04T12:35:42Z) - Auto-Ensemble: An Adaptive Learning Rate Scheduling based Deep Learning
Model Ensembling [11.324407834445422]
This paper proposes Auto-Ensemble (AE) to collect checkpoints of deep learning model and ensemble them automatically.
The advantage of this method is to make the model converge to various local optima by scheduling the learning rate in once training.
arXiv Detail & Related papers (2020-03-25T08:17:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.