United We Stand: Using Epoch-wise Agreement of Ensembles to Combat Overfit
- URL: http://arxiv.org/abs/2310.11077v2
- Date: Thu, 28 Dec 2023 15:15:02 GMT
- Title: United We Stand: Using Epoch-wise Agreement of Ensembles to Combat Overfit
- Authors: Uri Stern, Daniel Shwartz, Daphna Weinshall
- Abstract summary: We introduce a novel ensemble classifier for deep networks that effectively overcomes overfitting.
Our method allows for the incorporation of useful knowledge obtained during the overfitting phase without deterioration of the general performance.
Our method is easy to implement and can be integrated with any training scheme and architecture.
- Score: 7.627299398469962
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks have become the method of choice for solving many
classification tasks, largely because they can fit very complex functions
defined over raw data. The downside of such powerful learners is the danger of
overfit. In this paper, we introduce a novel ensemble classifier for deep
networks that effectively overcomes overfitting by combining models generated
at specific intermediate epochs during training. Our method allows for the
incorporation of useful knowledge obtained by the models during the overfitting
phase without deterioration of the general performance, which is usually missed
when early stopping is used. To motivate this approach, we begin with the
theoretical analysis of a regression model, whose prediction -- that the
variance among classifiers increases when overfit occurs -- is demonstrated
empirically in deep networks in common use. Guided by these results, we
construct a new ensemble-based prediction method, where the prediction is
determined by the class that attains the most consensual prediction throughout
the training epochs. Using multiple image and text classification datasets, we
show that when regular ensembles suffer from overfit, our method eliminates the
harmful reduction in generalization due to overfit, and often even surpasses
the performance obtained by early stopping. Our method is easy to implement and
can be integrated with any training scheme and architecture, without additional
prior knowledge beyond the training set. It is thus a practical and useful tool
to overcome overfit. Code is available at
https://github.com/uristern123/United-We-Stand-Using-Epoch-wise-Agreement-of-Ensembles-to-Combat-Overfit.
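A minimal sketch of the consensus rule described above, assuming per-checkpoint class scores have already been collected (this is an illustration, not the authors' released implementation; see the repository above for that):

```python
# Hedged sketch: majority vote across checkpoints saved at several epochs.
import numpy as np

def consensus_predict(epoch_logits: np.ndarray) -> np.ndarray:
    """epoch_logits: (num_epochs, num_samples, num_classes) class scores,
    one slice per saved checkpoint. Returns the per-sample consensus label."""
    num_epochs, num_samples, num_classes = epoch_logits.shape
    per_epoch_labels = epoch_logits.argmax(axis=-1)        # (E, N): each epoch votes
    votes = np.zeros((num_samples, num_classes), dtype=np.int64)
    for labels in per_epoch_labels:                        # tally one epoch at a time
        votes[np.arange(num_samples), labels] += 1
    return votes.argmax(axis=-1)                           # most consensual class wins

# Toy usage: 5 checkpoints, 3 samples, 10 classes of random scores.
rng = np.random.default_rng(0)
print(consensus_predict(rng.normal(size=(5, 3, 10))))
```

Hard argmax votes are one natural reading of "most consensual prediction"; averaging softmax probabilities across epochs would be a close variant.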
Related papers
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) that does not require prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z)
- Relearning Forgotten Knowledge: on Forgetting, Overfit and Training-Free Ensembles of DNNs [9.010643838773477]
We introduce a novel score for quantifying overfit, which monitors the forgetting rate of deep models on validation data.
We show that overfit can occur with and without a decrease in validation accuracy, and may be more common than previously appreciated.
We use our observations to construct a new ensemble method, based solely on the training history of a single network, which provides significant improvement without any additional cost in training time.
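One plausible form of such a forgetting-rate score (the definition below is an assumption for illustration, not the paper's exact score): for each pair of consecutive epochs, count the fraction of validation samples that flip from correct to incorrect.

```python
# Hedged sketch of a forgetting-rate monitor; the paper's exact score may differ.
import numpy as np

def forgetting_rate(epoch_preds: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """epoch_preds: (num_epochs, num_val) predicted labels per epoch.
    labels: (num_val,) validation ground truth.
    Returns, per consecutive-epoch transition, the fraction of
    validation samples that were correct at t but wrong at t + 1."""
    correct = epoch_preds == labels          # (E, N) boolean correctness matrix
    forgotten = correct[:-1] & ~correct[1:]  # right at epoch t, wrong at t + 1
    return forgotten.mean(axis=1)            # one forgetting rate per transition

# Toy usage: 3 epochs, 3 validation samples.
preds = np.array([[0, 1, 2], [0, 2, 2], [1, 2, 2]])
print(forgetting_rate(preds, np.array([0, 1, 2])))  # -> [0.333..., 0.333...]
```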
arXiv Detail & Related papers (2023-10-17T09:22:22Z)
- TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z)
- Joint Training of Deep Ensembles Fails Due to Learner Collusion [61.557412796012535]
Ensembles of machine learning models have been well established as a powerful method of improving performance over a single model.
Traditionally, ensembling algorithms train their base learners independently or sequentially with the goal of optimizing their joint performance.
Surprisingly, directly minimizing the loss of the ensemble as a whole is rarely applied in practice; we show that such joint training fails because the base learners collude.
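To make the contrast concrete, here is a minimal sketch (illustrative names and shapes, not the paper's code) of the two objectives: the common practice of averaging per-member losses versus directly minimizing the loss of the ensemble's combined prediction.

```python
# Hedged sketch: independent vs. joint objectives for a K-member ensemble.
import torch
import torch.nn.functional as F

def independent_loss(member_logits, y):
    """Common practice: average the per-member losses, so each
    learner is effectively trained on its own."""
    return torch.stack([F.cross_entropy(z, y) for z in member_logits]).mean()

def joint_ensemble_loss(member_logits, y):
    """The 'true' objective: average the members' probabilities first,
    then score that single combined prediction."""
    probs = torch.stack([F.softmax(z, dim=-1) for z in member_logits]).mean(dim=0)
    return F.nll_loss(torch.log(probs), y)

# Toy usage: K = 3 members, batch of 4 samples, 5 classes.
y = torch.tensor([0, 1, 2, 3])
members = [torch.randn(4, 5) for _ in range(3)]
print(independent_loss(members, y), joint_ensemble_loss(members, y))
```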
arXiv Detail & Related papers (2023-01-26T18:58:07Z)
- Distributed Adversarial Training to Robustify Deep Neural Networks at Scale [100.19539096465101]
Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification.
To defend against such attacks, an effective approach known as adversarial training (AT) has been shown to improve model robustness.
We propose a large-batch adversarial training framework implemented over multiple machines.
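For context, AT itself can be summarized as a min-max problem: find a small worst-case perturbation of each input, then descend the training loss on the perturbed batch. A minimal single-machine PGD-style sketch follows (a standard formulation, not this paper's distributed framework):

```python
# Hedged sketch of one standard PGD adversarial-training step (single machine);
# the paper's contribution is scaling this to large batches across machines.
import torch
import torch.nn.functional as F

def pgd_perturb(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Inner maximization: find a perturbation with ||delta||_inf <= eps
    that increases the loss (input-range clipping omitted for brevity)."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()   # ascend the loss
            delta.clamp_(-eps, eps)              # stay inside the L-inf ball
        delta.grad.zero_()
    return delta.detach()

def adversarial_training_step(model, optimizer, x, y):
    delta = pgd_perturb(model, x, y)             # inner maximization
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x + delta), y)  # outer minimization
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: a linear model on fake data.
model = torch.nn.Linear(10, 3)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 10), torch.randint(0, 3, (8,))
print(adversarial_training_step(model, opt, x, y))
```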
arXiv Detail & Related papers (2022-06-13T15:39:43Z)
- An Empirical Investigation of the Role of Pre-training in Lifelong Learning [21.995593026269578]
We show that generic pre-training implicitly alleviates the effects of catastrophic forgetting when learning multiple tasks sequentially.
We study this phenomenon by analyzing the loss landscape, finding that pre-trained weights appear to ease forgetting by leading to wider minima.
arXiv Detail & Related papers (2021-12-16T19:00:55Z)
- Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z)
- Overfitting in adversarially robust deep learning [86.11788847990783]
We show that overfitting to the training set does in fact harm robust performance to a very large degree in adversarially robust training.
We also show that effects such as the double descent curve do still occur in adversarially trained models, yet fail to explain the observed overfitting.
arXiv Detail & Related papers (2020-02-26T15:40:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.