Relearning Forgotten Knowledge: on Forgetting, Overfit and Training-Free
Ensembles of DNNs
- URL: http://arxiv.org/abs/2310.11094v2
- Date: Thu, 28 Dec 2023 14:41:37 GMT
- Title: Relearning Forgotten Knowledge: on Forgetting, Overfit and Training-Free
Ensembles of DNNs
- Authors: Uri Stern, Daphna Weinshall
- Abstract summary: We introduce a novel score for quantifying overfit, which monitors the forgetting rate of deep models on validation data.
We show that overfit can occur with and without a decrease in validation accuracy, and may be more common than previously appreciated.
We use our observations to construct a new ensemble method, based solely on the training history of a single network, which provides significant improvement without any additional cost in training time.
- Score: 9.010643838773477
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The infrequent occurrence of overfit in deep neural networks is perplexing.
On the one hand, theory predicts that as models get larger they should
eventually become too specialized for a specific training set, with ensuing
decrease in generalization. In contrast, empirical results in image
classification indicate that increasing the training time of deep models or
using bigger models almost never hurts generalization. Is it because the way we
measure overfit is too limited? Here, we introduce a novel score for
quantifying overfit, which monitors the forgetting rate of deep models on
validation data. Presumably, this score indicates that even while
generalization improves overall, there are certain regions of the data space
where it deteriorates. When thus measured, we show that overfit can occur with
and without a decrease in validation accuracy, and may be more common than
previously appreciated. This observation may help to clarify the aforementioned
confusing picture. We use our observations to construct a new ensemble method,
based solely on the training history of a single network, which provides
significant improvement in performance without any additional cost in training
time. An extensive empirical evaluation with modern deep models shows our
method's utility on multiple datasets, neural network architectures, and
training schemes, both when training from scratch and when using pre-trained
networks in transfer learning. Notably, our method outperforms comparable
methods while being easier to implement and use, and further improves the
performance of competitive networks on ImageNet by 1%.
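The abstract does not reproduce the paper's exact definitions, but the two ideas it describes, a forgetting score over validation data and a training-free ensemble built from one run's checkpoints, can be illustrated roughly. The sketch below is an assumption-laden approximation, not the authors' method: the function names, the "ever correct before, wrong now" forgetting criterion, and plain probability averaging over checkpoints are all illustrative choices.

```python
import numpy as np

def forgetting_rate(correct_by_epoch):
    """Fraction of validation samples that some earlier checkpoint
    classified correctly but the latest checkpoint gets wrong."""
    correct = np.asarray(correct_by_epoch, dtype=bool)  # (checkpoints, samples)
    ever_correct = correct[:-1].any(axis=0)   # correct at any earlier checkpoint
    return float((ever_correct & ~correct[-1]).mean())

def checkpoint_ensemble_predict(prob_by_epoch):
    """Average class probabilities across checkpoints saved during a
    single training run, then predict the argmax: an ensemble with no
    additional training cost."""
    mean_probs = np.mean(np.asarray(prob_by_epoch), axis=0)  # (samples, classes)
    return mean_probs.argmax(axis=1)

# Toy history: 3 checkpoints, 4 validation samples.
history = [
    [True,  False, True,  False],
    [True,  True,  True,  False],
    [True,  True,  False, False],  # sample 2 was "forgotten"
]
print(forgetting_rate(history))  # 0.25

# Toy probabilities: 2 checkpoints, 2 samples, 2 classes.
probs = [
    [[0.6, 0.4], [0.2, 0.8]],
    [[0.7, 0.3], [0.4, 0.6]],
]
print(checkpoint_ensemble_predict(probs))  # [0 1]
```

A nonzero forgetting rate even while overall accuracy rises is exactly the kind of localized degradation the abstract describes; the checkpoint ensemble then recycles the earlier models that still got those samples right.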
Related papers
- Training Better Deep Learning Models Using Human Saliency
This work explores how human judgement about salient regions of an image can be introduced into deep convolutional neural network (DCNN) training.
We propose a new component of the loss function that ConveYs Brain Oversight to Raise Generalization (CYBORG) and penalizes the model for using non-salient regions.
arXiv Detail & Related papers (2024-10-21T16:52:44Z)
- A Dynamical Model of Neural Scaling Laws
We analyze a random feature model trained with gradient descent as a solvable model of network training and generalization.
Our theory shows how the gap between training and test loss can gradually build up over time due to repeated reuse of data.
arXiv Detail & Related papers (2024-02-02T01:41:38Z)
- More is Better in Modern Machine Learning: when Infinite Overparameterization is Optimal and Overfitting is Obligatory
We show that the test risk of RF regression decreases monotonically with both the number of features and the number of samples.
We then demonstrate that, for a large class of tasks characterized by power-law eigenstructure, training to near-zero training loss is obligatory.
arXiv Detail & Related papers (2023-11-24T18:27:41Z)
- United We Stand: Using Epoch-wise Agreement of Ensembles to Combat Overfit
We introduce a novel ensemble classifier for deep networks that effectively overcomes overfitting.
Our method allows for the incorporation of useful knowledge obtained during the overfitting phase without deterioration of the general performance.
Our method is easy to implement and can be integrated with any training scheme and architecture.
arXiv Detail & Related papers (2023-10-17T08:51:44Z)
- LARA: A Light and Anti-overfitting Retraining Approach for Unsupervised Time Series Anomaly Detection
We propose a Light and Anti-overfitting Retraining Approach (LARA) for time series anomaly detection methods based on deep variational auto-encoders (VAEs).
This work aims to make three novel contributions: 1) formulating the retraining process as a convex problem, so that it converges at a fast rate and prevents overfitting; 2) designing a ruminate block, which leverages the historical data without the need to store it; and 3) proving mathematically that, when fine-tuning the latent vector and reconstructed data, the linear formations achieve the least adjusting errors between the ground truths and the fine-tuned values.
arXiv Detail & Related papers (2023-10-09T12:36:16Z)
- Learn, Unlearn and Relearn: An Online Learning Paradigm for Deep Neural Networks
We introduce Learn, Unlearn, and Relearn (LURE), an online learning paradigm for deep neural networks (DNNs).
LURE interchanges between the unlearning phase, which selectively forgets the undesirable information in the model, and the relearning phase, which emphasizes learning on generalizable features.
We show that our training paradigm provides consistent performance gains across datasets in both classification and few-shot settings.
arXiv Detail & Related papers (2023-03-18T16:45:54Z)
- EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones
This paper presents a new curriculum learning approach for the efficient training of visual backbones (e.g., vision Transformers).
As an off-the-shelf method, it reduces the wall-time training cost of a wide variety of popular models by >1.5x on ImageNet-1K/22K without sacrificing accuracy.
arXiv Detail & Related papers (2022-11-17T17:38:55Z)
- Reasoning-Modulated Representations
We study a common setting where our task is not purely opaque.
Our approach paves the way for a new class of data-efficient representation learning.
arXiv Detail & Related papers (2021-07-19T13:57:13Z)
- Learning from Failure: Training Debiased Classifier from Biased Classifier
We show that neural networks learn to rely on spurious correlation only when it is "easier" to learn than the desired knowledge.
We propose a failure-based debiasing scheme by training a pair of neural networks simultaneously.
Our method significantly improves the training of the network against various types of biases in both synthetic and real-world datasets.
arXiv Detail & Related papers (2020-07-06T07:20:29Z)
- Overfitting in adversarially robust deep learning
We show that overfitting to the training set does in fact harm robust performance to a very large degree in adversarially robust training.
We also show that effects such as the double descent curve do still occur in adversarially trained models, yet fail to explain the observed overfitting.
arXiv Detail & Related papers (2020-02-26T15:40:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.