Relearning Forgotten Knowledge: on Forgetting, Overfit and Training-Free
Ensembles of DNNs
- URL: http://arxiv.org/abs/2310.11094v2
- Date: Thu, 28 Dec 2023 14:41:37 GMT
- Title: Relearning Forgotten Knowledge: on Forgetting, Overfit and Training-Free
Ensembles of DNNs
- Authors: Uri Stern, Daphna Weinshall
- Abstract summary: We introduce a novel score for quantifying overfit, which monitors the forgetting rate of deep models on validation data.
We show that overfit can occur with and without a decrease in validation accuracy, and may be more common than previously appreciated.
We use our observations to construct a new ensemble method, based solely on the training history of a single network, which provides significant improvement without any additional cost in training time.
- Score: 9.010643838773477
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The infrequent occurrence of overfit in deep neural networks is perplexing.
On the one hand, theory predicts that as models get larger they should
eventually become too specialized for a specific training set, with ensuing
decrease in generalization. In contrast, empirical results in image
classification indicate that increasing the training time of deep models or
using bigger models almost never hurts generalization. Is it because the way we
measure overfit is too limited? Here, we introduce a novel score for
quantifying overfit, which monitors the forgetting rate of deep models on
validation data. Presumably, this score indicates that even while
generalization improves overall, there are certain regions of the data space
where it deteriorates. When thus measured, we show that overfit can occur with
and without a decrease in validation accuracy, and may be more common than
previously appreciated. This observation may help to clarify the aforementioned
confusing picture. We use our observations to construct a new ensemble method,
based solely on the training history of a single network, which provides
significant improvement in performance without any additional cost in training
time. An extensive empirical evaluation with modern deep models shows our
method's utility on multiple datasets, neural network architectures, and
training schemes, both when training from scratch and when using pre-trained
networks in transfer learning. Notably, our method outperforms comparable
methods while being easier to implement and use, and further improves the
performance of competitive networks on ImageNet by 1%.
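The abstract does not reproduce the paper's exact definitions, but the two ideas it describes, a forgetting score over validation data and a training-free ensemble built from one run's checkpoints, can be illustrated roughly. The sketch below is an assumption-laden approximation, not the authors' method: the function names, the "ever correct before, wrong now" forgetting criterion, and plain probability averaging over checkpoints are all illustrative choices.

```python
import numpy as np

def forgetting_rate(correct_by_epoch):
    """Fraction of validation samples that some earlier checkpoint
    classified correctly but the latest checkpoint gets wrong."""
    correct = np.asarray(correct_by_epoch, dtype=bool)  # (checkpoints, samples)
    ever_correct = correct[:-1].any(axis=0)   # correct at any earlier checkpoint
    return float((ever_correct & ~correct[-1]).mean())

def checkpoint_ensemble_predict(prob_by_epoch):
    """Average class probabilities across checkpoints saved during a
    single training run, then predict the argmax: an ensemble with no
    additional training cost."""
    mean_probs = np.mean(np.asarray(prob_by_epoch), axis=0)  # (samples, classes)
    return mean_probs.argmax(axis=1)

# Toy history: 3 checkpoints, 4 validation samples.
history = [
    [True,  False, True,  False],
    [True,  True,  True,  False],
    [True,  True,  False, False],  # sample 2 was "forgotten"
]
print(forgetting_rate(history))  # 0.25

# Toy probabilities: 2 checkpoints, 2 samples, 2 classes.
probs = [
    [[0.6, 0.4], [0.2, 0.8]],
    [[0.7, 0.3], [0.4, 0.6]],
]
print(checkpoint_ensemble_predict(probs))  # [0 1]
```

A nonzero forgetting rate even while overall accuracy rises is exactly the kind of localized degradation the abstract describes; the checkpoint ensemble then recycles the earlier models that still got those samples right.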
Related papers
- Training Better Deep Learning Models Using Human Saliency
This work explores how human judgement about salient regions of an image can be introduced into deep convolutional neural network (DCNN) training.
We propose a new component of the loss function that ConveYs Brain Oversight to Raise Generalization (CYBORG) and penalizes the model for using non-salient regions.
arXiv Detail & Related papers (2024-10-21T16:52:44Z)
- A Dynamical Model of Neural Scaling Laws
We analyze a random feature model trained with gradient descent as a solvable model of network training and generalization.
Our theory shows how the gap between training and test loss can gradually build up over time due to repeated reuse of data.
arXiv Detail & Related papers (2024-02-02T01:41:38Z)
- More is Better in Modern Machine Learning: when Infinite Overparameterization is Optimal and Overfitting is Obligatory
We show that the test risk of RF regression decreases monotonically with both the number of features and the number of samples.
We then demonstrate that, for a large class of tasks characterized by power-law eigenstructure, training to near-zero training loss is obligatory.
arXiv Detail & Related papers (2023-11-24T18:27:41Z)
- United We Stand: Using Epoch-wise Agreement of Ensembles to Combat Overfit
We introduce a novel ensemble classifier for deep networks that effectively overcomes overfitting.
Our method allows for the incorporation of useful knowledge obtained during the overfitting phase without deterioration of the general performance.
Our method is easy to implement and can be integrated with any training scheme and architecture.
arXiv Detail & Related papers (2023-10-17T08:51:44Z)
- LARA: A Light and Anti-overfitting Retraining Approach for Unsupervised Time Series Anomaly Detection
We propose a Light and Anti-overfitting Retraining Approach (LARA) for time series anomaly detection methods based on deep variational auto-encoders (VAEs).
This work aims to make three novel contributions: 1) formulating the retraining process as a convex problem, so that it converges at a fast rate and prevents overfitting; 2) designing a ruminate block, which leverages the historical data without the need to store it; and 3) proving mathematically that, when fine-tuning the latent vector and reconstructed data, the linear formations achieve the least adjusting errors between the ground truths and the fine-tuned values.
arXiv Detail & Related papers (2023-10-09T12:36:16Z)
- Learn, Unlearn and Relearn: An Online Learning Paradigm for Deep Neural Networks
We introduce Learn, Unlearn, and Relearn (LURE), an online learning paradigm for deep neural networks (DNNs).
LURE interchanges between the unlearning phase, which selectively forgets the undesirable information in the model, and the relearning phase, which emphasizes learning on generalizable features.
We show that our training paradigm provides consistent performance gains across datasets in both classification and few-shot settings.
arXiv Detail & Related papers (2023-03-18T16:45:54Z)
- EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones
This paper presents a new curriculum learning approach for the efficient training of visual backbones (e.g., vision Transformers).
As an off-the-shelf method, it reduces the wall-time training cost of a wide variety of popular models by >1.5x on ImageNet-1K/22K without sacrificing accuracy.
arXiv Detail & Related papers (2022-11-17T17:38:55Z)
- Reasoning-Modulated Representations
We study a common setting where our task is not purely opaque.
Our approach paves the way for a new class of data-efficient representation learning.
arXiv Detail & Related papers (2021-07-19T13:57:13Z)
- Learning from Failure: Training Debiased Classifier from Biased Classifier
We show that neural networks learn to rely on spurious correlation only when it is "easier" to learn than the desired knowledge.
We propose a failure-based debiasing scheme by training a pair of neural networks simultaneously.
Our method significantly improves the training of the network against various types of biases in both synthetic and real-world datasets.
arXiv Detail & Related papers (2020-07-06T07:20:29Z)
- Overfitting in adversarially robust deep learning
We show that overfitting to the training set does in fact harm robust performance to a very large degree in adversarially robust training.
We also show that effects such as the double descent curve do still occur in adversarially trained models, yet fail to explain the observed overfitting.
arXiv Detail & Related papers (2020-02-26T15:40:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.