On Local Overfitting and Forgetting in Deep Neural Networks
- URL: http://arxiv.org/abs/2412.12968v2
- Date: Tue, 07 Jan 2025 14:45:04 GMT
- Title: On Local Overfitting and Forgetting in Deep Neural Networks
- Authors: Uri Stern, Tomer Yaacoby, Daphna Weinshall,
- Abstract summary: We propose a novel score that captures the forgetting rate of deep models on validation data.
We show that local overfitting occurs regardless of the presence of traditional overfitting.
We devise a new ensemble method that aims to recover forgotten knowledge, relying solely on the training history of a single network.
- Score: 6.7864586321550595
- License:
- Abstract: The infrequent occurrence of overfitting in deep neural networks is perplexing: contrary to theoretical expectations, increasing model size often enhances performance in practice. But what if overfitting does occur, though restricted to specific sub-regions of the data space? In this work, we propose a novel score that captures the forgetting rate of deep models on validation data. We posit that this score quantifies local overfitting: a decline in performance confined to certain regions of the data space. We then show empirically that local overfitting occurs regardless of the presence of traditional overfitting. Using the framework of deep over-parametrized linear models, we offer a certain theoretical characterization of forgotten knowledge, and show that it correlates with knowledge forgotten by real deep models. Finally, we devise a new ensemble method that aims to recover forgotten knowledge, relying solely on the training history of a single network. When combined with self-distillation, this method enhances the performance of any trained model without adding inference costs. Extensive empirical evaluations demonstrate the efficacy of our method across multiple datasets, contemporary neural network architectures, and training protocols.
Related papers
- Relearning Forgotten Knowledge: on Forgetting, Overfit and Training-Free
Ensembles of DNNs [9.010643838773477]
We introduce a novel score for quantifying overfit, which monitors the forgetting rate of deep models on validation data.
We show that overfit can occur with and without a decrease in validation accuracy, and may be more common than previously appreciated.
We use our observations to construct a new ensemble method, based solely on the training history of a single network, which provides significant improvement without any additional cost in training time.
arXiv Detail & Related papers (2023-10-17T09:22:22Z) - Learning to Jump: Thinning and Thickening Latent Counts for Generative
Modeling [69.60713300418467]
Learning to jump is a general recipe for generative modeling of various types of data.
We demonstrate when learning to jump is expected to perform comparably to learning to denoise, and when it is expected to perform better.
arXiv Detail & Related papers (2023-05-28T05:38:28Z) - Example Forgetting: A Novel Approach to Explain and Interpret Deep
Neural Networks in Seismic Interpretation [12.653673008542155]
deep neural networks are an attractive component for the common interpretation pipeline.
Deep neural networks are frequently met with distrust due to their property of producing semantically incorrect outputs when exposed to sections the model was not trained on.
We introduce a method that effectively relates semantically malfunctioned predictions to their respectful positions within the neural network representation manifold.
arXiv Detail & Related papers (2023-02-24T19:19:22Z) - Explaining Deep Models through Forgettable Learning Dynamics [12.653673008542155]
We visualize the learning behaviour during training by tracking how often samples are learned and forgotten in subsequent training epochs.
Inspired by this phenomenon, we present a novel segmentation method that actively uses this information to alter the data representation within the model.
arXiv Detail & Related papers (2023-01-10T21:59:20Z) - CHALLENGER: Training with Attribution Maps [63.736435657236505]
We show that utilizing attribution maps for training neural networks can improve regularization of models and thus increase performance.
In particular, we show that our generic domain-independent approach yields state-of-the-art results in vision, natural language processing and on time series tasks.
arXiv Detail & Related papers (2022-05-30T13:34:46Z) - DeepBayes -- an estimator for parameter estimation in stochastic
nonlinear dynamical models [11.917949887615567]
We propose DeepBayes estimators that leverage the power of deep recurrent neural networks in learning an estimator.
The deep recurrent neural network architectures can be trained offline and ensure significant time savings during inference.
We demonstrate the applicability of our proposed method on different example models and perform detailed comparisons with state-of-the-art approaches.
arXiv Detail & Related papers (2022-05-04T18:12:17Z) - With Greater Distance Comes Worse Performance: On the Perspective of
Layer Utilization and Model Generalization [3.6321778403619285]
Generalization of deep neural networks remains one of the main open problems in machine learning.
Early layers generally learn representations relevant to performance on both training data and testing data.
Deeper layers only minimize training risks and fail to generalize well with testing or mislabeled data.
arXiv Detail & Related papers (2022-01-28T05:26:32Z) - Anomaly Detection on Attributed Networks via Contrastive Self-Supervised
Learning [50.24174211654775]
We present a novel contrastive self-supervised learning framework for anomaly detection on attributed networks.
Our framework fully exploits the local information from network data by sampling a novel type of contrastive instance pair.
A graph neural network-based contrastive learning model is proposed to learn informative embedding from high-dimensional attributes and local structure.
arXiv Detail & Related papers (2021-02-27T03:17:20Z) - Local Critic Training for Model-Parallel Learning of Deep Neural
Networks [94.69202357137452]
We propose a novel model-parallel learning method, called local critic training.
We show that the proposed approach successfully decouples the update process of the layer groups for both convolutional neural networks (CNNs) and recurrent neural networks (RNNs)
We also show that trained networks by the proposed method can be used for structural optimization.
arXiv Detail & Related papers (2021-02-03T09:30:45Z) - Automatic Recall Machines: Internal Replay, Continual Learning and the
Brain [104.38824285741248]
Replay in neural networks involves training on sequential data with memorized samples, which counteracts forgetting of previous behavior caused by non-stationarity.
We present a method where these auxiliary samples are generated on the fly, given only the model that is being trained for the assessed objective.
Instead the implicit memory of learned samples within the assessed model itself is exploited.
arXiv Detail & Related papers (2020-06-22T15:07:06Z) - The large learning rate phase of deep learning: the catapult mechanism [50.23041928811575]
We present a class of neural networks with solvable training dynamics.
We find good agreement between our model's predictions and training dynamics in realistic deep learning settings.
We believe our results shed light on characteristics of models trained at different learning rates.
arXiv Detail & Related papers (2020-03-04T17:52:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.