OUI Need to Talk About Weight Decay: A New Perspective on Overfitting Detection
- URL: http://arxiv.org/abs/2504.17160v1
- Date: Thu, 24 Apr 2025 00:41:59 GMT
- Title: OUI Need to Talk About Weight Decay: A New Perspective on Overfitting Detection
- Authors: Alberto Fernández-Hernández, Jose I. Mestre, Manuel F. Dolz, Jose Duato, Enrique S. Quintana-Ortí
- Abstract summary: The Overfitting-Underfitting Indicator (OUI) is a novel tool for monitoring the training dynamics of Deep Neural Networks (DNNs). OUI indicates whether a model is overfitting or underfitting during training without requiring validation data.
- Score: 0.5242869847419834
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We introduce the Overfitting-Underfitting Indicator (OUI), a novel tool for monitoring the training dynamics of Deep Neural Networks (DNNs) and identifying optimal regularization hyperparameters. Specifically, we validate that OUI can effectively guide the selection of the Weight Decay (WD) hyperparameter by indicating whether a model is overfitting or underfitting during training, without requiring validation data. Through experiments on DenseNet-BC-100 with CIFAR-100, EfficientNet-B0 with TinyImageNet, and ResNet-34 with ImageNet-1K, we show that maintaining OUI within a prescribed interval correlates strongly with improved generalization and validation scores. Notably, OUI converges significantly faster than traditional metrics such as loss or accuracy, enabling practitioners to identify optimal WD values in the early stages of training. By leveraging OUI as a reliable indicator, we can determine early in training whether the chosen WD value leads the model to underfit the training data, overfit, or strike a well-balanced trade-off that maximizes validation scores. This enables more precise WD tuning for optimal performance on the tested datasets and DNNs. All code for reproducing these experiments is available at https://github.com/AlbertoFdezHdez/OUI.
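The abstract does not give OUI's formula, so the following is only a minimal sketch of the workflow it describes: briefly train with several candidate WD values, read an indicator computed from training data alone, and keep the candidates whose indicator settles inside a prescribed interval. Here `compute_oui`, the interval bounds `(LOW, HIGH)`, and the optimizer settings are all placeholders, not the paper's actual choices.

```python
# Minimal sketch, assuming a placeholder indicator: the abstract describes
# reading OUI early in training to accept or reject weight-decay candidates,
# but does not give OUI's formula, so compute_oui below is hypothetical and
# the interval (LOW, HIGH), optimizer, and schedule are illustrative only.
import torch

LOW, HIGH = 0.2, 0.8  # illustrative "well-balanced" interval, not the paper's


def compute_oui(model: torch.nn.Module, inputs: torch.Tensor) -> float:
    """Hypothetical stand-in for the paper's indicator; see the reference
    implementation at https://github.com/AlbertoFdezHdez/OUI."""
    raise NotImplementedError


def probe_weight_decay(make_model, train_loader, wd_candidates,
                       probe_epochs: int = 2):
    """Briefly train with each WD candidate and keep those whose indicator
    lands in the target interval -- no validation data involved."""
    kept = []
    for wd in wd_candidates:
        model = make_model()
        opt = torch.optim.SGD(model.parameters(), lr=0.1,
                              momentum=0.9, weight_decay=wd)
        loss_fn = torch.nn.CrossEntropyLoss()
        for _ in range(probe_epochs):
            for x, y in train_loader:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()
        oui = compute_oui(model, x)  # computed from training batches only
        if LOW <= oui <= HIGH:
            kept.append(wd)  # balanced: worth training to completion
        else:
            # Out-of-interval values signal over- or underfitting, so the
            # candidate can be discarded after only a few epochs.
            print(f"WD={wd:g}: OUI={oui:.2f} outside [{LOW}, {HIGH}]")
    return kept
```

The point of the pattern is the early exit: because the abstract reports that OUI converges much faster than loss or accuracy, a few probe epochs per candidate can stand in for a full validation sweep.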
Related papers
- Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find the solutions reachable by our training procedure, including the gradient-based optimizer and regularizers, which limits flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z)
- Neural Priming for Sample-Efficient Adaptation [92.14357804106787]
We propose Neural Priming, a technique for adapting large pretrained models to distribution shifts and downstream tasks.
Neural Priming can be performed at test time, even for pretraining datasets as large as LAION-2B.
arXiv Detail & Related papers (2023-06-16T21:53:16Z)
- Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection [58.789823426981044]
We propose a novel auxiliary loss formulation that aims to align the class confidence of bounding boxes with the accuracy of their predictions.
Our results reveal that our train-time loss surpasses strong calibration baselines in reducing calibration error in both in-domain and out-of-domain scenarios.
arXiv Detail & Related papers (2023-03-25T08:56:21Z)
- Self-Supervised Pre-Training for Transformer-Based Person Re-Identification [54.55281692768765]
Transformer-based supervised pre-training achieves great performance in person re-identification (ReID).
Due to the domain gap between ImageNet and ReID datasets, it usually needs a larger pre-training dataset to boost the performance.
This work aims to mitigate the gap between the pre-training and ReID datasets from the perspective of data and model structure.
arXiv Detail & Related papers (2021-11-23T18:59:08Z)
- IADA: Iterative Adversarial Data Augmentation Using Formal Verification and Expert Guidance [1.599072005190786]
We propose an iterative adversarial data augmentation framework for training neural network models.
The proposed framework is applied to an artificial 2D dataset, the MNIST dataset, and a human motion dataset.
We show that our training method can improve the robustness and accuracy of the learned model.
arXiv Detail & Related papers (2021-08-16T03:05:53Z)
- Densely Deformable Efficient Salient Object Detection Network [24.469522151877847]
In this paper, inspired by the strong background/foreground separation abilities of deformable convolutions, we employ them in our Densely Deformable Network (DDNet).
The salient regions from densely deformable convolutions are further refined using transposed convolutions to optimally generate the saliency maps.
Results indicate that the current models have limited generalization potentials, demanding further research in this direction.
arXiv Detail & Related papers (2021-02-12T09:17:38Z)
- CPT: Efficient Deep Neural Network Training via Cyclic Precision [24.218905131408288]
We conjecture that DNNs' precision might have a similar effect as the learning rate during DNN training.
We propose Cyclic Precision Training (CPT) to vary the precision between two boundary values (a schedule sketch follows this entry).
arXiv Detail & Related papers (2021-01-25T02:56:18Z)
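As a rough illustration of the CPT entry above, here is a cyclic bit-width schedule in the spirit of cyclic learning rates. The triangular shape, bounds, and cycle length are assumptions for the sketch, not the paper's actual schedule.

```python
# Minimal sketch, assuming a triangular cycle: CPT varies training precision
# between two boundary bit-widths, analogous to cyclic learning rates. The
# cycle shape, bounds, and length here are assumptions, not the paper's.
def cyclic_precision(step: int, min_bits: int = 3, max_bits: int = 8,
                     cycle_len: int = 1000) -> int:
    """Return the quantization bit-width to use at a given training step."""
    pos = (step % cycle_len) / cycle_len   # position within the cycle, [0, 1)
    tri = 1.0 - abs(2.0 * pos - 1.0)       # triangular wave: 0 -> 1 -> 0
    return round(min_bits + (max_bits - min_bits) * tri)
```

At step `t`, weights and activations would be quantized to `cyclic_precision(t)` bits before the forward and backward passes.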
- Increasing Trustworthiness of Deep Neural Networks via Accuracy Monitoring [20.456742449675904]
Inference accuracy of deep neural networks (DNNs) is a crucial performance metric, but can vary greatly in practice subject to actual test datasets.
This has raised significant concerns about the trustworthiness of DNNs, especially in safety-critical applications.
We propose a neural network-based accuracy monitor model, which only takes the deployed DNN's softmax probability output as its input.
arXiv Detail & Related papers (2020-07-03T03:09:36Z)
- Bayesian Graph Neural Networks with Adaptive Connection Sampling [62.51689735630133]
We propose a unified framework for adaptive connection sampling in graph neural networks (GNNs).
The proposed framework not only alleviates over-smoothing and over-fitting tendencies of deep GNNs, but also enables learning with uncertainty in graph analytic tasks with GNNs.
arXiv Detail & Related papers (2020-06-07T07:06:35Z)
- Calibrating Deep Neural Networks using Focal Loss [77.92765139898906]
Miscalibration is a mismatch between a model's confidence and its correctness.
We show that focal loss allows us to learn models that are already very well calibrated.
We show that our approach achieves state-of-the-art calibration without compromising on accuracy in almost all cases (a minimal focal-loss sketch follows this list).
arXiv Detail & Related papers (2020-02-21T17:35:50Z)
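For reference, the focal loss that the calibration entry above builds on is standard (Lin et al., 2017) and small enough to sketch. The `gamma` default below is the conventional choice, assumed here rather than taken from that paper.

```python
# Minimal focal-loss sketch (Lin et al., 2017): the (1 - p_t)^gamma factor
# down-weights already well-classified examples relative to cross-entropy.
# gamma=2.0 is the conventional default, an assumption in this sketch.
import torch
import torch.nn.functional as F


def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               gamma: float = 2.0) -> torch.Tensor:
    """logits: (N, C) raw scores; targets: (N,) integer class labels."""
    log_p = F.log_softmax(logits, dim=-1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)  # log p_t
    pt = log_pt.exp()                                          # p_t
    return (-((1.0 - pt) ** gamma) * log_pt).mean()
```

With `gamma = 0` this reduces to ordinary cross-entropy, which is why the focal term is often described as a confidence-dependent reweighting.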