Tune without Validation: Searching for Learning Rate and Weight Decay on Training Sets
- URL: http://arxiv.org/abs/2403.05532v1
- Date: Fri, 8 Mar 2024 18:57:00 GMT
- Title: Tune without Validation: Searching for Learning Rate and Weight Decay on Training Sets
- Authors: Lorenzo Brigato and Stavroula Mougiakakou
- Abstract summary: Tune without validation (Twin) is a pipeline for tuning learning rate and weight decay.
We run extensive experiments on 20 image classification datasets and train several families of deep networks.
We demonstrate proper HP selection when training from scratch and fine-tuning, emphasizing small-sample scenarios.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce Tune without Validation (Twin), a pipeline for tuning learning
rate and weight decay without validation sets. We leverage a recent theoretical
framework concerning learning phases in hypothesis space to devise a heuristic
that predicts what hyper-parameter (HP) combinations yield better
generalization. Twin performs a grid search of trials according to an
early-/non-early-stopping scheduler and then segments the region that provides
the best results in terms of training loss. Among these trials, the weight norm
strongly correlates with predicting generalization. To assess the effectiveness
of Twin, we run extensive experiments on 20 image classification datasets and
train several families of deep networks, including convolutional, transformer,
and feed-forward models. We demonstrate proper HP selection when training from
scratch and fine-tuning, emphasizing small-sample scenarios.
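As a concrete illustration of the pipeline, here is a minimal sketch of the selection rule the abstract describes: grid-search learning rate and weight decay, segment the trials with the lowest training loss, and choose among them by weight norm. The `train_trial` callable, the quantile-based segmentation, and the smallest-norm tie-break are assumptions for illustration, not the paper's exact procedure.

```python
import itertools

def twin_select(train_trial, lrs, wds, loss_quantile=0.25):
    """Hypothetical Twin-style HP selection. `train_trial(lr, wd)` is a
    user-supplied training routine (with the early-/non-early-stopping
    scheduler) returning (final_training_loss, weight_norm)."""
    trials = []
    for lr, wd in itertools.product(lrs, wds):
        loss, wnorm = train_trial(lr=lr, wd=wd)
        trials.append({"lr": lr, "wd": wd, "loss": loss, "wnorm": wnorm})
    # Segment the low-training-loss region: keep the best quantile of trials.
    k = max(1, int(len(trials) * loss_quantile))
    best_region = sorted(trials, key=lambda t: t["loss"])[:k]
    # The abstract only says the weight norm correlates with generalization;
    # preferring the smallest norm inside the region is an assumption here.
    return min(best_region, key=lambda t: t["wnorm"])
```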
Related papers
- TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
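The summary does not detail the mechanism; purely as a hedged sketch, one plausible reading of a statistics-based "two-wing" normalization is a layer that averages a wing with frozen pre-trained batch-norm statistics and a freshly trainable wing. This is an assumption for illustration, not the paper's actual layer.

```python
import torch.nn as nn

class TwoWingNorm(nn.Module):
    """Illustrative dual-statistics normalization; an assumption about the
    'two-wing' idea, not the paper's exact design."""
    def __init__(self, pretrained_bn: nn.BatchNorm2d):
        super().__init__()
        self.frozen = pretrained_bn                # wing 1: pre-trained stats
        for p in self.frozen.parameters():
            p.requires_grad_(False)                # keep its affine params fixed
        self.adaptive = nn.BatchNorm2d(pretrained_bn.num_features)  # wing 2

    def forward(self, x):
        self.frozen.eval()  # always normalize with the frozen running stats
        return 0.5 * (self.frozen(x) + self.adaptive(x))
```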
arXiv Detail & Related papers (2023-03-20T14:12:55Z)
- Flipped Classroom: Effective Teaching for Time Series Forecasting [0.0]
Sequence-to-sequence models based on LSTMs and GRUs are among the most popular choices for forecasting time series data.
The two most common training strategies in this context are teacher forcing (TF) and free running (FR).
We propose several new curricula and systematically evaluate their performance in two sets of experiments.
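To make the two strategies concrete, a minimal decoding loop that mixes them with a single curriculum probability is shown below; `decoder` is a hypothetical step function, and the paper's curricula go beyond this fixed-probability mix.

```python
import random
import torch

def decode_with_curriculum(decoder, targets, state, tf_prob):
    """One training pass over a target sequence. tf_prob = 1.0 is pure TF
    (feed the ground truth back in); tf_prob = 0.0 is pure FR (feed the
    model's own prediction back in). `decoder(inp, state)` is a placeholder."""
    outputs, inp = [], targets[0]
    for t in range(1, len(targets)):
        out, state = decoder(inp, state)
        outputs.append(out)
        # Per-step curriculum choice: ground truth vs. own prediction.
        inp = targets[t] if random.random() < tf_prob else out
    return torch.stack(outputs), state
```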
arXiv Detail & Related papers (2022-10-17T11:53:25Z)
- Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models [107.05966685291067]
We propose test-time prompt tuning (TPT) to learn adaptive prompts on the fly with a single test sample.
TPT improves the zero-shot top-1 accuracy of CLIP by 3.6% on average.
In evaluating cross-dataset generalization with unseen categories, TPT performs on par with the state-of-the-art approaches that use additional training data.
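A rough sketch of such a test-time loop, with assumed interfaces: only the prompt vector is optimized on a single test sample, by minimizing the entropy of the prediction averaged over augmented views. `model(view, prompt)` returning class logits is a placeholder, not the paper's implementation.

```python
import torch

def tune_prompt_at_test_time(model, prompt, views, steps=1, lr=5e-3):
    """Illustrative TPT-style adaptation on one test sample; interfaces and
    the exact objective are assumptions, not the paper's implementation."""
    prompt = prompt.clone().requires_grad_(True)
    opt = torch.optim.AdamW([prompt], lr=lr)
    for _ in range(steps):
        # Average the predicted distribution over augmented views, then
        # minimize its entropy with respect to the prompt alone.
        probs = torch.stack([model(v, prompt).softmax(-1) for v in views]).mean(0)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
        opt.zero_grad()
        entropy.backward()
        opt.step()
    return prompt.detach()
```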
arXiv Detail & Related papers (2022-09-15T17:55:11Z)
- Intersection of Parallels as an Early Stopping Criterion [64.8387564654474]
We propose a method to spot an early stopping point in the training iterations without the need for a validation set.
For a wide range of learning rates, our method, called Cosine-Distance Criterion (CDC), leads to better generalization on average than all the methods that we compare against.
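The summary does not say how CDC is computed, so the following only sketches the general shape of a validation-free stopping rule; both callables are placeholders, not the paper's method.

```python
def train_until_criterion(step_fn, stop_criterion, max_iters=100_000):
    """Generic validation-free early stopping harness. `step_fn(it)` runs one
    optimization step and returns whatever the criterion needs (e.g. weights);
    `stop_criterion(state, it)` returns True at the predicted stopping point."""
    for it in range(max_iters):
        state = step_fn(it)
        if stop_criterion(state, it):   # e.g. a cosine-distance-based signal
            return it                   # stop without touching a validation set
    return max_iters
```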
arXiv Detail & Related papers (2022-08-19T19:42:41Z)
- Sample-Efficient Optimisation with Probabilistic Transformer Surrogates [66.98962321504085]
This paper investigates the feasibility of employing state-of-the-art probabilistic transformers in Bayesian optimisation.
We identify two drawbacks stemming from their training procedure and loss definition that hinder their direct deployment as proxies in black-box optimisation.
We introduce two components: 1) a BO-tailored training prior supporting non-uniformly distributed points, and 2) a novel approximate posterior regulariser trading-off accuracy and input sensitivity to filter favourable stationary points for improved predictive performance.
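For context, one acquisition step of the Bayesian-optimisation loop the paper targets might look as follows, with a generic probabilistic surrogate standing in for the transformer; the `fit`/`predict` interface and the UCB acquisition are assumptions, not the paper's components.

```python
import numpy as np

def bo_step(surrogate, X_obs, y_obs, candidates, beta=2.0):
    """One Bayesian-optimisation iteration with an assumed surrogate interface:
    refit on the observations, then maximize a UCB acquisition over candidates."""
    surrogate.fit(X_obs, y_obs)
    mean, std = surrogate.predict(candidates)   # predictive mean / uncertainty
    ucb = mean + beta * std                     # optimism drives exploration
    return candidates[int(np.argmax(ucb))]      # next point to evaluate
```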
arXiv Detail & Related papers (2022-05-27T11:13:17Z)
- Delving into Sample Loss Curve to Embrace Noisy and Imbalanced Data [17.7825114228313]
Corrupted labels and class imbalance are commonly encountered in practically collected training data.
Existing approaches alleviate these issues by adopting a sample re-weighting strategy.
However, samples with corrupted labels and samples from under-represented (tail) classes commonly co-exist in training data.
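In its simplest form, the re-weighting strategy mentioned above amounts to scaling per-sample losses; a minimal sketch with a placeholder weighting policy (`weight_fn`), not the paper's specific scheme:

```python
import torch.nn.functional as F

def reweighted_loss(logits, targets, weight_fn):
    """Generic sample re-weighting: scale each per-sample loss by a weight the
    policy derives, e.g. from the sample's loss value or its class frequency.
    `weight_fn` is a placeholder, not the paper's scheme."""
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    weights = weight_fn(per_sample.detach(), targets)  # no grad through weights
    return (weights * per_sample).mean()
```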
arXiv Detail & Related papers (2021-12-30T09:20:07Z)
- Deep Ensembles for Low-Data Transfer Learning [21.578470914935938]
We study different ways of creating ensembles from pre-trained models.
We show that the nature of pre-training itself is an effective source of ensemble diversity.
We propose a practical algorithm that efficiently identifies a subset of pre-trained models for any downstream dataset.
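As an illustration of subset selection over pre-trained models (the paper's actual algorithm is not reproduced here), a simple greedy variant scored on held-out downstream data:

```python
import torch

def greedy_select(models, x_val, y_val, k=3):
    """Greedily grow an ensemble of pre-trained models, at each step adding the
    model that most improves averaged-probability accuracy on held-out data."""
    with torch.no_grad():
        probs = [m(x_val).softmax(-1) for m in models]

    def acc(idxs):
        avg = torch.stack([probs[i] for i in idxs]).mean(0)
        return (avg.argmax(-1) == y_val).float().mean().item()

    chosen, remaining = [], list(range(len(models)))
    for _ in range(min(k, len(models))):
        best = max(remaining, key=lambda i: acc(chosen + [i]))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```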
arXiv Detail & Related papers (2020-10-14T07:59:00Z)
- Rethinking the Hyperparameters for Fine-tuning [78.15505286781293]
Fine-tuning from pre-trained ImageNet models has become the de facto standard for various computer vision tasks.
Current practices for fine-tuning typically involve an ad hoc choice of hyper-parameters.
This paper re-examines several common practices of setting hyper-parameters for fine-tuning.
arXiv Detail & Related papers (2020-02-19T18:59:52Z)
- Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping [62.78338049381917]
Fine-tuning pretrained contextual word embedding models to supervised downstream tasks has become commonplace in natural language processing.
We experiment with four datasets from the GLUE benchmark, fine-tuning BERT hundreds of times on each while varying only the random seeds.
We find substantial performance increases compared to previously reported results, and we quantify how the performance of the best-found model varies as a function of the number of fine-tuning trials.
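The protocol itself is easy to sketch: repeat fine-tuning while varying only the random seed and keep the best run. `finetune_fn` is a placeholder for a full BERT fine-tune, and the seed is assumed to drive both weight initialization and data order.

```python
import torch

def seed_sweep(finetune_fn, seeds):
    """Run one fine-tuning trial per seed and return the best score found,
    mirroring the many-trials protocol described above (placeholder trainer)."""
    scores = {}
    for seed in seeds:
        torch.manual_seed(seed)  # assumed to drive init and data shuffling
        scores[seed] = finetune_fn(seed)
    best = max(scores, key=scores.get)
    return best, scores[best]
```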
arXiv Detail & Related papers (2020-02-15T02:40:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.