Rethinking the Hyperparameters for Fine-tuning
- URL: http://arxiv.org/abs/2002.11770v1
- Date: Wed, 19 Feb 2020 18:59:52 GMT
- Title: Rethinking the Hyperparameters for Fine-tuning
- Authors: Hao Li, Pratik Chaudhari, Hao Yang, Michael Lam, Avinash Ravichandran,
Rahul Bhotika, Stefano Soatto
- Abstract summary: Fine-tuning from pre-trained ImageNet models has become the de-facto standard for various computer vision tasks.
Current practices for fine-tuning typically involve selecting an ad-hoc choice of hyperparameters.
This paper re-examines several common practices of setting hyperparameters for fine-tuning.
- Score: 78.15505286781293
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-tuning from pre-trained ImageNet models has become the de-facto standard
for various computer vision tasks. Current practices for fine-tuning typically
involve selecting an ad-hoc choice of hyperparameters and keeping them fixed to
values normally used for training from scratch. This paper re-examines several
common practices of setting hyperparameters for fine-tuning. Our findings are
based on extensive empirical evaluation for fine-tuning on various transfer
learning benchmarks. (1) While prior works have thoroughly investigated
learning rate and batch size, momentum for fine-tuning is a relatively
unexplored parameter. We find that the value of momentum also affects
fine-tuning performance and connect it with previous theoretical findings. (2)
Optimal hyperparameters for fine-tuning, in particular, the effective learning
rate, are not only dataset dependent but also sensitive to the similarity
between the source domain and target domain. This is in contrast to
hyperparameters for training from scratch. (3) Reference-based regularization
that keeps models close to the initial model does not necessarily apply for
"dissimilar" datasets. Our findings challenge common practices of fine-tuning
and encourage deep learning practitioners to rethink the hyperparameters for
fine-tuning.
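To make the abstract's quantities concrete, the sketch below is a minimal illustration written for this summary, not code from the paper: the ResNet-18 backbone, the 10-class head, and the penalty strength are assumptions. It computes the effective learning rate, commonly defined as lr / (1 - momentum), which finding (2) suggests should be tuned rather than the learning rate alone, and applies an L2-SP-style reference penalty that pulls weights toward the pre-trained initialization, the kind of reference-based regularization that finding (3) cautions may not help on dissimilar target domains.

```python
# Minimal fine-tuning sketch (illustrative only, not the paper's released code).
# Assumptions: ResNet-18 backbone, a hypothetical 10-class target task,
# and an arbitrary penalty strength of 1e-3.
import torch
import torchvision

model = torchvision.models.resnet18(weights="IMAGENET1K_V1")   # pre-trained source model
model.fc = torch.nn.Linear(model.fc.in_features, 10)           # new head for the target task
reference = {n: p.detach().clone() for n, p in model.named_parameters()}  # frozen copy of the init

lr, momentum = 0.01, 0.9
effective_lr = lr / (1.0 - momentum)   # effective learning rate of SGD with momentum
optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)

def reference_penalty(model, reference, strength=1e-3):
    """L2-SP-style penalty toward the pre-trained weights (the new head is skipped)."""
    penalty = 0.0
    for name, param in model.named_parameters():
        if not name.startswith("fc."):
            penalty = penalty + (param - reference[name]).pow(2).sum()
    return strength * penalty

# One illustrative step on a random batch; a real run would loop over the target dataset.
x, y = torch.randn(8, 3, 224, 224), torch.randint(0, 10, (8,))
loss = torch.nn.functional.cross_entropy(model(x), y) + reference_penalty(model, reference)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"effective learning rate: {effective_lr:.3f}")
```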
Related papers
- A Unified Gaussian Process for Branching and Nested Hyperparameter
Optimization [19.351804144005744]
In deep learning, tuning parameters with conditional dependence are common in practice.
The new GP model accounts for the dependent structure among input variables through a new kernel function.
High prediction accuracy and better optimization efficiency are observed in a series of synthetic simulations and real data applications of neural networks.
arXiv Detail & Related papers (2024-01-19T21:11:32Z)
- A Framework for History-Aware Hyperparameter Optimisation in
Reinforcement Learning [8.659973888018781]
A Reinforcement Learning (RL) system depends on a set of initial conditions that affect the system's performance.
We propose a framework based on integrating complex event processing and temporal models, to alleviate these trade-offs.
We tested the proposed approach in a 5G mobile communications case study that uses DQN, a variant of RL, for its decision-making.
arXiv Detail & Related papers (2023-03-09T11:30:40Z)
- Toward Theoretical Guidance for Two Common Questions in Practical
Cross-Validation based Hyperparameter Selection [72.76113104079678]
We present the first theoretical treatments of two common questions in cross-validation based hyperparameter selection.
We show that these generalizations can, respectively, always perform at least as well as the baselines of always retraining or never retraining (a minimal sketch of this retraining choice appears after this list).
arXiv Detail & Related papers (2023-01-12T16:37:12Z)
- Pre-training helps Bayesian optimization too [49.28382118032923]
We seek an alternative practice for setting functional priors.
In particular, we consider the scenario where we have data from similar functions that allow us to pre-train a tighter distribution a priori.
Our results show that our method is able to locate good hyperparameters at least 3 times more efficiently than the best competing methods.
arXiv Detail & Related papers (2022-07-07T04:42:54Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model
Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
- Hyperparameter-free Continuous Learning for Domain Classification in
Natural Language Understanding [60.226644697970116]
Domain classification is the fundamental task in natural language understanding (NLU).
Most existing continual learning approaches suffer from low accuracy and performance fluctuation.
We propose a hyperparameter-free continual learning model for text data that can stably produce high performance under various environments.
arXiv Detail & Related papers (2022-01-05T02:46:16Z)
- DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language
Models [152.29364079385635]
As pre-trained models grow bigger, the fine-tuning process can be time-consuming and computationally expensive.
We propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.
Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter efficient fine-tuning and (ii) resource-efficient inference.
arXiv Detail & Related papers (2021-10-30T03:29:47Z)
- Guided Hyperparameter Tuning Through Visualization and Inference [12.035299005299306]
We present a streamlined visualization system enabling deep learning practitioners to more efficiently explore, tune, and optimize hyperparameters.
A key idea is to directly suggest more optimal hyperparameters using a predictive mechanism.
We evaluate the tool with a user study on deep learning model builders, finding that our participants have little issue adopting the tool and working with it as part of their workflow.
arXiv Detail & Related papers (2021-05-24T19:55:24Z)
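As referenced in the cross-validation entry above, the sketch below contrasts the two retraining choices after K-fold hyperparameter selection: reusing the best validation-fold model versus retraining on the full training set. It is a minimal illustration assuming scikit-learn, a toy dataset, and a hypothetical grid of regularization strengths, not the cited paper's procedure.

```python
# Minimal sketch of the retraining choice after cross-validation based
# hyperparameter selection (illustrative; the grid, model, and dataset are
# assumptions, not the cited paper's setup).
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

best = {"cv_score": -np.inf, "C": None, "fold_model": None}
for C in [0.01, 0.1, 1.0, 10.0]:  # hypothetical hyperparameter grid
    fold_scores, fold_models = [], []
    for tr, va in KFold(n_splits=5, shuffle=True, random_state=0).split(X_tr):
        m = LogisticRegression(C=C, max_iter=5000).fit(X_tr[tr], y_tr[tr])
        fold_scores.append(m.score(X_tr[va], y_tr[va]))
        fold_models.append(m)
    if np.mean(fold_scores) > best["cv_score"]:
        best = {"cv_score": np.mean(fold_scores), "C": C,
                "fold_model": fold_models[int(np.argmax(fold_scores))]}

# "Never retrain": reuse the best validation-fold model for the selected C.
never_retrain_acc = best["fold_model"].score(X_te, y_te)
# "Always retrain": refit on the whole training set with the selected C.
retrained = LogisticRegression(C=best["C"], max_iter=5000).fit(X_tr, y_tr)
print("C =", best["C"],
      "| reuse-fold acc:", round(never_retrain_acc, 3),
      "| retrain acc:", round(retrained.score(X_te, y_te), 3))
```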
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.