How much progress have we made in neural network training? A New
Evaluation Protocol for Benchmarking Optimizers
- URL: http://arxiv.org/abs/2010.09889v1
- Date: Mon, 19 Oct 2020 21:46:39 GMT
- Title: How much progress have we made in neural network training? A New
Evaluation Protocol for Benchmarking Optimizers
- Authors: Yuanhao Xiong, Xuanqing Liu, Li-Cheng Lan, Yang You, Si Si, Cho-Jui
Hsieh
- Abstract summary: We propose a new benchmarking protocol to evaluate both end-to-end efficiency and data-addition training efficiency.
A human study is conducted to show that our evaluation protocol matches human tuning behavior better than random search.
We then apply the proposed benchmarking framework to 7 optimizers and various tasks, including computer vision, natural language processing, reinforcement learning, and graph mining.
- Score: 86.36020260204302
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many optimizers have been proposed for training deep neural networks, and
they often have multiple hyperparameters, which make it tricky to benchmark
their performance. In this work, we propose a new benchmarking protocol to
evaluate both end-to-end efficiency (training a model from scratch without
knowing the best hyperparameter) and data-addition training efficiency (the
previously selected hyperparameters are used for periodically re-training the
model with newly collected data). For end-to-end efficiency, unlike previous
work that assumes random hyperparameter tuning, which over-emphasizes the
tuning time, we propose to evaluate with a bandit hyperparameter tuning
strategy. A human study is conducted to show that our evaluation protocol
matches human tuning behavior better than random search. For data-addition
training, we propose a new protocol for assessing the hyperparameter
sensitivity to data shift. We then apply the proposed benchmarking framework to
7 optimizers and various tasks, including computer vision, natural language
processing, reinforcement learning, and graph mining. Our results show that
there is no clear winner across all the tasks.
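The abstract names a bandit hyperparameter tuning strategy but not a specific algorithm; the sketch below assumes a successive-halving loop of the kind used by Hyperband-style tuners (one common bandit instantiation), plus a simple data-addition loop. `sample_config`, `evaluate`, and `train_on` are hypothetical placeholders for the benchmark's actual training and validation routines, not the authors' API.

```python
import math
import random

def successive_halving(sample_config, evaluate, n_configs=27, min_epochs=1, eta=3):
    """Bandit-style tuning: start many configurations on a small budget,
    keep the best 1/eta at each rung, and multiply the survivors' budget."""
    configs = [sample_config() for _ in range(n_configs)]
    epochs = min_epochs
    while len(configs) > 1:
        ranked = sorted(configs, key=lambda c: evaluate(c, epochs), reverse=True)
        configs = ranked[: max(1, len(configs) // eta)]
        epochs *= eta
    return configs[0]

def data_addition_run(train_on, best_config, data_chunks):
    """Data-addition protocol: retrain with the previously selected
    hyperparameters each time a new chunk of data arrives."""
    seen, results = [], []
    for chunk in data_chunks:
        seen.extend(chunk)
        results.append(train_on(seen, best_config))
    return results

if __name__ == "__main__":
    random.seed(0)
    sample = lambda: {"lr": 10 ** random.uniform(-5, 0)}
    # Toy score: best near lr = 1e-2, with a small budget bonus.
    evaluate = lambda cfg, epochs: -abs(math.log10(cfg["lr"]) + 2) + 0.01 * epochs
    print("selected:", successive_halving(sample, evaluate))
```

Under this end-to-end protocol, every epoch consumed by every surviving configuration is charged to the optimizer being benchmarked, so optimizers whose good hyperparameters surface at low budgets score well, not only those with the best tuned peak; the data-addition run then reuses the selected configuration unchanged, exposing its sensitivity to data shift.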
Related papers
- Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis [51.14136878142034]
Point cloud analysis has achieved outstanding performance by transferring pre-trained point cloud models.
Existing methods for model adaptation usually update all model parameters, which is inefficient due to its high computational cost.
In this paper, we aim to study parameter-efficient transfer learning for point cloud analysis with an ideal trade-off between task performance and parameter efficiency.
arXiv Detail & Related papers (2024-03-03T08:25:04Z)
- Target Variable Engineering [0.0]
We compare the predictive performance of regression models trained to predict numeric targets vs. classifiers trained to predict their binarized counterparts.
We find that regression requires significantly more computational effort to converge to optimal performance.
arXiv Detail & Related papers (2023-10-13T23:12:21Z)
- Quick-Tune: Quickly Learning Which Pretrained Model to Finetune and How [62.467716468917224]
We propose a methodology that jointly searches for the optimal pretrained model and the hyperparameters for finetuning it.
Our method transfers knowledge about the performance of many pretrained models on a series of datasets.
We empirically demonstrate that our resulting approach can quickly select an accurate pretrained model for a new dataset.
arXiv Detail & Related papers (2023-06-06T16:15:26Z)
- Hyperparameter Optimization through Neural Network Partitioning [11.6941692990626]
We propose a simple and efficient way of optimizing hyperparameters in neural networks.
Our method partitions the training data and a neural network model into $K$ data shards and parameter partitions.
We demonstrate that we can apply this objective to optimize a variety of different hyperparameters in a single training run.
arXiv Detail & Related papers (2023-04-28T11:24:41Z)
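The summary above gives only the outline of the method; the toy below is a speculative sketch of the partitioning idea, not the paper's actual objective. It assumes PyTorch, uses K linear "partitions" each held out on one data shard, and substitutes an input-scaling vector `log_s` for the hyperparameters the paper optimizes (e.g., augmentation parameters).

```python
import torch

torch.manual_seed(0)
K, N, D = 4, 400, 8
X = torch.randn(N, D)
y = (2.0 * X) @ torch.randn(D) + 0.1 * torch.randn(N)
shard = torch.arange(N) % K                    # assign each example to a shard

W = torch.zeros(K, D, requires_grad=True)      # one linear partition per shard
log_s = torch.zeros(D, requires_grad=True)     # hyperparameter: input scaling

for step in range(300):
    preds = (X * log_s.exp()) @ W.T            # (N, K): every partition's prediction
    sq = (preds - y[:, None]) ** 2
    held = shard[:, None] == torch.arange(K)   # partition k never trains on shard k
    train_loss = sq[~held].mean()              # fit each partition on the other shards
    heldout_loss = sq[held].mean()             # proxy for out-of-sample error

    gW, = torch.autograd.grad(train_loss, W, retain_graph=True)
    gs, = torch.autograd.grad(heldout_loss, log_s)
    with torch.no_grad():
        W -= 0.1 * gW                          # parameters follow the training loss
        log_s -= 0.1 * gs                      # hyperparameters follow the held-out loss

print("held-out loss after joint tuning:", heldout_loss.item())
```

The point shared with the paper is that a single training run yields both updates: the hyperparameter gradient comes from shards each partition never trained on, approximating held-out performance without a separate tuning loop.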
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
- Simple and Effective Gradient-Based Tuning of Sequence-to-Sequence Models [8.370770440898454]
The huge cost of training large language models can make tuning them prohibitively expensive.
We apply gradient-based hyperparameter optimization to sequence-to-sequence tasks for the first time.
We show efficiency and performance gains over strong baselines for both Neural Machine Translation and Natural Language Understanding (NLU) tasks.
arXiv Detail & Related papers (2022-09-10T14:52:41Z)
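The entry above does not specify the estimator; the sketch below shows the generic mechanism behind gradient-based hyperparameter optimization: differentiate the validation loss through one unrolled optimizer step. It assumes PyTorch and stands in a linear model for the sequence-to-sequence network; `log_lr` is the tuned hyperparameter and is a placeholder name.

```python
import torch

torch.manual_seed(0)
Xtr, Xva = torch.randn(200, 10), torch.randn(100, 10)
w_true = torch.randn(10)
ytr, yva = Xtr @ w_true, Xva @ w_true

w = torch.zeros(10, requires_grad=True)
log_lr = torch.tensor(-3.0, requires_grad=True)   # hyperparameter: log learning rate

for step in range(200):
    train_loss = ((Xtr @ w - ytr) ** 2).mean()
    g, = torch.autograd.grad(train_loss, w, create_graph=True)
    w_next = w - log_lr.exp() * g                 # differentiable SGD step

    val_loss = ((Xva @ w_next - yva) ** 2).mean()
    hg, = torch.autograd.grad(val_loss, log_lr)   # hypergradient via the unrolled step

    with torch.no_grad():
        log_lr -= 0.05 * hg                       # tune the learning rate online
    w = w_next.detach().requires_grad_(True)      # commit the weight update

print("tuned learning rate:", log_lr.exp().item())
```

A real sequence-to-sequence model swaps the quadratic loss for token-level cross-entropy; the mechanics of the unrolled step and the hypergradient are unchanged.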
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm in which a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
- Rethinking the Hyperparameters for Fine-tuning [78.15505286781293]
Fine-tuning from pre-trained ImageNet models has become the de-facto standard for various computer vision tasks.
Current practices for fine-tuning typically involve selecting an ad-hoc choice of hyperparameters.
This paper re-examines several common practices of setting hyperparameters for fine-tuning.
arXiv Detail & Related papers (2020-02-19T18:59:52Z)