How much progress have we made in neural network training? A New
Evaluation Protocol for Benchmarking Optimizers
- URL: http://arxiv.org/abs/2010.09889v1
- Date: Mon, 19 Oct 2020 21:46:39 GMT
- Title: How much progress have we made in neural network training? A New
Evaluation Protocol for Benchmarking Optimizers
- Authors: Yuanhao Xiong, Xuanqing Liu, Li-Cheng Lan, Yang You, Si Si, Cho-Jui
Hsieh
- Abstract summary: We propose a new benchmarking protocol to evaluate both end-to-end efficiency and data-addition training efficiency.
A human study is conducted to show that our evaluation protocol matches human tuning behavior better than random search.
We then apply the proposed benchmarking framework to 7 optimizers and various tasks, including computer vision, natural language processing, reinforcement learning, and graph mining.
- Score: 86.36020260204302
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many optimizers have been proposed for training deep neural networks, and
they often have multiple hyperparameters, which make it tricky to benchmark
their performance. In this work, we propose a new benchmarking protocol to
evaluate both end-to-end efficiency (training a model from scratch without
knowing the best hyperparameter) and data-addition training efficiency (the
previously selected hyperparameters are used for periodically re-training the
model with newly collected data). For end-to-end efficiency, unlike previous
work that assumes random hyperparameter tuning, which over-emphasizes the
tuning time, we propose to evaluate with a bandit hyperparameter tuning
strategy. A human study is conducted to show that our evaluation protocol
matches human tuning behavior better than random search. For data-addition
training, we propose a new protocol for assessing the hyperparameter
sensitivity to data shift. We then apply the proposed benchmarking framework to
7 optimizers and various tasks, including computer vision, natural language
processing, reinforcement learning, and graph mining. Our results show that
there is no clear winner across all the tasks.
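The abstract names a bandit hyperparameter tuning strategy but not a specific algorithm; the sketch below assumes a successive-halving loop of the kind used by Hyperband-style tuners (one common bandit instantiation), plus a simple data-addition loop. `sample_config`, `evaluate`, and `train_on` are hypothetical placeholders for the benchmark's actual training and validation routines, not the authors' API.

```python
import math
import random

def successive_halving(sample_config, evaluate, n_configs=27, min_epochs=1, eta=3):
    """Bandit-style tuning: start many configurations on a small budget,
    keep the best 1/eta at each rung, and multiply the survivors' budget."""
    configs = [sample_config() for _ in range(n_configs)]
    epochs = min_epochs
    while len(configs) > 1:
        ranked = sorted(configs, key=lambda c: evaluate(c, epochs), reverse=True)
        configs = ranked[: max(1, len(configs) // eta)]
        epochs *= eta
    return configs[0]

def data_addition_run(train_on, best_config, data_chunks):
    """Data-addition protocol: retrain with the previously selected
    hyperparameters each time a new chunk of data arrives."""
    seen, results = [], []
    for chunk in data_chunks:
        seen.extend(chunk)
        results.append(train_on(seen, best_config))
    return results

if __name__ == "__main__":
    random.seed(0)
    sample = lambda: {"lr": 10 ** random.uniform(-5, 0)}
    # Toy score: best near lr = 1e-2, with a small budget bonus.
    evaluate = lambda cfg, epochs: -abs(math.log10(cfg["lr"]) + 2) + 0.01 * epochs
    print("selected:", successive_halving(sample, evaluate))
```

Under this end-to-end protocol, every epoch consumed by every surviving configuration is charged to the optimizer being benchmarked, so optimizers whose good hyperparameters surface at low budgets score well, not only those with the best tuned peak; the data-addition run then reuses the selected configuration unchanged, exposing its sensitivity to data shift.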
Related papers
- Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis [51.14136878142034]
Point cloud analysis has achieved outstanding performance by transferring pre-trained point cloud models.
Existing methods for model adaptation usually update all model parameters, which is inefficient due to its high computational cost.
In this paper, we aim to study parameter-efficient transfer learning for point cloud analysis with an ideal trade-off between task performance and parameter efficiency.
arXiv Detail & Related papers (2024-03-03T08:25:04Z)
- Target Variable Engineering [0.0]
We compare the predictive performance of regression models trained to predict numeric targets vs. classifiers trained to predict their binarized counterparts.
We find that regression requires significantly more computational effort to converge to optimal performance.
arXiv Detail & Related papers (2023-10-13T23:12:21Z)
- Quick-Tune: Quickly Learning Which Pretrained Model to Finetune and How [62.467716468917224]
We propose a methodology that jointly searches for the optimal pretrained model and the hyperparameters for finetuning it.
Our method transfers knowledge about the performance of many pretrained models on a series of datasets.
We empirically demonstrate that our resulting approach can quickly select an accurate pretrained model for a new dataset.
arXiv Detail & Related papers (2023-06-06T16:15:26Z)
- Hyperparameter Optimization through Neural Network Partitioning [11.6941692990626]
We propose a simple and efficient way of optimizing hyperparameters in neural networks.
Our method partitions the training data and a neural network model into $K$ data shards and parameter partitions.
We demonstrate that we can apply this objective to optimize a variety of different hyperparameters in a single training run.
arXiv Detail & Related papers (2023-04-28T11:24:41Z)
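The summary above gives only the outline of the method; the toy below is a speculative sketch of the partitioning idea, not the paper's actual objective. It assumes PyTorch, uses K linear "partitions" each held out on one data shard, and substitutes an input-scaling vector `log_s` for the hyperparameters the paper optimizes (e.g., augmentation parameters).

```python
import torch

torch.manual_seed(0)
K, N, D = 4, 400, 8
X = torch.randn(N, D)
y = (2.0 * X) @ torch.randn(D) + 0.1 * torch.randn(N)
shard = torch.arange(N) % K                    # assign each example to a shard

W = torch.zeros(K, D, requires_grad=True)      # one linear partition per shard
log_s = torch.zeros(D, requires_grad=True)     # hyperparameter: input scaling

for step in range(300):
    preds = (X * log_s.exp()) @ W.T            # (N, K): every partition's prediction
    sq = (preds - y[:, None]) ** 2
    held = shard[:, None] == torch.arange(K)   # partition k never trains on shard k
    train_loss = sq[~held].mean()              # fit each partition on the other shards
    heldout_loss = sq[held].mean()             # proxy for out-of-sample error

    gW, = torch.autograd.grad(train_loss, W, retain_graph=True)
    gs, = torch.autograd.grad(heldout_loss, log_s)
    with torch.no_grad():
        W -= 0.1 * gW                          # parameters follow the training loss
        log_s -= 0.1 * gs                      # hyperparameters follow the held-out loss

print("held-out loss after joint tuning:", heldout_loss.item())
```

The point shared with the paper is that a single training run yields both updates: the hyperparameter gradient comes from shards each partition never trained on, approximating held-out performance without a separate tuning loop.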
- Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z)
- Simple and Effective Gradient-Based Tuning of Sequence-to-Sequence Models [8.370770440898454]
The huge cost of training large language models can make tuning them prohibitively expensive.
We apply gradient-based hyperparameter optimization to sequence-to-sequence tasks for the first time.
We show efficiency and performance gains over strong baselines for both Neural Machine Translation and Natural Language Understanding (NLU) tasks.
arXiv Detail & Related papers (2022-09-10T14:52:41Z)
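The entry above does not specify the estimator; the sketch below shows the generic mechanism behind gradient-based hyperparameter optimization: differentiate the validation loss through one unrolled optimizer step. It assumes PyTorch and stands in a linear model for the sequence-to-sequence network; `log_lr` is the tuned hyperparameter and is a placeholder name.

```python
import torch

torch.manual_seed(0)
Xtr, Xva = torch.randn(200, 10), torch.randn(100, 10)
w_true = torch.randn(10)
ytr, yva = Xtr @ w_true, Xva @ w_true

w = torch.zeros(10, requires_grad=True)
log_lr = torch.tensor(-3.0, requires_grad=True)   # hyperparameter: log learning rate

for step in range(200):
    train_loss = ((Xtr @ w - ytr) ** 2).mean()
    g, = torch.autograd.grad(train_loss, w, create_graph=True)
    w_next = w - log_lr.exp() * g                 # differentiable SGD step

    val_loss = ((Xva @ w_next - yva) ** 2).mean()
    hg, = torch.autograd.grad(val_loss, log_lr)   # hypergradient via the unrolled step

    with torch.no_grad():
        log_lr -= 0.05 * hg                       # tune the learning rate online
    w = w_next.detach().requires_grad_(True)      # commit the weight update

print("tuned learning rate:", log_lr.exp().item())
```

A real sequence-to-sequence model swaps the quadratic loss for token-level cross-entropy; the mechanics of the unrolled step and the hypergradient are unchanged.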
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm in which a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
- Rethinking the Hyperparameters for Fine-tuning [78.15505286781293]
Fine-tuning from pre-trained ImageNet models has become the de-facto standard for various computer vision tasks.
Current practices for fine-tuning typically involve selecting an ad-hoc choice of hyperparameters.
This paper re-examines several common practices of setting hyperparameters for fine-tuning.
arXiv Detail & Related papers (2020-02-19T18:59:52Z)