Benchmarking Neural Network Training Algorithms
- URL: http://arxiv.org/abs/2306.07179v1
- Date: Mon, 12 Jun 2023 15:21:02 GMT
- Title: Benchmarking Neural Network Training Algorithms
- Authors: George E. Dahl, Frank Schneider, Zachary Nado, Naman Agarwal,
Chandramouli Shama Sastry, Philipp Hennig, Sourabh Medapati, Runa
Eschenhagen, Priya Kasimbeg, Daniel Suo, Juhan Bae, Justin Gilmer, Abel L.
Peirson, Bilal Khan, Rohan Anil, Mike Rabbat, Shankar Krishnan, Daniel
Snider, Ehsan Amid, Kongtao Chen, Chris J. Maddison, Rakshith Vasudev, Michal
Badura, Ankush Garg, Peter Mattson
- Abstract summary: Training algorithms are an essential part of every deep learning pipeline.
As a community, we are unable to reliably identify training algorithm improvements.
We introduce a new, competitive, time-to-result benchmark using multiple workloads running on fixed hardware.
- Score: 46.39165332979669
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training algorithms, broadly construed, are an essential part of every deep
learning pipeline. Training algorithm improvements that speed up training
across a wide variety of workloads (e.g., better update rules, tuning
protocols, learning rate schedules, or data selection schemes) could save time,
save computational resources, and lead to better, more accurate, models.
Unfortunately, as a community, we are currently unable to reliably identify
training algorithm improvements, or even determine the state-of-the-art
training algorithm. In this work, using concrete experiments, we argue that
real progress in speeding up training requires new benchmarks that resolve
three basic challenges faced by empirical comparisons of training algorithms:
(1) how to decide when training is complete and precisely measure training
time, (2) how to handle the sensitivity of measurements to exact workload
details, and (3) how to fairly compare algorithms that require hyperparameter
tuning. In order to address these challenges, we introduce a new, competitive,
time-to-result benchmark using multiple workloads running on fixed hardware,
the AlgoPerf: Training Algorithms benchmark. Our benchmark includes a set of
workload variants that make it possible to detect benchmark submissions that
are more robust to workload changes than current widely-used methods. Finally,
we evaluate baseline submissions constructed using various optimizers that
represent current practice, as well as other optimizers that have recently
received attention in the literature. These baseline results collectively
demonstrate the feasibility of our benchmark, show that non-trivial gaps
between methods exist, and set a provisional state-of-the-art for future
benchmark submissions to try and surpass.
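To make the first challenge concrete, a time-to-result benchmark can score a submission by the wall-clock time needed to first reach a fixed validation target on fixed hardware. Below is a minimal sketch of that measurement, not the AlgoPerf API; `train_step`, `evaluate`, and the toy model are hypothetical stand-ins.

```python
import time

def time_to_target(train_step, evaluate, target, max_steps=10_000, eval_every=100):
    """Run training until the validation metric first reaches `target`.

    Returns elapsed wall-clock seconds, or None if the target is never
    reached within `max_steps` (an "infinite" time-to-result).
    """
    start = time.monotonic()
    for step in range(1, max_steps + 1):
        train_step()
        if step % eval_every == 0 and evaluate() >= target:
            return time.monotonic() - start
    return None

# Toy stand-ins so the sketch runs end to end: the "model" is a single
# scalar whose validation metric improves a little with every step.
state = {"acc": 0.0}
elapsed = time_to_target(
    train_step=lambda: state.update(acc=state["acc"] + 0.001),
    evaluate=lambda: state["acc"],
    target=0.75,
)
print(f"time to target: {elapsed:.3f}s" if elapsed is not None else "target not reached")
```

Checking the target only every `eval_every` steps mirrors the practical constraint that evaluation itself costs time and should not dominate the measurement.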
Related papers
- No Train No Gain: Revisiting Efficient Training Algorithms For
Transformer-based Language Models [31.080446886440757]
In this work, we revisit three categories of such algorithms: dynamic architectures (layer stacking, layer dropping), batch selection (selective backprop, RHO loss), and efficient optimizers (Lion, Sophia).
We find that their training, validation, and downstream gains vanish compared to a baseline with a fully-decayed learning rate.
We define an evaluation protocol that enables computation to be done on arbitrary machines by mapping all computation time to a reference machine, which we call reference system time.
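A minimal sketch of how such a mapping might work, assuming per-operation costs pre-measured on the reference machine; the operation names and costs below are illustrative, not the paper's actual calibration.

```python
# Hypothetical per-operation costs on the reference machine
# (seconds per operation); names and numbers are made up.
REFERENCE_COST = {"train_step": 0.025, "eval_pass": 0.010}

def reference_system_time(op_counts):
    """Charge a run by what its counted operations would cost on the
    reference machine, regardless of the hardware it actually ran on."""
    return sum(REFERENCE_COST[op] * n for op, n in op_counts.items())

# A run of 10,000 training steps and 100 evaluations is charged the
# same reference system time no matter which machine executed it.
print(reference_system_time({"train_step": 10_000, "eval_pass": 100}))  # 251.0
```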
arXiv Detail & Related papers (2023-07-12T20:10:14Z)
- An Improved Reinforcement Learning Algorithm for Learning to Branch [12.27934038849211]
Branch-and-bound (B&B) is a general and widely used method for optimization.
In this paper, we propose a novel reinforcement learning-based B&B algorithm.
We evaluate the performance of the proposed algorithm over three public research benchmarks.
arXiv Detail & Related papers (2022-01-17T04:50:11Z)
- Multi-Task Meta-Learning Modification with Stochastic Approximation [0.7734726150561089]
A few-shot learning problem is one of the main benchmarks of meta-learning algorithms.
In this paper we investigate the modification of standard meta-learning pipeline that takes a multi-task approach during training.
The proposed method simultaneously utilizes information from several meta-training tasks in a common loss function.
Proper optimization of these weights can strongly influence training of the entire model and may improve quality on test-time tasks.
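A minimal sketch of a weighted common loss of this kind, with learnable softmax weights over tasks; the two-task setup and softmax parameterization are illustrative assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np

def common_loss(task_losses, weight_logits):
    """Combine per-task losses with learnable softmax weights."""
    w = np.exp(weight_logits) / np.exp(weight_logits).sum()  # weights sum to 1
    return float(np.dot(w, task_losses))

# Two meta-training tasks; in practice the weight logits would be
# optimized alongside the model parameters.
print(common_loss(np.array([0.9, 0.4]), weight_logits=np.zeros(2)))  # 0.65
```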
arXiv Detail & Related papers (2021-10-25T18:11:49Z)
- Faster Meta Update Strategy for Noise-Robust Deep Learning [62.08964100618873]
We introduce a novel Faster Meta Update Strategy (FaMUS) to replace the most expensive step in the meta gradient with a faster layer-wise approximation.
We show our method is able to save two-thirds of the training time while maintaining comparable, or even better, generalization performance.
arXiv Detail & Related papers (2021-04-30T16:19:07Z)
- SIMPLE: SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation [81.03485688525133]
We propose a novel multi-person pose estimation framework, SIngle-network with Mimicking and Point Learning for Bottom-up Human Pose Estimation (SIMPLE).
Specifically, in the training process, we enable SIMPLE to mimic the pose knowledge from the high-performance top-down pipeline.
Besides, SIMPLE formulates human detection and pose estimation as a unified point learning framework in which the two complement each other within a single network.
arXiv Detail & Related papers (2021-04-06T13:12:51Z)
- Benchmarking Simulation-Based Inference [5.3898004059026325]
Recent advances in probabilistic modelling have led to a large number of simulation-based inference algorithms which do not require numerical evaluation of likelihoods.
We provide a benchmark with inference tasks and suitable performance metrics, with an initial selection of algorithms.
We found that the choice of performance metric is critical, that even state-of-the-art algorithms have substantial room for improvement, and that sequential estimation improves sample efficiency.
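A commonly used metric in this literature is the classifier two-sample test (C2ST): train a classifier to distinguish reference posterior samples from an algorithm's samples, where accuracy near 0.5 means the two sets are indistinguishable. A minimal sketch, with illustrative Gaussian toy data standing in for real posterior samples:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

def c2st(samples_p, samples_q, seed=0):
    """Classifier two-sample test: ~0.5 means the two sample sets are
    indistinguishable, 1.0 means trivially separable."""
    X = np.concatenate([samples_p, samples_q])
    y = np.concatenate([np.zeros(len(samples_p)), np.ones(len(samples_q))])
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=seed)
    return cross_val_score(clf, X, y, cv=5).mean()  # mean accuracy over folds

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=(500, 2))  # "true" posterior samples
approx = rng.normal(0.5, 1.0, size=(500, 2))     # an algorithm's samples
print(f"C2ST accuracy: {c2st(reference, approx):.2f}")  # above 0.5: a detectable gap
```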
arXiv Detail & Related papers (2021-01-12T18:31:22Z)
- Fast Class-wise Updating for Online Hashing [196.14748396106955]
This paper presents a novel supervised online hashing scheme, termed Fast Class-wise Updating for Online Hashing (FCOH).
A class-wise updating method is developed to decompose the binary code learning and alternately renew the hash functions in a class-wise fashion, which alleviates the burden of handling large numbers of training batches.
To further achieve online efficiency, we propose a semi-relaxation optimization, which accelerates the online training by treating different binary constraints independently.
arXiv Detail & Related papers (2020-12-01T07:41:54Z)
- How much progress have we made in neural network training? A New Evaluation Protocol for Benchmarking Optimizers [86.36020260204302]
We propose a new benchmarking protocol to evaluate both end-to-end efficiency and data-addition training efficiency.
A human study is conducted to show that our evaluation protocol matches human tuning behavior better than random search.
We then apply the proposed benchmarking framework to 7 optimizers and various tasks, including computer vision, natural language processing, reinforcement learning, and graph mining.
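A heavily simplified sketch of the shape of a data-addition evaluation: train on the initial data, then continue from the same state once new data arrives; the toy "training" rule below is an illustrative stand-in, not the paper's protocol.

```python
def train(state, batch, lr=0.1):
    """Illustrative 'training' step: nudge the estimate toward the batch mean."""
    return state + lr * (sum(batch) / len(batch) - state)

def data_addition_eval(initial_data, new_data, steps=50):
    """Stage 1: train on the initial data; stage 2: continue training
    from the same state after the new data has been added."""
    state = 0.0
    for _ in range(steps):
        state = train(state, initial_data)
    for _ in range(steps):
        state = train(state, initial_data + new_data)
    return state

print(data_addition_eval(initial_data=[1.0, 2.0], new_data=[4.0]))
```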
arXiv Detail & Related papers (2020-10-19T21:46:39Z)
- Subset Sampling For Progressive Neural Network Learning [106.12874293597754]
Progressive Neural Network Learning is a class of algorithms that incrementally construct the network's topology and optimize its parameters based on the training data.
We propose to speed up this process by exploiting subsets of training data at each incremental training step.
Experimental results in object, scene and face recognition problems demonstrate that the proposed approach speeds up the optimization procedure considerably.
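A minimal sketch of the subset-sampling idea: fit each newly added block on a freshly sampled subset of the training data instead of the full set. The uniform sampling, 25% fraction, and toy block fit are illustrative assumptions.

```python
import random

def progressive_training(data, num_blocks, fraction=0.25, seed=0):
    """Grow the model one block at a time, fitting each new block on a
    freshly sampled subset of the training data instead of the full set."""
    rng = random.Random(seed)
    blocks = []
    for _ in range(num_blocks):
        subset = rng.sample(data, k=max(1, int(fraction * len(data))))
        blocks.append(sum(subset) / len(subset))  # stand-in for fitting a block
    return blocks

print(progressive_training(list(range(100)), num_blocks=3))
```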
arXiv Detail & Related papers (2020-02-17T18:57:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.