Accelerating Neural Network Training: An Analysis of the AlgoPerf Competition
- URL: http://arxiv.org/abs/2502.15015v1
- Date: Thu, 20 Feb 2025 20:11:54 GMT
- Title: Accelerating Neural Network Training: An Analysis of the AlgoPerf Competition
- Authors: Priya Kasimbeg, Frank Schneider, Runa Eschenhagen, Juhan Bae, Chandramouli Shama Sastry, Mark Saroufim, Boyuan Feng, Less Wright, Edward Z. Yang, Zachary Nado, Sourabh Medapati, Philipp Hennig, Michael Rabbat, George E. Dahl
- Abstract summary: The goal of the AlgoPerf: Training Algorithms competition is to evaluate practical speed-ups in neural network training achieved solely by improving the underlying training algorithms. This paper presents the inaugural AlgoPerf competition's results, which drew 18 diverse submissions from 10 teams.
- Score: 28.91552159033453
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of the AlgoPerf: Training Algorithms competition is to evaluate practical speed-ups in neural network training achieved solely by improving the underlying training algorithms. In the external tuning ruleset, submissions must provide workload-agnostic hyperparameter search spaces, while in the self-tuning ruleset they must be completely hyperparameter-free. In both rulesets, submissions are compared on time-to-result across multiple deep learning workloads, training on fixed hardware. This paper presents the inaugural AlgoPerf competition's results, which drew 18 diverse submissions from 10 teams. Our investigation reveals several key findings: (1) The winning submission in the external tuning ruleset, using Distributed Shampoo, demonstrates the effectiveness of non-diagonal preconditioning over popular methods like Adam, even when compared on wall-clock runtime. (2) The winning submission in the self-tuning ruleset, based on the Schedule Free AdamW algorithm, demonstrates a new level of effectiveness for completely hyperparameter-free training algorithms. (3) The top-scoring submissions were surprisingly robust to workload changes. We also discuss the engineering challenges encountered in ensuring a fair comparison between different training algorithms. These results highlight both the significant progress so far, and the considerable room for further improvements.
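For readers unfamiliar with non-diagonal preconditioning, the sketch below illustrates the core Shampoo-style update for a single 2-D weight matrix: per-dimension second-moment statistics are accumulated and the gradient is preconditioned on both sides by inverse matrix fourth roots, in contrast to Adam's per-coordinate (diagonal) scaling. This is a minimal single-matrix illustration with assumed names and hyperparameters, not the winning Distributed Shampoo submission itself.

    import numpy as np

    def inverse_fourth_root(mat, eps=1e-6):
        # Inverse matrix fourth root via eigendecomposition; eps guards tiny eigenvalues.
        vals, vecs = np.linalg.eigh(mat + eps * np.eye(mat.shape[0]))
        return vecs @ np.diag(np.clip(vals, eps, None) ** -0.25) @ vecs.T

    class ShampooLikeStep:
        """Minimal Kronecker-factored (non-diagonal) preconditioning for one 2-D weight."""

        def __init__(self, shape, lr=1e-3):
            self.lr = lr
            self.L = np.zeros((shape[0], shape[0]))  # row-side gradient statistics
            self.R = np.zeros((shape[1], shape[1]))  # column-side gradient statistics

        def update(self, weight, grad):
            self.L += grad @ grad.T
            self.R += grad.T @ grad
            preconditioned = inverse_fourth_root(self.L) @ grad @ inverse_fourth_root(self.R)
            return weight - self.lr * preconditioned

Each update here pays for two eigendecompositions; practical implementations typically recompute the preconditioner roots only every few hundred steps, which is part of why being competitive on wall-clock runtime, and not just step count, is non-trivial.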
Related papers
- Simple and Provable Scaling Laws for the Test-Time Compute of Large Language Models [70.07661254213181]
We propose two principled algorithms for the test-time compute of large language models. We prove theoretically that the failure probability of one algorithm decays to zero exponentially as its test-time compute grows.
arXiv Detail & Related papers (2024-11-29T05:29:47Z) - Energy-based learning algorithms for analog computing: a comparative study [2.0937431058291933]
Energy-based learning algorithms have recently gained a surge of interest due to their compatibility with analog hardware.
We compare seven learning algorithms, namely contrastive learning (CL), equilibrium propagation (EP), and coupled learning (CpL), along with their variants.
We find that negative perturbations are better than positive ones, and highlight the centered variant of EP as the best-performing algorithm.
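For reference, the "centered variant" corresponds to the two-sided gradient estimator commonly written in the equilibrium propagation literature (the notation below follows the usual EP convention, not necessarily this paper's):

    \widehat{\nabla_\theta \mathcal{L}} \;\approx\; \frac{1}{2\beta}\left( \frac{\partial E}{\partial \theta}(s_{\beta}) - \frac{\partial E}{\partial \theta}(s_{-\beta}) \right)

where s_{\pm\beta} are the equilibrium states obtained under nudging strength \pm\beta; the one-sided positive and negative perturbation variants instead compare a single nudged equilibrium against the free equilibrium s_0 and divide by \beta.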
arXiv Detail & Related papers (2023-12-22T22:49:58Z) - Benchmarking Neural Network Training Algorithms [46.39165332979669]
Training algorithms are an essential part of every deep learning pipeline.
As a community, we are unable to reliably identify training algorithm improvements.
We introduce a new, competitive, time-to-result benchmark using multiple workloads running on fixed hardware.
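In case the time-to-result protocol is unfamiliar, the schematic Python below shows the basic idea: a submission is credited with the wall-clock time at which a workload's validation target is first reached, or an infinite score otherwise. The function names, the per-step evaluation, and the single-workload scope are simplifying assumptions; the benchmark itself evaluates periodically and aggregates raw scores across workloads.

    import math
    import time

    def time_to_result(train_step, validation_metric, target, budget_seconds):
        # Hypothetical scoring loop: wall-clock time until the validation target is hit.
        start = time.monotonic()
        while time.monotonic() - start < budget_seconds:
            train_step()                         # one step of the submitted training algorithm
            if validation_metric() >= target:    # workload-specific goal (higher is better here)
                return time.monotonic() - start  # raw score for this workload
        return math.inf                          # target not reached within the budget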
arXiv Detail & Related papers (2023-06-12T15:21:02Z) - GTFLAT: Game Theory Based Add-On For Empowering Federated Learning Aggregation Techniques [0.3867363075280543]
GTFLAT, as a game theory-based add-on, addresses an important research question.
How can a federated learning algorithm achieve better performance and training efficiency by setting more effective adaptive weights for averaging in the model aggregation phase?
The results reveal that, on average, using GTFLAT increases the top-1 test accuracy by 1.38%, while needing 21.06% fewer communication rounds to reach that accuracy.
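The aggregation step GTFLAT targets is, schematically, a weighted average of client models, as in the sketch below. The uniform default is a placeholder: GTFLAT's contribution is to replace it with adaptive, game-theory-derived weights, which are not reproduced here.

    import numpy as np

    def aggregate(client_params, weights=None):
        # Weighted model averaging: the step that adaptive aggregation weights plug into.
        stacked = np.stack(client_params)            # shape: (num_clients, num_params)
        if weights is None:
            weights = np.ones(len(client_params))    # plain FedAvg-style uniform weighting
        weights = np.asarray(weights, dtype=float)
        weights = weights / weights.sum()            # normalize to a convex combination
        return (weights[:, None] * stacked).sum(axis=0)  # global model for the next round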
arXiv Detail & Related papers (2022-12-08T06:39:51Z) - A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive Coding Networks [65.34977803841007]
Predictive coding networks are neuroscience-inspired models with roots in both Bayesian statistics and neuroscience.
We show that simply changing the temporal scheduling of the update rule for the synaptic weights leads to an algorithm that is much more efficient and stable than the original one.
arXiv Detail & Related papers (2022-11-16T00:11:04Z) - Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, defined by minimizing the population loss, that are more suitable for active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z) - Adversarial Deep Learning for Online Resource Allocation [12.118811903399951]
We use deep neural networks to learn an online algorithm for a resource allocation and pricing problem from scratch.
Our work is the first to use deep neural networks to design an online algorithm from the perspective of worst-case performance guarantees.
arXiv Detail & Related papers (2021-11-19T15:48:43Z) - Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
At a computational cost similar to that of training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
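The "line" case can be pictured with a short sketch: during training, a point on the segment between two endpoint parameter vectors is sampled and the loss is applied there, so that every convex combination ends up accurate. The helper below illustrates only that sampling step, not the full training procedure.

    import numpy as np

    def sample_point_on_line(w_start, w_end, rng=np.random.default_rng(0)):
        # Networks on a learned line are convex combinations of the two endpoints.
        t = rng.uniform(0.0, 1.0)
        return (1.0 - t) * w_start + t * w_end  # parameters used for this training step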
arXiv Detail & Related papers (2021-02-20T23:26:58Z) - Event-Based Control for Online Training of Neural Networks [4.640828141705773]
We propose two Event-Based control loops to adjust the learning rate of a classical algorithm, E (Exponential)/PD (Proportional Derivative)-Control.
The first Event-Based control loop prevents a sudden drop in the learning rate when the model approaches the optimum.
The second Event-Based control loop decides, based on the learning speed, when to switch to the next data batch.
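As a rough picture of what "event-based" means here, the snippet below adjusts the learning rate only when a monitored condition fires, rather than on a fixed schedule. It is a generic stall-detection rule with made-up thresholds, not the paper's E/PD controller or its two specific event conditions.

    def event_based_lr(loss_history, lr, drop_factor=0.5, patience=3, min_delta=1e-4):
        # Shrink the learning rate only when an "improvement stalled" event fires.
        if len(loss_history) <= patience:
            return lr                          # not enough history to trigger any event
        recent = loss_history[-(patience + 1):]
        stalled = all(prev - cur < min_delta for prev, cur in zip(recent, recent[1:]))
        return lr * drop_factor if stalled else lr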
arXiv Detail & Related papers (2020-03-20T21:29:03Z) - Subset Sampling For Progressive Neural Network Learning [106.12874293597754]
Progressive Neural Network Learning is a class of algorithms that incrementally construct the network's topology and optimize its parameters based on the training data.
We propose to speed up this process by exploiting subsets of training data at each incremental training step.
Experimental results in object, scene and face recognition problems demonstrate that the proposed approach speeds up the optimization procedure considerably.
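The speed-up idea is easy to sketch: each incremental training step optimizes over a sampled subset of the data rather than the full set. The helper below shows only that sampling step; the paper's subset-selection strategies and progressive topology construction are not reproduced.

    import random

    def training_subset(dataset, fraction, seed=0):
        # Sample the portion of training data used at one incremental growth step.
        rng = random.Random(seed)
        k = max(1, int(fraction * len(dataset)))
        return rng.sample(dataset, k)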
arXiv Detail & Related papers (2020-02-17T18:57:33Z)