Algorithmic progress in language models
- URL: http://arxiv.org/abs/2403.05812v1
- Date: Sat, 9 Mar 2024 06:26:21 GMT
- Title: Algorithmic progress in language models
- Authors: Anson Ho, Tamay Besiroglu, Ege Erdil, David Owen, Robi Rahman, Zifan
Carl Guo, David Atkinson, Neil Thompson, Jaime Sevilla
- Abstract summary: We investigate the rate at which algorithms for pre-training language models have improved since the advent of deep learning.
We use a dataset of over 200 language model evaluations on Wikitext and Penn Treebank spanning 2012-2023.
We find that the compute required to reach a set performance threshold has halved approximately every 8 months, with a 95% confidence interval of around 5 to 14 months.
- Score: 1.7402659488193557
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate the rate at which algorithms for pre-training language models
have improved since the advent of deep learning. Using a dataset of over 200
language model evaluations on Wikitext and Penn Treebank spanning 2012-2023, we
find that the compute required to reach a set performance threshold has halved
approximately every 8 months, with a 95% confidence interval of around 5 to 14
months, substantially faster than hardware gains per Moore's Law. We estimate
augmented scaling laws, which enable us to quantify algorithmic progress and
determine the relative contributions of scaling models versus innovations in
training algorithms. Despite the rapid pace of algorithmic progress and the
development of new architectures such as the transformer, our analysis reveals
that the increase in compute made an even larger contribution to overall
performance improvements over this time period. Though limited by noisy
benchmark data, our analysis quantifies the rapid progress in language
modeling, shedding light on the relative contributions from compute and
algorithms.
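As a rough illustration of what these halving times imply, the sketch below converts a constant halving time into the cumulative reduction in compute needed to reach a fixed performance threshold, and compares it against a Moore's-law-style doubling every two years. This is a back-of-the-envelope illustration only, not the paper's estimation procedure (which fits augmented scaling laws to benchmark data); the function name is an illustrative choice.

```python
def compute_reduction_factor(months_elapsed: float, halving_time_months: float) -> float:
    """Factor by which the compute needed to reach a fixed performance
    threshold shrinks after `months_elapsed`, assuming a constant halving time."""
    return 2.0 ** (months_elapsed / halving_time_months)

# Central estimate and 95% CI bounds reported in the abstract, over one decade.
for halving_months in (5, 8, 14):
    factor = compute_reduction_factor(10 * 12, halving_months)
    print(f"halving time {halving_months:>2} months -> ~{factor:,.0f}x less compute per decade")

# Hardware-only baseline in the spirit of Moore's law (doubling roughly every 24 months):
print(f"Moore's-law-style baseline -> ~{2 ** (10 * 12 / 24):.0f}x per decade")
```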
Related papers
- Recursive Inference Scaling: A Winning Path to Scalable Inference in Language and Multimodal Systems [21.01887711305712]
We introduce Recursive INference Scaling (RINS) as a complementary, plug-in recipe for scaling inference time.
For a given fixed model architecture and training compute budget, RINS substantially improves language modeling performance.
RINS delivers gains in multimodal systems, including a +2% improvement in 0-shot ImageNet accuracy for SigLIP-B/16.
arXiv Detail & Related papers (2025-02-11T12:11:40Z)
- When, Where and Why to Average Weights? [36.106114687828395]
Averaging checkpoints along the training trajectory is a powerful approach to improve the generalization performance of Machine Learning models.
We show that averaging significantly accelerates training and yields considerable efficiency gains, at the cost of only minimal implementation and memory overhead.
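For context, checkpoint averaging along a training trajectory is typically implemented as a uniform average of saved parameter tensors. The sketch below is a generic illustration under its own naming (average_checkpoints, placeholder checkpoint filenames), not the paper's implementation or its guidance on when and where to average.

```python
import torch

def average_checkpoints(paths):
    """Uniformly average the parameters of several saved checkpoints.

    Assumes each file stores a plain state_dict of tensors with matching keys;
    floating-point tensors are averaged, everything else is copied from the first.
    """
    n = len(paths)
    avg = None
    for path in paths:
        state = torch.load(path, map_location="cpu")
        if avg is None:
            avg = {k: v.clone().float() if torch.is_floating_point(v) else v.clone()
                   for k, v in state.items()}
        else:
            for k, v in state.items():
                if torch.is_floating_point(v):
                    avg[k] += v.float()
    return {k: v / n if torch.is_floating_point(v) else v for k, v in avg.items()}

# Hypothetical usage (checkpoint filenames are placeholders):
# model.load_state_dict(average_checkpoints(["step_10k.pt", "step_20k.pt", "step_30k.pt"]))
```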
arXiv Detail & Related papers (2025-02-10T18:40:48Z)
- From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models [63.188607839223046]
This survey focuses on the benefits of scaling compute during inference.
We explore three areas under a unified mathematical formalism: token-level generation algorithms, meta-generation algorithms, and efficient generation.
arXiv Detail & Related papers (2024-06-24T17:45:59Z)
- Towards Compute-Optimal Transfer Learning [82.88829463290041]
We argue that zero-shot structured pruning of pretrained models allows them to increase compute efficiency with minimal reduction in performance.
Our results show that pruning convolutional filters of pretrained models can lead to more than 20% performance improvement in low computational regimes.
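As an illustration of structured (filter-level) pruning without retraining, the sketch below keeps the convolutional filters with the largest L1 norm and discards the rest. The L1 criterion and the function prune_conv_filters are illustrative choices, not necessarily the criterion used in the paper.

```python
import numpy as np

def prune_conv_filters(weight: np.ndarray, keep_fraction: float) -> np.ndarray:
    """Structured pruning of a conv layer's weights, shaped
    (out_channels, in_channels, kH, kW): keep the filters with the largest
    L1 norm and drop the rest, with no retraining ("zero-shot")."""
    out_channels = weight.shape[0]
    n_keep = max(1, int(round(keep_fraction * out_channels)))
    scores = np.abs(weight).reshape(out_channels, -1).sum(axis=1)  # L1 norm per filter
    keep = np.sort(np.argsort(scores)[-n_keep:])                   # indices of filters to keep
    return weight[keep]

# e.g. keep the strongest 50% of filters in a hypothetical 64-filter layer:
w = np.random.randn(64, 32, 3, 3)
print(prune_conv_filters(w, 0.5).shape)   # (32, 32, 3, 3)
```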
arXiv Detail & Related papers (2023-04-25T21:49:09Z)
- Algorithmic progress in computer vision [0.8547032097715571]
We investigate algorithmic progress in image classification on ImageNet.
We find that algorithmic improvements have been roughly as important as the scaling of compute for progress in computer vision.
Compute-augmenting algorithmic advances are made at a pace more than twice as fast as the rate usually associated with Moore's law.
arXiv Detail & Related papers (2022-12-10T00:18:05Z)
- Revisiting Neural Scaling Laws in Language and Vision [43.57394336742374]
We argue for a more rigorous methodology based on the extrapolation loss, instead of reporting the best-fitting parameters.
We present a recipe for estimating scaling law parameters reliably from learning curves.
We demonstrate that it extrapolates more accurately than previous methods in a wide range of architecture families across several domains.
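A toy version of this kind of recipe: fit a saturating power law to the small-compute portion of a learning curve, then score the fit by its extrapolation error on held-out large-compute points rather than by in-sample goodness of fit. The functional form, synthetic data, and use of scipy.optimize.curve_fit below are illustrative choices, not the paper's estimator.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(x, a, b, c):
    # A simple saturating power law: loss ~ a * x**(-b) + c.
    return a * np.power(x, -b) + c

# Toy "learning curve": loss vs. training compute (arbitrary units).
rng = np.random.default_rng(0)
compute = np.logspace(1, 5, 30)
loss = power_law(compute, 5.0, 0.3, 1.2) + rng.normal(0, 0.01, compute.size)

# Fit on the small-compute points only, then judge the fit by how well it
# extrapolates to the held-out large-compute points (extrapolation loss).
train, held_out = slice(0, 20), slice(20, None)
params, _ = curve_fit(power_law, compute[train], loss[train], p0=(1.0, 0.5, 0.5), maxfev=10000)
pred = power_law(compute[held_out], *params)
print("fitted (a, b, c):", np.round(params, 3))
print("extrapolation RMSE:", np.sqrt(np.mean((pred - loss[held_out]) ** 2)))
```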
arXiv Detail & Related papers (2022-09-13T09:41:51Z)
- Scalable computation of prediction intervals for neural networks via matrix sketching [79.44177623781043]
Existing algorithms for uncertainty estimation require modifying the model architecture and training procedure.
This work proposes a new algorithm that can be applied to a given trained neural network and produces approximate prediction intervals.
arXiv Detail & Related papers (2022-05-06T13:18:31Z)
- Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z)
- Efficient Computation of Expectations under Spanning Tree Distributions [67.71280539312536]
We propose unified algorithms for the important cases of first-order expectations and second-order expectations in edge-factored, non-projective spanning-tree models.
Our algorithms exploit a fundamental connection between gradients and expectations, which allows us to derive efficient algorithms.
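The gradient-expectation connection referred to here is the standard exponential-family identity; written for an edge-factored spanning-tree distribution (in notation chosen here, with $\mu_e$ the marginal probability that edge $e$ appears in a tree):

```latex
p(t) \propto \exp\Big(\sum_{e \in t} \theta_e\Big),
\qquad
\mathbb{E}_{p}\Big[\sum_{e \in t} f_e\Big] = \sum_{e} f_e \, \mu_e,
\qquad
\mu_e = \frac{\partial \log Z}{\partial \theta_e},
```

where $Z = \sum_t \exp\big(\sum_{e \in t} \theta_e\big)$ can be computed with the Matrix-Tree theorem, so first-order expectations follow from differentiating $\log Z$.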
arXiv Detail & Related papers (2020-08-29T14:58:26Z)
- Learning to Stop While Learning to Predict [85.7136203122784]
Many algorithm-inspired deep models are restricted to a fixed depth for all inputs.
Similar to algorithms, the optimal depth of a deep architecture may be different for different input instances.
In this paper, we tackle this varying depth problem using a steerable architecture.
We show that the learned deep model along with the stopping policy improves the performances on a diverse set of tasks.
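A generic sketch of input-dependent depth with a learned stopping score is given below; it is illustrative only (a shared block applied repeatedly, a sigmoid "stop" head thresholded at inference) and does not reproduce the paper's steerable architecture or how its stopping policy is trained.

```python
import torch
import torch.nn as nn

class AdaptiveDepthNet(nn.Module):
    """Illustrative early-exit loop: apply a shared block repeatedly and let a
    small 'stop' head decide, per input, when to halt, so easy inputs exit
    early and hard ones go deeper. Not the paper's specific architecture."""
    def __init__(self, dim: int, max_depth: int = 8):
        super().__init__()
        self.block = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.stop_head = nn.Linear(dim, 1)   # score for halting at this depth
        self.max_depth = max_depth

    def forward(self, x: torch.Tensor, threshold: float = 0.5):
        depth_used = torch.zeros(x.shape[0], dtype=torch.long, device=x.device)
        done = torch.zeros(x.shape[0], dtype=torch.bool, device=x.device)
        for d in range(self.max_depth):
            # Only inputs that have not halted get another application of the block.
            x = torch.where(done.unsqueeze(1), x, self.block(x))
            halt = torch.sigmoid(self.stop_head(x)).squeeze(1) > threshold
            depth_used[~done] = d + 1
            done = done | halt
            if done.all():
                break
        return x, depth_used

# feats, depths = AdaptiveDepthNet(dim=16)(torch.randn(4, 16))
```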
arXiv Detail & Related papers (2020-06-09T07:22:01Z)
- Measuring the Algorithmic Efficiency of Neural Networks [1.1108287264548806]
We show that the number of floating-point operations required to train a classifier to AlexNet-level performance has decreased by a factor of 44x between 2012 and 2019.
This corresponds to algorithmic efficiency doubling every 16 months over a period of 7 years.
We observe that hardware and algorithmic efficiency gains multiply and can be on a similar scale over meaningful horizons, which suggests that a good model of AI progress should integrate measures from both.
arXiv Detail & Related papers (2020-05-08T22:26:37Z)