Measuring the Algorithmic Efficiency of Neural Networks
- URL: http://arxiv.org/abs/2005.04305v1
- Date: Fri, 8 May 2020 22:26:37 GMT
- Title: Measuring the Algorithmic Efficiency of Neural Networks
- Authors: Danny Hernandez, Tom B. Brown
- Abstract summary: We show that the number of floating-point operations required to train a classifier to AlexNet-level performance has decreased by a factor of 44x between 2012 and 2019.
This corresponds to algorithmic efficiency doubling every 16 months over a period of 7 years.
We observe that hardware and algorithmic efficiency gains multiply and can be on a similar scale over meaningful horizons, which suggests that a good model of AI progress should integrate measures from both.
- Score: 1.1108287264548806
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Three factors drive the advance of AI: algorithmic innovation, data, and the
amount of compute available for training. Algorithmic progress has
traditionally been more difficult to quantify than compute and data. In this
work, we argue that algorithmic progress has an aspect that is both
straightforward to measure and interesting: reductions over time in the compute
needed to reach past capabilities. We show that the number of floating-point
operations required to train a classifier to AlexNet-level performance on
ImageNet has decreased by a factor of 44x between 2012 and 2019. This
corresponds to algorithmic efficiency doubling every 16 months over a period of
7 years. By contrast, Moore's Law would only have yielded an 11x cost
improvement. We observe that hardware and algorithmic efficiency gains multiply
and can be on a similar scale over meaningful horizons, which suggests that a
good model of AI progress should integrate measures from both.
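As a quick sanity check on the figures above (a sketch, not taken from the paper itself), the short Python snippet below recomputes the implied doubling time from the reported 44x efficiency gain over the 2012-2019 span, and the Moore's-law baseline under an assumed ~24-month hardware doubling period.

```python
import math

# Headline figures from the abstract (span assumed to be 2012 -> 2019, i.e. 7 years).
efficiency_gain = 44.0   # reduction in training FLOPs to reach AlexNet-level accuracy
span_months = 7 * 12     # 84 months

# gain = 2 ** (span / doubling_time)  =>  doubling_time = span / log2(gain)
doubling_time = span_months / math.log2(efficiency_gain)
print(f"implied doubling time: {doubling_time:.1f} months")   # ~15.4, i.e. roughly 16 months

# Moore's-law baseline: cost assumed to halve every ~24 months over the same span.
moore_gain = 2 ** (span_months / 24)
print(f"Moore's-law improvement over the same span: {moore_gain:.1f}x")  # ~11.3x
```

The recomputed values (about 15.4 months and 11.3x) are consistent with the rounded 16-month and 11x figures quoted in the abstract.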
Related papers
- Approximating Metric Magnitude of Point Sets [4.522729058300309]
Metric magnitude is a measure of the "size" of point clouds with many desirable geometric properties.
It has been adapted to various mathematical contexts and recent work suggests that it can enhance machine learning and optimization algorithms.
In this paper, we study the magnitude problem, and show efficient ways of approximating it. We show that it can be cast as a convex optimization problem, but not as a submodular optimization.
The paper describes two new algorithms: an iterative approximation algorithm that converges fast and is accurate, and a subset selection method that makes the computation even faster (the exact quantity being approximated is sketched in code after this list).
arXiv Detail & Related papers (2024-09-06T17:15:28Z)
- On the Efficiency of Convolutional Neural Networks [0.0]
Confronted with the immense computation that convnets use, deep learning researchers also became interested in efficiency.
We devised block fusion algorithms to implement all the layers of a residual block in a single kernel.
Our ConvFirst model with block-fusion kernels has lower arithmetic complexity and greater computational efficiency.
arXiv Detail & Related papers (2024-04-04T17:39:41Z)
- Algorithmic progress in language models [1.7402659488193557]
We investigate the rate at which algorithms for pre-training language models have improved since the advent of deep learning.
We use a dataset of over 200 language model evaluations on Wikitext and Penn Treebank spanning 2012-2023.
We find that the compute required to reach a set performance threshold has halved approximately every 8 months, with a 95% confidence interval of around 5 to 14 months.
arXiv Detail & Related papers (2024-03-09T06:26:21Z)
- Algorithmic progress in computer vision [0.8547032097715571]
We investigate algorithmic progress in image classification on ImageNet.
We find that algorithmic improvements have been roughly as important as the scaling of compute for progress in computer vision.
Compute-augmenting algorithmic advances are made at a pace more than twice the rate usually associated with Moore's law.
arXiv Detail & Related papers (2022-12-10T00:18:05Z)
- Pushing the Limits of Asynchronous Graph-based Object Detection with Event Cameras [62.70541164894224]
We introduce several architecture choices which allow us to scale the depth and complexity of such models while maintaining low computation.
Our method runs 3.7 times faster than a dense graph neural network, taking only 8.4 ms per forward pass.
arXiv Detail & Related papers (2022-11-22T15:14:20Z)
- Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
- Waypoint Planning Networks [66.72790309889432]
We propose a hybrid algorithm based on LSTMs with a local kernel - a classic algorithm such as A*, and a global kernel using a learned algorithm.
We compare WPN against A*, as well as related works including motion planning networks (MPNet) and value iteration networks (VIN).
It is shown that WPN's search space is considerably smaller than A*'s, while it is still able to generate near-optimal results.
arXiv Detail & Related papers (2021-05-01T18:02:01Z)
- Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z)
- Efficient Computation of Expectations under Spanning Tree Distributions [67.71280539312536]
We propose unified algorithms for the important cases of first-order expectations and second-order expectations in edge-factored, non-projective spanning-tree models.
Our algorithms exploit a fundamental connection between gradients and expectations, which allows us to derive efficient algorithms.
arXiv Detail & Related papers (2020-08-29T14:58:26Z)
- Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
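As a companion to the metric magnitude entry above, here is a minimal NumPy/SciPy sketch of the standard exact definition of the magnitude of a finite Euclidean point cloud: the sum of the entries of the inverse of the similarity matrix Z with Z[i, j] = exp(-d(x_i, x_j)). This direct O(n^3) computation is the quantity the paper's iterative and subset-selection algorithms approximate; it is a sketch of the textbook definition, not the paper's method, and the scale parameter and example sizes below are illustrative assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist

def magnitude(points: np.ndarray, scale: float = 1.0) -> float:
    """Exact magnitude of a finite Euclidean point cloud.

    Builds the similarity matrix Z with Z[i, j] = exp(-scale * d(x_i, x_j))
    and returns the sum of the entries of Z^{-1} (computed via a linear solve).
    Cubic in the number of points, which is why approximation algorithms matter.
    """
    dists = cdist(points, points)                        # pairwise Euclidean distances
    Z = np.exp(-scale * dists)                           # similarity matrix
    weights = np.linalg.solve(Z, np.ones(len(points)))   # Z w = 1  =>  w = Z^{-1} 1
    return float(weights.sum())                          # magnitude = 1^T Z^{-1} 1

if __name__ == "__main__":
    rng = np.random.default_rng(0)                       # illustrative random point cloud
    cloud = rng.normal(size=(100, 3))
    for t in (0.5, 1.0, 5.0):
        print(f"scale {t}: magnitude {magnitude(cloud, scale=t):.2f}")
```

At small scales the magnitude approaches 1 and at large scales it approaches the number of points, which is why it is often read as an "effective number of points".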
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.