Accelerating AI Performance using Anderson Extrapolation on GPUs
- URL: http://arxiv.org/abs/2410.19460v2
- Date: Thu, 19 Dec 2024 04:49:52 GMT
- Title: Accelerating AI Performance using Anderson Extrapolation on GPUs
- Authors: Saleem Abdul Fattah Ahmed Al Dajani, David E. Keyes
- Abstract summary: We present a novel approach for accelerating AI performance by leveraging Anderson extrapolation.
By identifying the crossover point where a mixing penalty is incurred, the method focuses on reducing iterations to convergence.
We demonstrate significant improvements in both training and inference, motivated by scalability and efficiency extensions to the realm of high-performance computing.
- Score: 2.114333871769023
- Abstract: We present a novel approach for accelerating AI performance by leveraging Anderson extrapolation, a vector-to-vector mapping technique based on a window of historical iterations. By identifying the crossover point (Fig. 1) where a mixing penalty is incurred, the method focuses on reducing iterations to convergence, with fewer, more compute-intensive but generally cacheable iterations, balancing speed and memory usage with accuracy and algorithmic stability, respectively. We demonstrate significant improvements, in both training and inference, motivated by scalability and efficiency extensions to the realm of high-performance computing (HPC).
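The abstract describes Anderson extrapolation as mixing a window of historical iterates of a fixed-point map. As a concrete illustration, the sketch below applies a standard Anderson acceleration step to a generic contractive fixed-point iteration in NumPy; the function name, window size, and regularized least-squares solve are illustrative choices, not the paper's GPU implementation.

```python
import numpy as np

def anderson_accelerate(g, x0, window=5, iters=50, reg=1e-10):
    """Anderson acceleration of the fixed-point iteration x <- g(x).

    Keeps a sliding window of previous iterates and residuals and mixes
    them by solving a small constrained least-squares problem each step.
    """
    x = x0.copy()
    X, G = [], []                      # histories of iterates and of g(iterates)
    for _ in range(iters):
        gx = g(x)
        X.append(x.copy())
        G.append(gx.copy())
        if len(X) > window:            # keep only the most recent `window` pairs
            X.pop(0); G.pop(0)
        m = len(X)
        F = np.stack([G[i] - X[i] for i in range(m)], axis=1)   # residual history
        # Minimize ||F @ alpha|| subject to sum(alpha) = 1 (normal-equation form).
        A = F.T @ F + reg * np.eye(m)
        alpha = np.linalg.solve(A, np.ones(m))
        alpha /= alpha.sum()
        x = np.stack(G, axis=1) @ alpha    # extrapolated iterate: mix of g-values
    return x

# Usage: accelerate a linear fixed-point map x <- Bx + c (contractive for this seed).
rng = np.random.default_rng(0)
B = 0.9 * np.eye(10) + 0.01 * rng.standard_normal((10, 10))
c = rng.standard_normal(10)
g = lambda x: B @ x + c
x_star = anderson_accelerate(g, np.zeros(10))
print(np.linalg.norm(g(x_star) - x_star))      # residual should be near zero
```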
Related papers
- Linearly Convergent Mixup Learning [0.0]
We present two novel algorithms that extend to a broader range of binary classification models.
Unlike gradient-based approaches, our algorithms do not require hyperparameters such as learning rates, simplifying their implementation and optimization.
Our algorithms achieve faster convergence to the optimal solution than gradient descent approaches, and mixup data augmentation consistently improves predictive performance across various loss functions; a minimal mixup sketch follows this entry.
arXiv Detail & Related papers (2025-01-14T02:33:40Z)
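For context on the mixup augmentation referenced above, here is a minimal, generic mixup sketch (convex combinations of example pairs and their labels). It is not the paper's linearly convergent training algorithm; the Beta(alpha, alpha) mixing weights and random batch pairing are standard assumptions.

```python
import numpy as np

def mixup_batch(X, y, alpha=0.2, rng=None):
    """Generic mixup: blend each example with a random partner.

    X: (n, d) features, y: (n, k) one-hot (or soft) labels.
    Returns a mixed batch of the same shapes.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha, size=(len(X), 1))   # one mixing weight per example
    perm = rng.permutation(len(X))                   # random partner for each example
    X_mix = lam * X + (1.0 - lam) * X[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return X_mix, y_mix

# Usage on a toy binary-classification batch with one-hot labels.
X = np.random.default_rng(0).standard_normal((8, 3))
y = np.eye(2)[np.array([0, 1, 1, 0, 1, 0, 0, 1])]
X_mix, y_mix = mixup_batch(X, y)
```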
- Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment [81.84950252537618]
This paper reveals a unified game-theoretic connection between iterative BOND and self-play alignment.
We establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization.
arXiv Detail & Related papers (2024-10-28T04:47:39Z)
- Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}\!\left(\ln(T) / T^{1 - \frac{1}{\alpha}}\right)$; a standard AdaGrad update is sketched after this entry.
arXiv Detail & Related papers (2024-03-11T09:10:37Z)
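For reference, the sketch below shows a standard, centralized AdaGrad update, which is the base optimizer the paper adapts; the federated, over-the-air aspects and the stated convergence rate are specific to the paper and are not reproduced here.

```python
import numpy as np

def adagrad_step(w, grad, accum, lr=0.1, eps=1e-8):
    """One standard AdaGrad update: per-coordinate step sizes shrink with
    the accumulated squared gradients stored in `accum`."""
    accum += grad ** 2
    w -= lr * grad / (np.sqrt(accum) + eps)
    return w, accum

# Usage: minimize f(w) = ||w - target||^2 with plain AdaGrad.
target = np.array([1.0, -2.0, 0.5])
w, accum = np.zeros(3), np.zeros(3)
for _ in range(500):
    grad = 2.0 * (w - target)
    w, accum = adagrad_step(w, grad, accum)
print(w)   # w moves toward target as the iterations proceed
```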
- DOF: Accelerating High-order Differential Operators with Forward Propagation [40.71528485918067]
We propose an efficient framework, Differential Operator with Forward-propagation (DOF), for calculating general second-order differential operators without losing any precision.
We demonstrate a twofold improvement in efficiency and reduced memory consumption on any architecture.
Empirical results illustrate that our method surpasses traditional automatic differentiation (AutoDiff) techniques, achieving a 2x improvement on structure and nearly a 20x improvement on sparsity.
arXiv Detail & Related papers (2024-02-15T05:59:21Z)
- Ordering for Non-Replacement SGD [7.11967773739707]
We seek to find an ordering that can improve the convergence rates for the non-replacement form of the algorithm.
We develop optimal orderings for constant and decreasing step sizes for strongly convex and convex functions.
In addition, we are able to combine the ordering with mini-batching and further apply it to more complex neural networks; a minimal non-replacement SGD sketch follows this entry.
arXiv Detail & Related papers (2023-06-28T00:46:58Z)
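To make the non-replacement setting concrete, the sketch below runs SGD where each epoch visits every example exactly once, either in a fixed ordering or reshuffled per epoch; the paper's provably optimal orderings are not reproduced, and the toy least-squares objective is an assumption for illustration.

```python
import numpy as np

def sgd_epochs(grad_i, w, n, epochs=10, lr=0.05, ordering=None, rng=None):
    """Non-replacement SGD: each epoch touches every example exactly once.
    `ordering` fixes the visiting order; if None, reshuffle every epoch."""
    rng = rng or np.random.default_rng(0)
    for _ in range(epochs):
        order = ordering if ordering is not None else rng.permutation(n)
        for i in order:
            w = w - lr * grad_i(w, i)
    return w

# Toy least-squares problem: f_i(w) = 0.5 * (a_i . w - b_i)^2
rng = np.random.default_rng(1)
A, b = rng.standard_normal((20, 3)), rng.standard_normal(20)
grad_i = lambda w, i: (A[i] @ w - b[i]) * A[i]
w_shuffled = sgd_epochs(grad_i, np.zeros(3), n=20)                       # reshuffled epochs
w_fixed    = sgd_epochs(grad_i, np.zeros(3), n=20, ordering=np.arange(20))  # one fixed order
```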
- Performance Embeddings: A Similarity-based Approach to Automatic Performance Optimization [71.69092462147292]
Performance embeddings enable knowledge transfer of performance tuning between applications.
We demonstrate this transfer tuning approach on case studies in deep neural networks, dense and sparse linear algebra compositions, and numerical weather prediction stencils.
arXiv Detail & Related papers (2023-03-14T15:51:35Z)
- Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline with no computational increment.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights; an illustrative re-scaling sketch follows this entry.
arXiv Detail & Related papers (2022-03-23T06:24:31Z)
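As one plausible reading of adaptive length re-scaling, the sketch below rescales predicted novel-class weight vectors to match the average norm of the pretrained base-class weights; this interpretation, and the helper name rescale_novel_weights, are assumptions for illustration rather than the paper's exact rule.

```python
import numpy as np

def rescale_novel_weights(novel_w, base_w):
    """Rescale each novel-class weight vector so its L2 norm matches the
    average norm of the pretrained base-class weights (an illustrative
    reading of adaptive length re-scaling, not the paper's exact rule)."""
    target_norm = np.linalg.norm(base_w, axis=1).mean()
    norms = np.linalg.norm(novel_w, axis=1, keepdims=True)
    return novel_w * (target_norm / np.clip(norms, 1e-12, None))

# Usage: 5 base classes and 2 novel classes with 64-d classifier weights.
rng = np.random.default_rng(0)
base_w  = rng.standard_normal((5, 64))
novel_w = 0.1 * rng.standard_normal((2, 64))        # much smaller norms initially
print(np.linalg.norm(rescale_novel_weights(novel_w, base_w), axis=1))
```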
- Nesterov Accelerated ADMM for Fast Diffeomorphic Image Registration [63.15453821022452]
Recent developments in deep learning-based approaches have achieved sub-second runtimes for diffeomorphic image registration (DiffIR).
We propose a simple iterative scheme that functionally composes intermediate non-stationary velocity fields.
We then propose a convex optimisation model that uses a regularisation term of arbitrary order to impose smoothness on these velocity fields.
arXiv Detail & Related papers (2021-09-26T19:56:45Z)
- Fast and Robust Iterative Closest Point [32.42799285301607]
Iterative Closest Point (ICP) is a fundamental technique for rigid registration between two point sets.
Recent work such as Sparse ICP achieves robustness via sparsity optimization at the cost of computational speed.
We show that the classical point-to-point ICP can be treated as a majorization-minimization (MM) algorithm, and propose an Anderson acceleration approach to speed up its convergence; a minimal point-to-point ICP step is sketched after this entry.
arXiv Detail & Related papers (2020-07-15T11:32:53Z)
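For context, the sketch below implements one iteration of the classical point-to-point ICP this paper builds on (nearest-neighbour correspondences followed by a Kabsch/Procrustes rigid fit); the robust formulation and the Anderson-accelerated update from the paper are not included.

```python
import numpy as np

def icp_step(src, dst):
    """One classical point-to-point ICP iteration: match each source point
    to its nearest destination point, then fit the best rigid transform
    (Kabsch / Procrustes) to those correspondences."""
    # Nearest-neighbour correspondences (brute force, for clarity only).
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
    matched = dst[d2.argmin(axis=1)]
    # Best-fit rotation and translation via SVD of the cross-covariance.
    mu_s, mu_d = src.mean(0), matched.mean(0)
    H = (src - mu_s).T @ (matched - mu_d)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # fix an improper rotation (reflection)
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_d - R @ mu_s
    return src @ R.T + t

# Usage: recover a small rotation applied to a random point cloud.
rng = np.random.default_rng(0)
dst = rng.standard_normal((100, 3))
theta = 0.1
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
src = dst @ Rz.T
for _ in range(20):
    src = icp_step(src, dst)          # src converges onto dst
```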
- Efficient Learning of Generative Models via Finite-Difference Score Matching [111.55998083406134]
We present a generic strategy to efficiently approximate any-order directional derivative with finite difference.
Our approximation only involves function evaluations, which can be executed in parallel, and no gradient computations; a minimal finite-difference sketch follows this entry.
arXiv Detail & Related papers (2020-07-07T10:05:01Z)
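To illustrate the general idea of approximating directional derivatives with function evaluations only, the sketch below uses central finite differences for the first and second directional derivatives; the paper's any-order scheme and its use inside score matching are not reproduced.

```python
import numpy as np

def directional_derivs(f, x, v, eps=1e-4):
    """Approximate first and second directional derivatives of f at x along v
    using only three function evaluations (central differences)."""
    f_plus, f0, f_minus = f(x + eps * v), f(x), f(x - eps * v)
    d1 = (f_plus - f_minus) / (2 * eps)           # ~ grad f(x) . v
    d2 = (f_plus - 2 * f0 + f_minus) / eps ** 2   # ~ v^T Hess f(x) v
    return d1, d2

# Usage on f(x) = ||x||^2, where grad f . v = 2 x.v and v^T H v = 2 ||v||^2.
x = np.array([1.0, 2.0, 3.0])
v = np.array([0.0, 1.0, 0.0])
print(directional_derivs(lambda z: (z ** 2).sum(), x, v))   # ≈ (4.0, 2.0)
```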
This list is automatically generated from the titles and abstracts of the papers in this site.