Accelerating AI Performance using Anderson Extrapolation on GPUs
- URL: http://arxiv.org/abs/2410.19460v2
- Date: Thu, 19 Dec 2024 04:49:52 GMT
- Title: Accelerating AI Performance using Anderson Extrapolation on GPUs
- Authors: Saleem Abdul Fattah Ahmed Al Dajani, David E. Keyes
- Abstract summary: We present a novel approach for accelerating AI performance by leveraging Anderson extrapolation.
By identifying the crossover point where a mixing penalty is incurred, the method focuses on reducing iterations to convergence.
We demonstrate significant improvements in both training and inference, motivated by scalability and efficiency extensions to the realm of high-performance computing.
- Score: 2.114333871769023
- Abstract: We present a novel approach for accelerating AI performance by leveraging Anderson extrapolation, a vector-to-vector mapping technique based on a window of historical iterations. By identifying the crossover point (Fig. 1) where a mixing penalty is incurred, the method focuses on reducing iterations to convergence, with fewer, more compute-intensive but generally cacheable iterations, balancing speed and memory usage with accuracy and algorithmic stability, respectively. We demonstrate significant improvements, in both training and inference, motivated by scalability and efficiency extensions to the realm of high-performance computing (HPC).
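The abstract describes Anderson extrapolation as mixing a window of historical iterates of a fixed-point map. As a concrete illustration, the sketch below applies a standard Anderson acceleration step to a generic contractive fixed-point iteration in NumPy; the function name, window size, and regularized least-squares solve are illustrative choices, not the paper's GPU implementation.

```python
import numpy as np

def anderson_accelerate(g, x0, window=5, iters=50, reg=1e-10):
    """Anderson acceleration of the fixed-point iteration x <- g(x).

    Keeps a sliding window of previous iterates and residuals and mixes
    them by solving a small constrained least-squares problem each step.
    """
    x = x0.copy()
    X, G = [], []                      # histories of iterates and of g(iterates)
    for _ in range(iters):
        gx = g(x)
        X.append(x.copy())
        G.append(gx.copy())
        if len(X) > window:            # keep only the most recent `window` pairs
            X.pop(0); G.pop(0)
        m = len(X)
        F = np.stack([G[i] - X[i] for i in range(m)], axis=1)   # residual history
        # Minimize ||F @ alpha|| subject to sum(alpha) = 1 (normal-equation form).
        A = F.T @ F + reg * np.eye(m)
        alpha = np.linalg.solve(A, np.ones(m))
        alpha /= alpha.sum()
        x = np.stack(G, axis=1) @ alpha    # extrapolated iterate: mix of g-values
    return x

# Usage: accelerate a linear fixed-point map x <- Bx + c (contractive for this seed).
rng = np.random.default_rng(0)
B = 0.9 * np.eye(10) + 0.01 * rng.standard_normal((10, 10))
c = rng.standard_normal(10)
g = lambda x: B @ x + c
x_star = anderson_accelerate(g, np.zeros(10))
print(np.linalg.norm(g(x_star) - x_star))      # residual should be near zero
```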
Related papers
- Linearly Convergent Mixup Learning [0.0]
We present two novel algorithms that extend to a broader range of binary classification models.
Unlike gradient-based approaches, our algorithms do not require hyperparameters such as learning rates, simplifying their implementation and optimization.
Our algorithms achieve faster convergence to the optimal solution than gradient descent approaches, and mixup data augmentation consistently improves predictive performance across various loss functions; a minimal mixup sketch follows this entry.
arXiv Detail & Related papers (2025-01-14T02:33:40Z)
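For context on the mixup augmentation referenced above, here is a minimal, generic mixup sketch (convex combinations of example pairs and their labels). It is not the paper's linearly convergent training algorithm; the Beta(alpha, alpha) mixing weights and random batch pairing are standard assumptions.

```python
import numpy as np

def mixup_batch(X, y, alpha=0.2, rng=None):
    """Generic mixup: blend each example with a random partner.

    X: (n, d) features, y: (n, k) one-hot (or soft) labels.
    Returns a mixed batch of the same shapes.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha, size=(len(X), 1))   # one mixing weight per example
    perm = rng.permutation(len(X))                   # random partner for each example
    X_mix = lam * X + (1.0 - lam) * X[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return X_mix, y_mix

# Usage on a toy binary-classification batch with one-hot labels.
X = np.random.default_rng(0).standard_normal((8, 3))
y = np.eye(2)[np.array([0, 1, 1, 0, 1, 0, 0, 1])]
X_mix, y_mix = mixup_batch(X, y)
```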
- Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment [81.84950252537618]
This paper reveals a unified game-theoretic connection between iterative BOND and self-play alignment.
We establish a novel framework, WIN rate Dominance (WIND), with a series of efficient algorithms for regularized win rate dominance optimization.
arXiv Detail & Related papers (2024-10-28T04:47:39Z)
- Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}\!\left(\ln(T) / T^{1 - \frac{1}{\alpha}}\right)$; a standard AdaGrad update is sketched after this entry.
arXiv Detail & Related papers (2024-03-11T09:10:37Z)
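For reference, the sketch below shows a standard, centralized AdaGrad update, which is the base optimizer the paper adapts; the federated, over-the-air aspects and the stated convergence rate are specific to the paper and are not reproduced here.

```python
import numpy as np

def adagrad_step(w, grad, accum, lr=0.1, eps=1e-8):
    """One standard AdaGrad update: per-coordinate step sizes shrink with
    the accumulated squared gradients stored in `accum`."""
    accum += grad ** 2
    w -= lr * grad / (np.sqrt(accum) + eps)
    return w, accum

# Usage: minimize f(w) = ||w - target||^2 with plain AdaGrad.
target = np.array([1.0, -2.0, 0.5])
w, accum = np.zeros(3), np.zeros(3)
for _ in range(500):
    grad = 2.0 * (w - target)
    w, accum = adagrad_step(w, grad, accum)
print(w)   # w moves toward target as the iterations proceed
```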
- DOF: Accelerating High-order Differential Operators with Forward Propagation [40.71528485918067]
We propose an efficient framework, Differential Operator with Forward-propagation (DOF), for calculating general second-order differential operators without losing any precision.
We demonstrate a twofold improvement in efficiency and reduced memory consumption on any architecture.
Empirical results illustrate that our method surpasses traditional automatic differentiation (AutoDiff) techniques, achieving a 2x improvement on structure and nearly a 20x improvement on sparsity.
arXiv Detail & Related papers (2024-02-15T05:59:21Z)
- Ordering for Non-Replacement SGD [7.11967773739707]
We seek to find an ordering that can improve the convergence rates for the non-replacement form of the algorithm.
We develop optimal orderings for constant and decreasing step sizes for strongly convex and convex functions.
In addition, we are able to combine the ordering with mini-batching and further apply it to more complex neural networks; a minimal non-replacement SGD sketch follows this entry.
arXiv Detail & Related papers (2023-06-28T00:46:58Z)
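To make the non-replacement setting concrete, the sketch below runs SGD where each epoch visits every example exactly once, either in a fixed ordering or reshuffled per epoch; the paper's provably optimal orderings are not reproduced, and the toy least-squares objective is an assumption for illustration.

```python
import numpy as np

def sgd_epochs(grad_i, w, n, epochs=10, lr=0.05, ordering=None, rng=None):
    """Non-replacement SGD: each epoch touches every example exactly once.
    `ordering` fixes the visiting order; if None, reshuffle every epoch."""
    rng = rng or np.random.default_rng(0)
    for _ in range(epochs):
        order = ordering if ordering is not None else rng.permutation(n)
        for i in order:
            w = w - lr * grad_i(w, i)
    return w

# Toy least-squares problem: f_i(w) = 0.5 * (a_i . w - b_i)^2
rng = np.random.default_rng(1)
A, b = rng.standard_normal((20, 3)), rng.standard_normal(20)
grad_i = lambda w, i: (A[i] @ w - b[i]) * A[i]
w_shuffled = sgd_epochs(grad_i, np.zeros(3), n=20)                       # reshuffled epochs
w_fixed    = sgd_epochs(grad_i, np.zeros(3), n=20, ordering=np.arange(20))  # one fixed order
```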
- Performance Embeddings: A Similarity-based Approach to Automatic Performance Optimization [71.69092462147292]
Performance embeddings enable knowledge transfer of performance tuning between applications.
We demonstrate this transfer tuning approach on case studies in deep neural networks, dense and sparse linear algebra compositions, and numerical weather prediction stencils.
arXiv Detail & Related papers (2023-03-14T15:51:35Z)
- Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline with no computational increment.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights; an illustrative re-scaling sketch follows this entry.
arXiv Detail & Related papers (2022-03-23T06:24:31Z)
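As one plausible reading of adaptive length re-scaling, the sketch below rescales predicted novel-class weight vectors to match the average norm of the pretrained base-class weights; this interpretation, and the helper name rescale_novel_weights, are assumptions for illustration rather than the paper's exact rule.

```python
import numpy as np

def rescale_novel_weights(novel_w, base_w):
    """Rescale each novel-class weight vector so its L2 norm matches the
    average norm of the pretrained base-class weights (an illustrative
    reading of adaptive length re-scaling, not the paper's exact rule)."""
    target_norm = np.linalg.norm(base_w, axis=1).mean()
    norms = np.linalg.norm(novel_w, axis=1, keepdims=True)
    return novel_w * (target_norm / np.clip(norms, 1e-12, None))

# Usage: 5 base classes and 2 novel classes with 64-d classifier weights.
rng = np.random.default_rng(0)
base_w  = rng.standard_normal((5, 64))
novel_w = 0.1 * rng.standard_normal((2, 64))        # much smaller norms initially
print(np.linalg.norm(rescale_novel_weights(novel_w, base_w), axis=1))
```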
- Nesterov Accelerated ADMM for Fast Diffeomorphic Image Registration [63.15453821022452]
Recent developments in deep learning-based approaches have achieved sub-second runtimes for diffeomorphic image registration (DiffIR).
We propose a simple iterative scheme that functionally composes intermediate non-stationary velocity fields.
We then propose a convex optimisation model that uses a regularisation term of arbitrary order to impose smoothness on these velocity fields.
arXiv Detail & Related papers (2021-09-26T19:56:45Z)
- Fast and Robust Iterative Closest Point [32.42799285301607]
Iterative Closest Point (ICP) is a fundamental technique for rigid registration between two point sets.
Recent work such as Sparse ICP achieves robustness via sparsity optimization at the cost of computational speed.
We show that the classical point-to-point ICP can be treated as a majorization-minimization (MM) algorithm, and propose an Anderson acceleration approach to speed up its convergence; a minimal point-to-point ICP step is sketched after this entry.
arXiv Detail & Related papers (2020-07-15T11:32:53Z)
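For context, the sketch below implements one iteration of the classical point-to-point ICP this paper builds on (nearest-neighbour correspondences followed by a Kabsch/Procrustes rigid fit); the robust formulation and the Anderson-accelerated update from the paper are not included.

```python
import numpy as np

def icp_step(src, dst):
    """One classical point-to-point ICP iteration: match each source point
    to its nearest destination point, then fit the best rigid transform
    (Kabsch / Procrustes) to those correspondences."""
    # Nearest-neighbour correspondences (brute force, for clarity only).
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
    matched = dst[d2.argmin(axis=1)]
    # Best-fit rotation and translation via SVD of the cross-covariance.
    mu_s, mu_d = src.mean(0), matched.mean(0)
    H = (src - mu_s).T @ (matched - mu_d)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # fix an improper rotation (reflection)
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_d - R @ mu_s
    return src @ R.T + t

# Usage: recover a small rotation applied to a random point cloud.
rng = np.random.default_rng(0)
dst = rng.standard_normal((100, 3))
theta = 0.1
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
src = dst @ Rz.T
for _ in range(20):
    src = icp_step(src, dst)          # src converges onto dst
```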
- Efficient Learning of Generative Models via Finite-Difference Score Matching [111.55998083406134]
We present a generic strategy to efficiently approximate any-order directional derivative with finite difference.
Our approximation only involves function evaluations, which can be executed in parallel, and no gradient computations; a minimal finite-difference sketch follows this entry.
arXiv Detail & Related papers (2020-07-07T10:05:01Z)
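To illustrate the general idea of approximating directional derivatives with function evaluations only, the sketch below uses central finite differences for the first and second directional derivatives; the paper's any-order scheme and its use inside score matching are not reproduced.

```python
import numpy as np

def directional_derivs(f, x, v, eps=1e-4):
    """Approximate first and second directional derivatives of f at x along v
    using only three function evaluations (central differences)."""
    f_plus, f0, f_minus = f(x + eps * v), f(x), f(x - eps * v)
    d1 = (f_plus - f_minus) / (2 * eps)           # ~ grad f(x) . v
    d2 = (f_plus - 2 * f0 + f_minus) / eps ** 2   # ~ v^T Hess f(x) v
    return d1, d2

# Usage on f(x) = ||x||^2, where grad f . v = 2 x.v and v^T H v = 2 ||v||^2.
x = np.array([1.0, 2.0, 3.0])
v = np.array([0.0, 1.0, 0.0])
print(directional_derivs(lambda z: (z ** 2).sum(), x, v))   # ≈ (4.0, 2.0)
```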
This list is automatically generated from the titles and abstracts of the papers in this site.