Beyond adaptive gradient: Fast-Controlled Minibatch Algorithm for large-scale optimization
- URL: http://arxiv.org/abs/2411.15795v1
- Date: Sun, 24 Nov 2024 11:46:47 GMT
- Title: Beyond adaptive gradient: Fast-Controlled Minibatch Algorithm for large-scale optimization
- Authors: Corrado Coppola, Lorenzo Papa, Irene Amerini, Laura Palagi,
- Abstract summary: We introduce F-CMA, a Fast-Controlled Mini-batch Algorithm with a random reshuffling method featuring a sufficient decrease condition and a line-search procedure to ensure loss reduction per epoch.
Tests show significant improvements, including a decrease in the overall training time by 68%, an increase in per-epoch efficiency by up to 20%, and in model accuracy by up to 5%.
- Score: 1.6749379740049926
- License:
- Abstract: Adaptive gradient methods have been increasingly adopted by deep learning community due to their fast convergence and reduced sensitivity to hyper-parameters. However, these methods come with limitations, such as increased memory requirements for elements like moving averages and a poorly understood convergence theory. To overcome these challenges, we introduce F-CMA, a Fast-Controlled Mini-batch Algorithm with a random reshuffling method featuring a sufficient decrease condition and a line-search procedure to ensure loss reduction per epoch, along with its deterministic proof of global convergence to a stationary point. To evaluate the F-CMA, we integrate it into conventional training protocols for classification tasks involving both convolutional neural networks and vision transformer models, allowing for a direct comparison with popular optimizers. Computational tests show significant improvements, including a decrease in the overall training time by up to 68%, an increase in per-epoch efficiency by up to 20%, and in model accuracy by up to 5%.
Related papers
- Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $mathcalO( ln(T) / T 1 - frac1alpha ).
arXiv Detail & Related papers (2024-03-11T09:10:37Z) - Adaptive Top-K in SGD for Communication-Efficient Distributed Learning [14.867068493072885]
This paper proposes a novel adaptive Top-K in SGD framework that enables an adaptive degree of sparsification for each gradient descent step to optimize the convergence performance.
numerical results on the MNIST and CIFAR-10 datasets demonstrate that the proposed adaptive Top-K algorithm in SGD achieves a significantly better convergence rate compared to state-of-the-art methods.
arXiv Detail & Related papers (2022-10-24T18:33:35Z) - Low-Precision Arithmetic for Fast Gaussian Processes [39.720581185327816]
Low-precision arithmetic has had a transformative effect on the training of neural networks.
We propose a multi-faceted approach involving conjugate gradients with re-orthogonalization, mixed precision, and preconditioning.
Our approach significantly improves the numerical stability and practical performance of conjugate gradients in low-precision over a wide range of settings.
arXiv Detail & Related papers (2022-07-14T12:20:46Z) - Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with an average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z) - Adaptive Gradient Method with Resilience and Momentum [120.83046824742455]
We propose an Adaptive Gradient Method with Resilience and Momentum (AdaRem)
AdaRem adjusts the parameter-wise learning rate according to whether the direction of one parameter changes in the past is aligned with the direction of the current gradient.
Our method outperforms previous adaptive learning rate-based algorithms in terms of the training speed and the test error.
arXiv Detail & Related papers (2020-10-21T14:49:00Z) - Balancing Rates and Variance via Adaptive Batch-Size for Stochastic
Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that attenuating step-size is required for exact convergence with the fact that constant step-size learns faster in time up to an error.
Rather than fixing the minibatch the step-size at the outset, we propose to allow parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z) - Semi-Implicit Back Propagation [1.5533842336139065]
We propose a semi-implicit back propagation method for neural network training.
The difference on the neurons are propagated in a backward fashion and the parameters are updated with proximal mapping.
Experiments on both MNIST and CIFAR-10 demonstrate that the proposed algorithm leads to better performance in terms of both loss decreasing and training/validation accuracy.
arXiv Detail & Related papers (2020-02-10T03:26:09Z) - Large Batch Training Does Not Need Warmup [111.07680619360528]
Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications.
In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training.
Based on our analysis, we bridge the gap and illustrate the theoretical insights for three popular large-batch training techniques.
arXiv Detail & Related papers (2020-02-04T23:03:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.