A Computationally Efficient Sparsified Online Newton Method
- URL: http://arxiv.org/abs/2311.10085v1
- Date: Thu, 16 Nov 2023 18:44:22 GMT
- Title: A Computationally Efficient Sparsified Online Newton Method
- Authors: Fnu Devvrit, Sai Surya Duvvuri, Rohan Anil, Vineet Gupta, Cho-Jui
Hsieh, Inderjit Dhillon
- Abstract summary: Sparsified Online Newton (SONew) is a memory-efficient second-order algorithm that yields a sparsified yet effective preconditioner.
We achieve up to 30% faster convergence, 3.4% relative improvement in validation performance, and 80% relative improvement in training loss.
- Score: 48.78646010774149
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Second-order methods hold significant promise for enhancing the convergence
of deep neural network training; however, their large memory and computational
demands have limited their practicality. Thus there is a need for scalable
second-order methods that can efficiently train large models. In this paper, we
introduce the Sparsified Online Newton (SONew) method, a memory-efficient
second-order algorithm that yields a sparsified yet effective preconditioner.
The algorithm emerges from a novel use of the LogDet matrix divergence measure;
we combine it with sparsity constraints to minimize regret in the online convex
optimization framework. Empirically, we test our method on large scale
benchmarks of up to 1B parameters. We achieve up to 30% faster convergence,
3.4% relative improvement in validation performance, and 80% relative
improvement in training loss, in comparison to memory efficient optimizers
including first order methods. Powering the method is a surprising fact --
imposing structured sparsity patterns, like tridiagonal and banded structure,
requires little to no overhead, making it as efficient and parallelizable as
first-order methods. In wall-clock time, tridiagonal SONew is only about 3%
slower per step than first-order methods but gives overall gains due to much
faster convergence. In contrast, one of the state-of-the-art (SOTA)
memory-intensive second-order methods, Shampoo, is unable to scale to large
benchmarks. Additionally, while Shampoo necessitates significant engineering
efforts to scale to large benchmarks, SONew offers a more straightforward
implementation, increasing its practical appeal. SONew code is available at:
https://github.com/devvrit/SONew
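For intuition, the LogDet matrix divergence mentioned above is, in its standard (Burg) form, D_ld(X, Y) = tr(X Y^{-1}) - log det(X Y^{-1}) - n; SONew minimizes it under a fixed sparsity pattern on the preconditioner. The Python sketch below is not the authors' implementation and omits that LogDet fitting step entirely; it only illustrates the cost claim in the abstract: with a tridiagonal pattern, storing the gradient second-moment statistics and applying the preconditioner are both O(n) operations, on par with a first-order update. All names (apply_tridiag, stat_diag, prec_diag, ...) are illustrative placeholders.

import numpy as np

# Minimal sketch (assumed, not the paper's code): cost profile of a tridiagonal
# preconditioner. SONew's LogDet-based fitting of the preconditioner entries is
# deliberately omitted here.

def apply_tridiag(diag, off, v):
    # Return P @ v for a symmetric tridiagonal P stored as (diag, off); O(n) work.
    out = diag * v
    out[:-1] += off * v[1:]
    out[1:] += off * v[:-1]
    return out

n, lr = 1_000_000, 1e-2
w = np.zeros(n)

# O(n) memory: only the tridiagonal band of the gradient second-moment
# statistics is stored, versus O(n^2) for a dense curvature estimate.
stat_diag = np.zeros(n)
stat_off = np.zeros(n - 1)

# Placeholder preconditioner entries (identity here); SONew would instead fit
# them to the banded statistics by minimizing the LogDet divergence under the
# sparsity constraint.
prec_diag = np.ones(n)
prec_off = np.zeros(n - 1)

for step in range(10):
    g = np.random.randn(n)                 # stand-in for a stochastic gradient
    stat_diag += g * g                     # banded second-moment accumulation
    stat_off += g[:-1] * g[1:]
    w -= lr * apply_tridiag(prec_diag, prec_off, g)  # O(n) preconditioned step

Each step above performs two O(n) band accumulations and one O(n) banded matrix-vector product, which is the sense in which the abstract's tridiagonal structure adds "little to no overhead" over first-order methods.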
Related papers
- An Efficient Procedure for Computing Bayesian Network Structure Learning [0.9208007322096532]
We propose a globally optimal Bayesian network structure discovery algorithm based on a progressively leveled scoring approach.
Experimental results indicate that our method, when run entirely in memory, not only reduces peak memory usage but also improves computational efficiency.
arXiv Detail & Related papers (2024-07-24T07:59:18Z) - Time-, Memory- and Parameter-Efficient Visual Adaptation [75.28557015773217]
We propose an adaptation method which does not backpropagate gradients through the backbone.
We achieve this by designing a lightweight network in parallel that operates on features from the frozen, pretrained backbone.
arXiv Detail & Related papers (2024-02-05T10:55:47Z) - Eva: A General Vectorized Approximation Framework for Second-order
Optimization [16.647611352181574]
We present a memory- and time-efficient second-order algorithm named Eva with two novel techniques.
We derive an efficient update formula that avoids explicitly computing a matrix inverse by using the Sherman-Morrison formula (sketched just after this list).
Experiments show that Eva reduces end-to-end training time by up to 2.05x and 2.42x compared to first-order SGD and second-order algorithms, respectively.
arXiv Detail & Related papers (2023-08-04T03:51:38Z) - Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with far fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z) - Asymmetric Scalable Cross-modal Hashing [51.309905690367835]
Cross-modal hashing is a successful method for solving the large-scale multimedia retrieval problem.
We propose a novel Asymmetric Scalable Cross-Modal Hashing (ASCMH) to address these issues.
Our ASCMH outperforms the state-of-the-art cross-modal hashing methods in terms of accuracy and efficiency.
arXiv Detail & Related papers (2022-07-26T04:38:47Z) - On the efficiency of Stochastic Quasi-Newton Methods for Deep Learning [0.0]
We study the behaviour of quasi-Newton training algorithms for deep memory networks.
We show that quasi-Newton methods are efficient and, in some instances, able to outperform the well-known first-order optimizer Adam.
arXiv Detail & Related papers (2022-05-18T20:53:58Z) - Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline with no computational increment.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
arXiv Detail & Related papers (2022-03-23T06:24:31Z) - Simple Stochastic and Online Gradient Descent Algorithms for Pairwise
Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach to handle streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning.
arXiv Detail & Related papers (2021-11-23T18:10:48Z) - Second-order Neural Network Training Using Complex-step Directional
Derivative [41.4333906662624]
We introduce a numerical algorithm for second-order neural network training.
We tackle the practical obstacle of Hessian calculation by using the complex-step finite difference (a toy complex-step example is sketched at the end of this list).
We believe our method will inspire a wide range of new algorithms for deep learning and numerical optimization.
arXiv Detail & Related papers (2020-09-15T13:46:57Z)
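As a side note on the Eva entry above: the Sherman-Morrison identity it relies on states that (A + u v^T)^{-1} = A^{-1} - (A^{-1} u v^T A^{-1}) / (1 + v^T A^{-1} u), so a rank-one update to a matrix can be pushed through to its inverse in O(n^2) instead of re-inverting in O(n^3). The snippet below is a minimal self-contained check of the identity for a symmetric rank-one update, not Eva's actual update rule; the matrix A and vector g are arbitrary stand-ins.

import numpy as np

def sherman_morrison(A_inv, g):
    # (A + g g^T)^{-1} from A^{-1} via the Sherman-Morrison identity, O(n^2) work.
    Ag = A_inv @ g
    return A_inv - np.outer(Ag, Ag) / (1.0 + g @ Ag)

rng = np.random.default_rng(0)
n = 6
B = rng.standard_normal((n, n))
A = np.eye(n) + B @ B.T / n            # SPD curvature-like matrix
g = rng.standard_normal(n)             # rank-one (gradient) update direction
fast = sherman_morrison(np.linalg.inv(A), g)
assert np.allclose(fast, np.linalg.inv(A + np.outer(g, g)))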
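Similarly, for the complex-step entry above: if the gradient is a real-analytic function, then grad(x + i*h*v) is approximately grad(x) + i*h*H(x)v, so the Hessian-vector product H(x)v can be read off the imaginary part with no subtractive cancellation, even for very small h. The toy objective below is an assumption made for illustration only and is unrelated to the cited paper's experiments.

import numpy as np

def grad(x, A):
    # Gradient of f(x) = 0.5 * x^T A x + sum(sin(x)); also accepts complex x.
    return A @ x + np.cos(x)

rng = np.random.default_rng(1)
n = 5
M = rng.standard_normal((n, n))
A = 0.5 * (M + M.T)                    # symmetric matrix for the quadratic term
x = rng.standard_normal(n)
v = rng.standard_normal(n)
h = 1e-20                              # tiny step; no cancellation error

hvp = np.imag(grad(x + 1j * h * v, A)) / h   # complex-step Hessian-vector product
exact = (A - np.diag(np.sin(x))) @ v         # exact Hessian of f applied to v
assert np.allclose(hvp, exact)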