Gravilon: Applications of a New Gradient Descent Method to Machine Learning
- URL: http://arxiv.org/abs/2008.11370v2
- Date: Wed, 28 Oct 2020 19:15:31 GMT
- Title: Gravilon: Applications of a New Gradient Descent Method to Machine Learning
- Authors: Chad Kelterborn, Marcin Mazur, and Bogdan V. Petrenko
- Abstract summary: We provide a novel gradient descent algorithm, called Gravilon, that uses the geometry of the hypersurface to modify the length of the step in the direction of the gradient.
Using neural networks, we provide promising experimental results comparing the accuracy and efficiency of Gravilon against commonly used gradient descent algorithms on MNIST digit classification.
- Score: 0.5352699766206809
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient descent algorithms have been used in countless applications since
the inception of Newton's method. The explosion in the number of applications
of neural networks has re-energized efforts in recent years to improve the
standard gradient descent method in both efficiency and accuracy. These methods
modify the effect of the gradient in updating the values of the parameters.
These modifications often incorporate hyperparameters: additional variables
whose values must be specified at the outset of the program. We provide, below,
a novel gradient descent algorithm, called Gravilon, that uses the geometry of
the hypersurface to modify the length of the step in the direction of the
gradient. Using neural networks, we provide promising experimental results
comparing the accuracy and efficiency of the Gravilon method against commonly
used gradient descent algorithms on MNIST digit classification.
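
The abstract describes the step-length rule only qualitatively. Below is a minimal Python sketch of one geometry-based rescaling consistent with that description, treating the graph of the loss as a hypersurface; it illustrates the idea of geometry-modulated step lengths, not the Gravilon update itself, whose exact formula is given in the paper.

```python
import numpy as np

def geometric_descent(f, grad_f, x0, lr=0.5, steps=100):
    """Gradient descent with a geometry-rescaled step length.

    Illustrative only: this treats the graph of f as a hypersurface in
    R^{n+1} and follows the horizontal component of the unit descent
    direction along that surface, g / sqrt(1 + ||g||^2). It matches the
    *idea* in the abstract, not the authors' exact update rule.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = grad_f(x)
        # Steps shrink automatically on steep cliffs (large ||g||) and
        # approach a plain gradient step on gentle slopes (small ||g||).
        x = x - lr * g / np.sqrt(1.0 + g @ g)
    return x

# Usage: minimize a simple quadratic bowl.
f = lambda x: 0.5 * (x @ x)
grad_f = lambda x: x
print(geometric_descent(f, grad_f, np.array([3.0, -2.0])))  # -> near [0, 0]
```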
Related papers
- Gradient Descent with Provably Tuned Learning-rate Schedules [14.391648046717073]
We develop novel analytical tools for provably tuning factors in gradient-based algorithms.
Our analysis applies to neural networks with commonly used activation functions.
arXiv Detail & Related papers (2025-12-04T18:49:58Z)
- Linear Gradient Prediction with Control Variates [5.907996850796288]
We propose a new way of training neural networks, with the goal of reducing training cost.
Our method uses approximate predicted gradients instead of the full gradients that require an expensive backward pass.
We empirically show the efficacy of the technique on a vision transformer classification task.
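
For readers unfamiliar with the variance-reduction device named in the title, here is a minimal synthetic demo of a control variate applied to a noisy gradient estimate. All numbers and the linear predictor are invented for illustration; the paper's actual estimator differs.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
shared_noise = rng.normal(size=n)

# Noisy "true" gradient samples (mean 1.0) and a cheap linear prediction
# that is correlated with them through the shared noise.
g = 1.0 + shared_noise + 0.3 * rng.normal(size=n)
p = 0.8 + shared_noise                  # prediction with known mean 0.8

beta = np.cov(g, p)[0, 1] / np.var(p)   # optimal control-variate coefficient
g_cv = g - beta * (p - 0.8)             # still unbiased for 1.0

print(g.mean(), g_cv.mean())            # both near 1.0
print(g.var(), g_cv.var())              # variance drops by roughly 10x
```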
arXiv Detail & Related papers (2025-11-07T12:09:48Z)
- Neural Gradient Learning and Optimization for Oriented Point Normal Estimation [53.611206368815125]
We propose a deep learning approach to learn gradient vectors with consistent orientation from 3D point clouds for normal estimation.
We learn an angular distance field based on local plane geometry to refine the coarse gradient vectors.
Our method efficiently conducts global gradient approximation while achieving better accuracy and generalization ability of local feature description.
arXiv Detail & Related papers (2023-09-17T08:35:11Z)
- Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data [63.34506218832164]
In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations.
For gradient flow, we leverage recent work on the implicit bias of homogeneous neural networks to show that, asymptotically, gradient flow produces a neural network with rank at most two.
For gradient descent, provided the random initialization variance is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training.
arXiv Detail & Related papers (2022-10-13T15:09:54Z)
- Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
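
The forward gradient referred to here is the estimator (∇f·v)v with a random direction v. A minimal sketch follows, using a finite difference in place of exact forward-mode AD; the paper's key change, perturbing activations rather than weights, is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_gradient(f, x, eps=1e-6):
    """Unbiased forward-gradient estimate (grad(f)(x) . v) * v, v ~ N(0, I).

    The directional derivative is approximated with a finite difference;
    the paper uses an exact forward-mode JVP.
    """
    v = rng.normal(size=x.shape)
    dirderiv = (f(x + eps * v) - f(x)) / eps  # approximates grad(f)(x) . v
    return dirderiv * v                       # expectation equals grad(f)(x)

f = lambda x: 0.5 * (x @ x)                   # true gradient is x itself
x = np.array([3.0, -2.0])
est = np.mean([forward_gradient(f, x) for _ in range(2000)], axis=0)
print(est)                                    # close to [3, -2]
```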
arXiv Detail & Related papers (2022-10-07T03:52:27Z)
- Gradient Correction beyond Gradient Descent [63.33439072360198]
Gradient correction is arguably the most crucial aspect of training a neural network.
We introduce a framework (GCGD) to perform gradient correction.
Experimental results show that our gradient correction framework can effectively improve gradient quality, reducing training epochs by ~20% while also improving network performance.
arXiv Detail & Related papers (2022-03-16T01:42:25Z)
- Adapting Stepsizes by Momentumized Gradients Improves Optimization and Generalization [89.66571637204012]
AdaMomentum performs strongly on vision tasks, and achieves state-of-the-art results consistently on other tasks including language processing.
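
As the title suggests, the idea is an Adam-style method whose second moment tracks the momentumized gradient rather than the raw gradient. A hedged sketch reconstructed from the abstract; the bias corrections and any extra terms may differ from the paper.

```python
import numpy as np

def adamomentum_step(param, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One AdaMomentum-style step: identical to Adam except the second
    moment tracks the momentumized gradient m rather than the raw grad.
    Reconstructed from the abstract; see the paper for the exact form.
    """
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * m * m        # Adam would use grad * grad here
    m_hat = m / (1 - b1 ** t)            # standard bias corrections
    v_hat = v / (1 - b2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, (m, v, t)

# Usage: state starts as (0.0, 0.0, 0) and works for scalars or ndarrays.
```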
arXiv Detail & Related papers (2021-06-22T03:13:23Z)
- Normalized Gradient Descent for Variational Quantum Algorithms [4.403985869332685]
Variational quantum algorithms (VQAs) are promising methods that leverage noisy quantum computers.
The normalized gradient descent (NGD) method, which employs the normalized gradient vector to update the parameters, has been successfully utilized in several optimization problems.
We propose a new NGD method that can attain faster convergence than the ordinary NGD.
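
The ordinary NGD baseline that this paper refines is easy to state: step along the unit gradient, so every update has the same length regardless of the raw gradient's magnitude. A minimal sketch (the paper's new variant is not reproduced here):

```python
import numpy as np

def ngd_step(theta, grad, lr=0.1, eps=1e-12):
    """Normalized gradient descent: move along the unit gradient, so each
    update has length lr no matter how large or small the raw gradient is."""
    return theta - lr * grad / (np.linalg.norm(grad) + eps)
```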
arXiv Detail & Related papers (2021-06-21T11:03:12Z)
- Decreasing scaling transition from adaptive gradient descent to stochastic gradient descent [1.7874193862154875]
We propose DSTAda, a decreasing scaling transition from adaptive gradient descent to stochastic gradient descent.
Our experimental results show that DSTAda converges faster, reaches higher accuracy, and exhibits better stability and robustness.
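
One plausible reading of a "decreasing scaling transition" is an update that starts adaptive and fades toward plain SGD. The mixing weight and schedule below are illustrative guesses, not DSTAda's published formulas.

```python
import numpy as np

def dst_step(theta, grad, v, t, lr=0.01, b2=0.999, eps=1e-8, decay=0.01):
    """Sketch of a decreasing scaling transition: early steps are Adam-like
    (preconditioned by a second-moment estimate), later steps fade toward
    plain SGD as lam -> 0. Both the schedule and the mixing rule are
    illustrative guesses, not the paper's formulas.
    """
    v = b2 * v + (1 - b2) * grad * grad      # adaptive second moment
    lam = 1.0 / (1.0 + decay * t)            # decreasing scaling weight
    adaptive = grad / (np.sqrt(v) + eps)
    theta = theta - lr * (lam * adaptive + (1.0 - lam) * grad)
    return theta, v
```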
arXiv Detail & Related papers (2021-06-12T11:28:58Z)
- Research of Damped Newton Stochastic Gradient Descent Method for Neural Network Training [6.231508838034926]
First-order methods like stochastic gradient descent (SGD) are currently the most popular optimization methods for training deep neural networks (DNNs).
In this paper, we propose the Damped Newton Stochastic Gradient Descent (DN-SGD) and Stochastic Gradient Descent Damped Newton (SGD-DN) methods to train DNNs for regression problems with mean squared error (MSE) and for classification problems with cross-entropy loss (CEL).
Our methods accurately compute only a small part of the parameters, which greatly reduces the computational cost and makes the learning process faster and more accurate than SGD.
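
The core damped Newton update the names refer to is standard; a minimal sketch follows. Applying curvature to only a small subset of the parameters, as the summary describes, is left out, and the paper's exact parameter split is not reproduced.

```python
import numpy as np

def damped_newton_step(w, grad, hess, lam=1e-2, lr=1.0):
    """Damped Newton update: w <- w - lr * (H + lam*I)^{-1} g. DN-SGD/SGD-DN
    apply curvature like this to only part of the parameters and plain SGD
    to the rest; that split is not shown here.
    """
    step = np.linalg.solve(hess + lam * np.eye(w.size), grad)
    return w - lr * step

# Usage: one step on f(w) = 0.5 w^T A w - b^T w, whose Hessian is A.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])
w = np.zeros(2)
w = damped_newton_step(w, A @ w - b, A)
print(w)
```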
arXiv Detail & Related papers (2021-03-31T02:07:18Z)
- Exploiting Adam-like Optimization Algorithms to Improve the Performance of Convolutional Neural Networks [82.61182037130405]
Stochastic gradient descent (SGD) is the main approach for training deep networks.
In this work, we compare Adam-based variants that exploit the difference between the present and the past gradients.
We tested ensembles of networks and their fusion with a ResNet50 trained with SGD.
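
One representative of the Adam-like family compared here is DiffGrad, which damps the Adam step with a friction coefficient computed from the difference between the present and past gradients. A sketch of that style of update; the variants the paper actually tests differ in detail.

```python
import numpy as np

def diffgrad_step(theta, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """DiffGrad-style step: an Adam update damped by a friction coefficient
    xi in [0.5, 1) computed from |g_prev - g|, so parameters whose gradient
    is changing rapidly take smaller steps.
    """
    m, v, g_prev, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    xi = 1.0 / (1.0 + np.exp(-np.abs(g_prev - grad)))  # friction coefficient
    theta = theta - lr * xi * m_hat / (np.sqrt(v_hat) + eps)
    return theta, (m, v, grad, t)

# Usage: state starts as (0.0, 0.0, 0.0, 0).
```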
arXiv Detail & Related papers (2021-03-26T18:55:08Z)
- Reparametrizing gradient descent [0.0]
We propose an optimization algorithm which we call norm-adapted gradient descent.
Our algorithm can also be compared to quasi-Newton methods, but we seek roots rather than stationary points.
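
A norm-adapted, root-seeking step of the flavor described can be written as x ← x − f(x)·∇f/‖∇f‖², which jumps to the root of the local linearization of f. A sketch under that assumption; the paper's precise rule may differ.

```python
import numpy as np

def root_seeking_step(x, f, grad_f):
    """One norm-adapted step toward a *root* of f (f(x) = 0) rather than a
    stationary point: x <- x - f(x) * g / ||g||^2, the minimal-norm solution
    of the linearized equation f(x) + g . d = 0.
    """
    g = grad_f(x)
    return x - f(x) * g / (g @ g)

# Usage: find a point where f(x) = x1^2 + x2^2 - 1 vanishes (the unit circle).
f = lambda x: x @ x - 1.0
grad_f = lambda x: 2.0 * x
x = np.array([2.0, 1.0])
for _ in range(8):
    x = root_seeking_step(x, f, grad_f)
print(x, f(x))   # f(x) is now close to 0
```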
arXiv Detail & Related papers (2020-10-09T20:22:29Z)
- Variance Reduction for Deep Q-Learning using Stochastic Recursive Gradient [51.880464915253924]
Deep Q-learning algorithms often suffer from poor gradient estimation with excessive variance.
This paper introduces a framework for updating the gradient estimates in deep Q-learning, yielding a novel algorithm called SRG-DQN.
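
The stochastic recursive gradient of the title is the SARAH-style estimator v_t = ∇f_i(w_t) − ∇f_i(w_{t−1}) + v_{t−1}, with the same sample i on both terms, anchored by one full gradient. A toy sketch of the recursion on a quadratic, not on Q-learning:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: minimize f(w) = mean_i 0.5*(w - a_i)^2 over random targets.
a = rng.normal(size=100)
grad_i = lambda w, i: w - a[i]          # per-sample stochastic gradient

def sarah(w0, lr=0.5, steps=50):
    """SARAH-style recursive gradient descent. On this toy problem the
    per-sample correction grad_i(w) - grad_i(w_prev) is exact, so v tracks
    the full gradient with zero variance -- the point of the device.
    """
    w_prev = w0
    v = np.mean(w0 - a)                 # full-gradient anchor
    w = w_prev - lr * v
    for _ in range(steps):
        i = rng.integers(len(a))
        v = grad_i(w, i) - grad_i(w_prev, i) + v   # recursive update
        w_prev, w = w, w - lr * v
    return w

print(sarah(5.0), a.mean())             # converges near the minimizer
```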
arXiv Detail & Related papers (2020-07-25T00:54:20Z)
- Neural gradients are near-lognormal: improved quantized and sparse training [35.28451407313548]
We find that the distribution of neural gradients is approximately lognormal.
We suggest two closed-form analytical methods to reduce the computational and memory burdens of neural gradients.
To the best of our knowledge, this paper is the first to (1) quantize gradients to 6-bit floating-point formats, or (2) achieve up to 85% gradient sparsity -- in each case without loss of accuracy.
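
A lognormal model for |g| makes sparsity thresholds available in closed form: fit a normal to log|g| and read off a quantile. The sketch below illustrates that idea on synthetic data; it is not the paper's exact method, and its 6-bit quantization scheme is omitted.

```python
import numpy as np
from scipy.stats import norm

def lognormal_sparsity_threshold(grads, target_sparsity=0.85):
    """Fit a lognormal to |g| (a normal to log|g|) and return the threshold
    below which the target fraction of entries would be pruned -- a
    closed-form use of the near-lognormal observation, not the paper's
    exact procedure.
    """
    logs = np.log(np.abs(grads[grads != 0]))
    mu, sigma = logs.mean(), logs.std()
    return np.exp(norm.ppf(target_sparsity, loc=mu, scale=sigma))

g = np.random.default_rng(0).lognormal(mean=-6.0, sigma=1.5, size=100_000)
tau = lognormal_sparsity_threshold(g)
print(tau, np.mean(g < tau))   # about 0.85 of entries fall below tau
```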
arXiv Detail & Related papers (2020-06-15T07:00:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.