Related papers: Reparametrizing gradient descent

Reparametrizing gradient descent

URL: http://arxiv.org/abs/2010.04786v1
Date: Fri, 9 Oct 2020 20:22:29 GMT
Title: Reparametrizing gradient descent
Authors: David Sprunger
Abstract summary: We propose an optimization algorithm which we call norm-adapted gradient descent. Our algorithm can also be compared to quasi-Newton methods, but we seek roots rather than stationary points.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this work, we propose an optimization algorithm which we call norm-adapted gradient descent. This algorithm is similar to other gradient-based optimization algorithms like Adam or Adagrad in that it adapts the learning rate of stochastic gradient descent at each iteration. However, rather than using statistical properties of observed gradients, norm-adapted gradient descent relies on a first-order estimate of the effect of a standard gradient descent update step, much like the Newton-Raphson method in many dimensions. Our algorithm can also be compared to quasi-Newton methods, but we seek roots rather than stationary points. Seeking roots can be justified by the fact that for models with sufficient capacity measured by nonnegative loss functions, roots coincide with global optima. This work presents several experiments where we have used our algorithm; in these results, it appears norm-adapted descent is particularly strong in regression settings but is also capable of training classifiers.

Related papers

An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes [17.804065824245402]
In machine learning applications, each loss function is non-negative and can be expressed as the composition of a square and its real-valued square root. We show how to apply the Gauss-Newton method or the Levssquardt method to minimize the average of smooth but possibly non-negative functions.
arXiv Detail & Related papers (2024-07-05T08:53:06Z)
Neural Gradient Learning and Optimization for Oriented Point Normal Estimation [53.611206368815125]
We propose a deep learning approach to learn gradient vectors with consistent orientation from 3D point clouds for normal estimation. We learn an angular distance field based on local plane geometry to refine the coarse gradient vectors. Our method efficiently conducts global gradient approximation while achieving better accuracy and ability generalization of local feature description.
arXiv Detail & Related papers (2023-09-17T08:35:11Z)
One-step corrected projected stochastic gradient descent for statistical estimation [49.1574468325115]
It is based on the projected gradient descent on the log-likelihood function corrected by a single step of the Fisher scoring algorithm. We show theoretically and by simulations that it is an interesting alternative to the usual gradient descent with averaging or the adaptative gradient descent.
arXiv Detail & Related papers (2023-06-09T13:43:07Z)
Convergence of Batch Stochastic Gradient Descent Methods with Approximate Gradients and/or Noisy Measurements: Theory and Computational Results [0.9900482274337404]
We study convex optimization using a very general formulation called BSGD (Block Gradient Descent) We establish conditions for BSGD to converge to the global minimum, based on approximation theory. We show that when approximate gradients are used, BSGD converges while momentum-based methods can diverge.
arXiv Detail & Related papers (2022-09-12T16:23:15Z)
Comparing Classes of Estimators: When does Gradient Descent Beat Ridge Regression in Linear Models? [46.01087792062936]
We compare classes of estimators via the relative performance of the emphbest method in the class This allows us to rigorously quantify the tuning sensitivity of learning algorithms.
arXiv Detail & Related papers (2021-08-26T16:01:37Z)
Decreasing scaling transition from adaptive gradient descent to stochastic gradient descent [1.7874193862154875]
We propose a decreasing scaling transition from adaptive gradient descent to gradient descent method DSTAda. Our experimental results show that DSTAda has a faster speed, higher accuracy, and better stability and robustness.
arXiv Detail & Related papers (2021-06-12T11:28:58Z)
Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work is on the iteration of zero-th-order (ZO) optimization which does not require first-order information. We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity as well as as function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
Channel-Directed Gradients for Optimization of Convolutional Neural Networks [50.34913837546743]
We introduce optimization methods for convolutional neural networks that can be used to improve existing gradient-based optimization in terms of generalization error. We show that defining the gradients along the output channel direction leads to a performance boost, while other directions can be detrimental.
arXiv Detail & Related papers (2020-08-25T00:44:09Z)
Variance Reduction with Sparse Gradients [82.41780420431205]
Variance reduction methods such as SVRG and SpiderBoost use a mixture of large and small batch gradients. We introduce a new sparsity operator: The random-top-k operator. Our algorithm consistently outperforms SpiderBoost on various tasks including image classification, natural language processing, and sparse matrix factorization.
arXiv Detail & Related papers (2020-01-27T08:23:58Z)
Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks. In this paper we analyze a variant of OptimisticOA algorithm for nonconcave minmax problems. Our experiments show that adaptive GAN non-adaptive gradient algorithms can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.