Gradient Descent, Stochastic Optimization, and Other Tales
- URL: http://arxiv.org/abs/2205.00832v2
- Date: Fri, 12 Jan 2024 10:46:15 GMT
- Title: Gradient Descent, Stochastic Optimization, and Other Tales
- Authors: Jun Lu
- Abstract summary: This tutorial doesn't shy away from addressing both the formal and informal aspects of gradient descent and optimization methods.
Gradient descent is one of the most popular algorithms to perform optimization and by far the most common way to optimize machine learning tasks.
In deep neural networks, the gradient computed on a single sample or a batch of samples is employed to save computational resources and to escape saddle points.
- Score: 8.034728173797953
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The goal of this paper is to debunk and dispel the magic behind black-box
optimizers and stochastic optimizers. It aims to build a solid foundation on
how and why the techniques work. This manuscript crystallizes this knowledge by
deriving from simple intuitions, the mathematics behind the strategies. This
tutorial doesn't shy away from addressing both the formal and informal aspects
of gradient descent and stochastic optimization methods. By doing so, it hopes
to provide readers with a deeper understanding of these techniques as well as
the when, the how and the why of applying these algorithms.
Gradient descent is one of the most popular algorithms to perform
optimization and by far the most common way to optimize machine learning tasks.
Its stochastic version has received considerable attention in recent years, and
this is particularly true for optimizing deep neural networks. In deep neural
networks, the gradient computed on a single sample or a batch of samples is
employed to save computational resources and to escape saddle points. In 1951,
Robbins and Monro published "A Stochastic Approximation Method", one of the
first modern treatments of stochastic optimization, which estimates local
gradients with a new batch of samples. Stochastic optimization has since become
a core technology in machine learning, largely owing to the development of the
back-propagation algorithm for fitting neural networks. The sole aim of this
article is to give a self-contained introduction to the concepts and
mathematical tools used in gradient descent and stochastic optimization.
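To make the distinction concrete, here is a minimal NumPy sketch contrasting full-batch gradient descent with mini-batch stochastic gradient descent on a least-squares problem; the step sizes, batch size, and decay schedule are illustrative choices rather than anything prescribed by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def grad(w, Xb, yb):
    # Gradient of the mean-squared-error loss on a batch.
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

# Full-batch gradient descent: one gradient per pass over all n samples.
w_gd = np.zeros(d)
for _ in range(200):
    w_gd -= 0.1 * grad(w_gd, X, y)

# Mini-batch SGD: each step uses a fresh random batch, in the spirit of
# the Robbins-Monro scheme; the step size decays to damp gradient noise.
w_sgd = np.zeros(d)
for t in range(2000):
    idx = rng.choice(n, size=32, replace=False)
    w_sgd -= 0.1 / (1 + 0.01 * t) * grad(w_sgd, X[idx], y[idx])

print(np.linalg.norm(w_gd - w_true), np.linalg.norm(w_sgd - w_true))
```

Both runs recover w_true up to noise; the stochastic version touches only 32 samples per step, which is the computational saving the abstract refers to.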
Related papers
- Neural Gradient Learning and Optimization for Oriented Point Normal
Estimation [53.611206368815125]
We propose a deep learning approach to learn gradient vectors with consistent orientation from 3D point clouds for normal estimation.
We learn an angular distance field based on local plane geometry to refine the coarse gradient vectors.
Our method efficiently conducts global gradient approximation while achieving better accuracy and generalization ability than local feature description.
arXiv Detail & Related papers (2023-09-17T08:35:11Z)
- Learning the Positions in CountSketch [49.57951567374372]
We consider sketching algorithms which first compress data by multiplication with a random sketch matrix, and then apply the sketch to quickly solve an optimization problem.
In this work, we propose the first learning-based algorithms that also optimize the locations of the non-zero entries.
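For context on the classical primitive the paper builds on, the following is a minimal sketch of CountSketch with randomly chosen, rather than learned, positions; dimensions and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 10_000, 100                       # original and sketched dimension

# Classical CountSketch: each coordinate is hashed to one of m rows and
# multiplied by a random sign, so the sketch matrix S has exactly one
# non-zero entry per column.
rows = rng.integers(0, m, size=n)        # hash bucket per coordinate
signs = rng.choice([-1.0, 1.0], size=n)  # random sign per coordinate

def countsketch(x):
    # Apply S @ x in O(n) time without materializing S.
    out = np.zeros(m)
    np.add.at(out, rows, signs * x)
    return out

x = rng.normal(size=n)
sx = countsketch(x)
# The sketch preserves norms in expectation: E||Sx||^2 = ||x||^2.
print(np.linalg.norm(x), np.linalg.norm(sx))
```

The learned variant proposed in the paper would additionally optimize where each column's single non-zero entry falls; here those positions are purely random.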
arXiv Detail & Related papers (2023-06-11T07:28:35Z)
- A Particle-based Sparse Gaussian Process Optimizer [5.672919245950197]
We present a new particle-swarm-based framework utilizing the underlying dynamical process of gradient descent.
The biggest advantage of this approach is greater exploration around the current state before deciding the descent direction.
arXiv Detail & Related papers (2022-11-26T09:06:15Z)
- Learning to Optimize Quasi-Newton Methods [22.504971951262004]
This paper introduces a novel machine learning optimizer called LODO, which meta-learns the best preconditioner online during optimization.
Unlike other L2O methods, LODO does not require any meta-training on a training task distribution.
We show that the learned preconditioner approximates the inverse Hessian in noisy loss landscapes and is capable of representing a wide range of inverse Hessians.
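LODO parameterizes its preconditioner with a neural network; as a far simpler stand-in that only illustrates the idea of meta-learning a preconditioner online, the sketch below adapts a diagonal preconditioner with a sign-based hypergradient rule on a toy quadratic. Every constant here is illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20
A = np.diag(np.linspace(1.0, 100.0, d))  # badly conditioned quadratic

def grad(w):
    return A @ w                         # gradient of 0.5 * w^T A w

w = rng.normal(size=d)
p = np.full(d, 1e-3)                     # diagonal preconditioner
g_old = grad(w)
for _ in range(2000):
    w -= p * g_old                       # preconditioned descent step
    g_new = grad(w)
    # Sign-based hypergradient: grow p_i while successive gradients
    # agree (the step was too timid), shrink it after an overshoot.
    p *= 1.0 + 0.01 * np.sign(g_new * g_old)
    g_old = g_new

print(0.5 * w @ A @ w)                   # final loss, close to zero
```

Each coordinate's step size drifts toward the inverse of the corresponding curvature, which is the diagonal analogue of approximating the inverse Hessian.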
arXiv Detail & Related papers (2022-10-11T03:47:14Z)
- On the efficiency of Stochastic Quasi-Newton Methods for Deep Learning [0.0]
We study the behaviour of stochastic quasi-Newton training algorithms for deep neural networks.
We show that quasi-Newton methods are efficient and, in some instances, able to outperform the well-known first-order Adam optimizer.
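As a reminder of the mechanism such stochastic variants build on, here is a minimal deterministic BFGS loop on a quadratic, showing how an inverse-Hessian estimate is assembled from gradient differences alone; the fixed damped step is a stand-in for a proper line search.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
M = rng.normal(size=(d, d))
A = M @ M.T + np.eye(d)                  # SPD Hessian of the test problem

def grad(w):
    return A @ w                         # gradient of 0.5 * w^T A w

w = rng.normal(size=d)
H = np.eye(d)                            # inverse-Hessian approximation
g = grad(w)
for _ in range(50):
    step = -H @ g                        # quasi-Newton search direction
    w_new = w + 0.5 * step               # damped step instead of line search
    g_new = grad(w_new)
    s, y = w_new - w, g_new - g          # displacement and gradient change
    if s @ y > 1e-12:                    # curvature condition
        rho = 1.0 / (s @ y)
        V = np.eye(d) - rho * np.outer(s, y)
        H = V @ H @ V.T + rho * np.outer(s, s)   # BFGS inverse update
    w, g = w_new, g_new

print(np.linalg.norm(g))                 # near zero after 50 iterations
```

The stochastic versions studied in the paper replace the exact gradients above with mini-batch estimates, which is where most of the practical difficulty lies.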
arXiv Detail & Related papers (2022-05-18T20:53:58Z)
- Joint inference and input optimization in equilibrium networks [68.63726855991052]
The deep equilibrium model is a class of models that foregoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer.
We show that there is a natural synergy between the two settings of inference and input optimization in such models.
We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training and gradient based meta-learning.
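A deep equilibrium layer in miniature: the sketch below computes the network output as the fixed point of a single nonlinear layer by plain iteration. Real DEQs use Broyden- or Anderson-type solvers and differentiate implicitly through the fixed point; the contraction scaling here is only to guarantee that this toy iteration converges.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50
W = rng.normal(size=(d, d))
W *= 0.9 / np.linalg.norm(W, 2)          # spectral norm < 1: contraction
U = rng.normal(size=(d, d))
x = rng.normal(size=d)                   # network input

# The "infinitely deep" output is the fixed point z* = tanh(W z* + U x),
# found by iterating the single layer instead of stacking depth.
z = np.zeros(d)
for _ in range(100):
    z_next = np.tanh(W @ z + U @ x)
    if np.linalg.norm(z_next - z) < 1e-10:
        break
    z = z_next

print(np.linalg.norm(np.tanh(W @ z + U @ x) - z))   # fixed-point residual
```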
arXiv Detail & Related papers (2021-11-25T19:59:33Z)
- SHINE: SHaring the INverse Estimate from the forward pass for bi-level optimization and implicit models [15.541264326378366]
In recent years, implicit deep learning has emerged as a method to increase the depth of deep neural networks.
The training is performed as a bi-level problem, and its computational complexity is partially driven by the iterative inversion of a huge Jacobian matrix.
We propose a novel strategy to tackle this computational bottleneck from which many bi-level problems suffer.
arXiv Detail & Related papers (2021-06-01T15:07:34Z)
- Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work focuses on zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of iteration complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
- A Primer on Zeroth-Order Optimization in Signal Processing and Machine Learning [95.85269649177336]
ZO optimization iteratively performs three major steps: gradient estimation, descent direction, and solution update.
We demonstrate promising applications of ZO optimization, such as evaluating and generating explanations from black-box deep learning models, and efficient online sensor management.
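Those three steps are easy to see in a minimal sketch built on the standard randomized two-point gradient estimator; the smoothing radius, number of sampled directions, and step size are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10

def f(w):
    # Black-box objective: only function evaluations are available.
    return np.sum((w - 1.0) ** 2)

def zo_grad(w, mu=1e-4, q=20):
    # Step 1, gradient estimation: average the two-point estimate
    # (f(w + mu*u) - f(w)) / mu * u over q random directions u.
    fw = f(w)
    g = np.zeros(d)
    for _ in range(q):
        u = rng.normal(size=d)
        g += (f(w + mu * u) - fw) / mu * u
    return g / q

w = rng.normal(size=d)
for _ in range(300):
    g_hat = zo_grad(w)      # step 1: estimate the gradient
    direction = -g_hat      # step 2: descent direction
    w += 0.05 * direction   # step 3: solution update
print(f(w))                 # near zero: the minimizer is w = 1
```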
arXiv Detail & Related papers (2020-06-11T06:50:35Z)
- Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad (OAdagrad) algorithm for nonconvex-nonconcave minmax problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)