New logarithmic step size for stochastic gradient descent
- URL: http://arxiv.org/abs/2404.01257v1
- Date: Mon, 1 Apr 2024 17:25:27 GMT
- Title: New logarithmic step size for stochastic gradient descent
- Authors: M. Soheil Shamaee, S. Fathi Hafshejani, Z. Saeidian
- Abstract summary: We propose a novel warm restart technique using a new logarithmic step size for the stochastic gradient descent (SGD) approach.
Our results show that the new logarithmic step size improves test accuracy by 0.9% for the CIFAR100 dataset when we utilize a convolutional neural network (CNN) model.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a novel warm restart technique using a new logarithmic step size for the stochastic gradient descent (SGD) approach. For smooth and non-convex functions, we establish an $O(\frac{1}{\sqrt{T}})$ convergence rate for the SGD. We conduct a comprehensive implementation to demonstrate the efficiency of the newly proposed step size on the FashionMNIST, CIFAR10, and CIFAR100 datasets. Moreover, we compare our results with nine other existing approaches and demonstrate that the new logarithmic step size improves test accuracy by $0.9\%$ for the CIFAR100 dataset when we utilize a convolutional neural network (CNN) model.
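As a rough illustration of the idea (not the authors' exact schedule), the sketch below combines a logarithmically decaying step size with warm restarts in a plain SGD loop; the decay formula, the function names, and all default values are assumptions made for this example.

```python
import math

def logarithmic_step_size(t, t_max, eta_0=0.1, eta_min=1e-4):
    """Illustrative logarithmic decay within one warm-restart cycle.

    t runs from 0 to t_max - 1; the factor 1 - log(t+1)/log(t_max+1)
    shrinks from 1 toward 0, so later iterations take smaller steps.
    """
    decay = 1.0 - math.log(t + 1) / math.log(t_max + 1)
    return max(eta_min, eta_0 * decay)

def sgd_with_warm_restarts(params, grad_fn, n_cycles=3, t_max=1000):
    """Plain SGD loop that resets the logarithmic schedule at every restart."""
    for _ in range(n_cycles):
        for t in range(t_max):
            eta = logarithmic_step_size(t, t_max)
            grads = grad_fn(params)  # stochastic gradient estimate
            params = [p - eta * g for p, g in zip(params, grads)]
    return params
```

In this illustrative form the step size stays larger than a $\frac{1}{\sqrt{t}}$-style decay for most of a cycle and only falls sharply near its end, before being reset at the next warm restart.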
Related papers
- Modified Step Size for Enhanced Stochastic Gradient Descent: Convergence
and Experiments [0.0]
This paper introduces a novel approach to enhancing the performance of the stochastic gradient descent (SGD) algorithm by incorporating a modified decay step size based on $\frac{1}{\sqrt{t}}$.
The proposed step size integrates a logarithmic term, leading to the selection of smaller values in the final iterations.
To demonstrate the effectiveness of our approach, we conducted numerical experiments on image classification tasks using the FashionMNIST and CIFAR10 datasets.
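One plausible reading of this description (not necessarily the paper's exact formula) is a $\frac{1}{\sqrt{t}}$ decay multiplied by a logarithmic damping factor, sketched below; the function name and constants are illustrative assumptions.

```python
import math

def modified_sqrt_step_size(t, t_max, eta_0=0.1):
    """Illustrative 1/sqrt(t) schedule damped by a logarithmic factor."""
    base = eta_0 / math.sqrt(t + 1)                            # classic 1/sqrt(t) decay
    log_damping = 1.0 - math.log(t + 1) / math.log(t_max + 1)  # -> 0 as t nears t_max
    return base * log_damping                                  # smaller final-iteration steps
```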
arXiv Detail & Related papers (2023-09-03T19:21:59Z) - Relationship between Batch Size and Number of Steps Needed for Nonconvex
Optimization of Stochastic Gradient Descent using Armijo Line Search [0.8158530638728501]
We show that SGD performs better than other deep learning optimizers when it uses the Armijo line search.
The results indicate that the number of steps and stochastic first-order oracle (SFO) calls needed can be estimated as the batch size grows.
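As a sketch of the general technique (standard backtracking with the Armijo sufficient-decrease condition on a minibatch loss, not the paper's exact procedure), one SGD step with line search might look like this; all names and constants are illustrative.

```python
import numpy as np

def armijo_sgd_step(w, batch_loss, grad, lr_max=1.0, c=1e-4, beta=0.5):
    """One SGD step whose step size is chosen by backtracking Armijo search.

    batch_loss : callable returning the minibatch loss at a parameter vector
    grad       : stochastic gradient of that minibatch loss at w
    """
    lr = lr_max
    f_w = batch_loss(w)
    g_sq = float(np.dot(grad, grad))
    # Halve lr until the sufficient-decrease condition holds (or lr is tiny).
    while batch_loss(w - lr * grad) > f_w - c * lr * g_sq and lr > 1e-8:
        lr *= beta
    return w - lr * grad, lr
```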
arXiv Detail & Related papers (2023-07-25T21:59:17Z) - Dataset Distillation with Convexified Implicit Gradients [69.16247946639233]
We show how implicit gradients can be effectively used to compute meta-gradient updates.
We further equip the algorithm with a convexified approximation that corresponds to learning on top of a frozen finite-width neural kernel.
arXiv Detail & Related papers (2023-02-13T23:53:16Z) - Towards Noise-adaptive, Problem-adaptive Stochastic Gradient Descent [7.176107039687231]
We design step-size schemes that make stochastic gradient descent (SGD) adaptive to the noise in the stochastic gradients.
We prove that $T$ iterations of SGD with Nesterov acceleration can be near-optimal.
Compared to other step-size schemes, we demonstrate the effectiveness of a novel exponential step-size scheme.
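A commonly analyzed form of exponential step size is $\eta_t = \eta_0 \alpha^t$ with $\alpha$ tied to the horizon; the sketch below uses one such choice as an assumption and is not necessarily the scheme studied in the paper.

```python
def exponential_step_size(t, t_max, eta_0=0.1):
    """Illustrative exponential decay: eta_t = eta_0 * alpha**t.

    alpha = (1 / t_max)**(1 / t_max) makes the step size shrink smoothly
    from eta_0 at t = 0 to roughly eta_0 / t_max at t = t_max.
    """
    alpha = (1.0 / t_max) ** (1.0 / t_max)
    return eta_0 * alpha ** t
```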
arXiv Detail & Related papers (2021-10-21T19:22:14Z) - Exploiting Adam-like Optimization Algorithms to Improve the Performance
of Convolutional Neural Networks [82.61182037130405]
Stochastic gradient descent (SGD) is the main approach for training deep networks.
In this work, we compare Adam-based variants that rely on the difference between the present and the past gradients.
We have tested ensembles of networks and their fusion with ResNet50 trained with stochastic gradient descent.
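To make "variants based on the difference between the present and the past gradients" concrete, here is a diffGrad-style modulation of the Adam update, shown purely as an illustration of the idea; it is not necessarily one of the variants compared in the paper, and all names and defaults are assumptions.

```python
import numpy as np

def adam_diff_step(w, g, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Adam-style update scaled by a friction term derived from |g - g_prev|.

    state = (m, v, g_prev, t); initialize with zero arrays and t = 0.
    The sigmoid of |g - g_prev| lies in (0.5, 1), so the step is damped
    when consecutive gradients are similar and near-full otherwise.
    """
    m, v, g_prev, t = state
    t += 1
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    friction = 1.0 / (1.0 + np.exp(-np.abs(g - g_prev)))
    w_new = w - lr * friction * m_hat / (np.sqrt(v_hat) + eps)
    return w_new, (m, v, g.copy(), t)
```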
arXiv Detail & Related papers (2021-03-26T18:55:08Z) - Faster Convergence of Stochastic Gradient Langevin Dynamics for
Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
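For reference, the SGLD update being analyzed is the usual noisy gradient step; the sketch below simply restates it, with parameter names chosen for this example.

```python
import numpy as np

def sgld_step(theta, stoch_grad, eta, rng=np.random.default_rng()):
    """theta_{t+1} = theta_t - eta * g_t + sqrt(2 * eta) * xi_t, with xi_t ~ N(0, I)."""
    noise = rng.normal(scale=np.sqrt(2.0 * eta), size=theta.shape)
    return theta - eta * stoch_grad + noise
```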
arXiv Detail & Related papers (2020-10-19T15:23:18Z) - Balancing Rates and Variance via Adaptive Batch-Size for Stochastic
Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence against the fact that a constant step-size learns faster, but only up to an error.
Rather than fixing the minibatch size and the step-size at the outset, we propose to allow these parameters to evolve adaptively.
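A minimal sketch of one adaptive rule in this spirit (a norm-test-style heuristic, not the paper's specific scheme) grows the minibatch whenever the estimated gradient noise dominates the gradient signal; the threshold, names, and cap are assumptions.

```python
def adapt_batch_size(batch_size, grad_sq_norm, per_sample_var, b_max=4096):
    """Double the minibatch when estimated noise exceeds the gradient signal.

    grad_sq_norm   : squared norm of the current minibatch gradient
    per_sample_var : estimated variance of a single-sample gradient
    """
    if per_sample_var / batch_size > grad_sq_norm:
        return min(2 * batch_size, b_max)   # more samples -> lower variance
    return batch_size
```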
arXiv Detail & Related papers (2020-07-02T16:02:02Z) - On the Almost Sure Convergence of Stochastic Gradient Descent in
Non-Convex Problems [75.58134963501094]
This paper analyzes the trajectories of stochastic gradient descent (SGD).
We show that SGD avoids strict saddle points/manifolds with probability $1$ for a broad class of step-size policies.
arXiv Detail & Related papers (2020-06-19T14:11:26Z) - On the Promise of the Stochastic Generalized Gauss-Newton Method for
Training DNNs [37.96456928567548]
We study a stochastic generalized Gauss-Newton method (SGN) for training DNNs.
SGN is a second-order optimization method, with efficient iterations, that we demonstrate to often require substantially fewer iterations than standard SGD to converge.
We show that SGN does not only substantially improve over SGD in terms of the number of iterations, but also in terms of runtime.
This is made possible by an efficient, easy-to-use and flexible implementation of SGN we propose in the Theano deep learning platform.
arXiv Detail & Related papers (2020-06-03T17:35:54Z) - Carathéodory Sampling for Stochastic Gradient Descent [79.55586575988292]
We present an approach that is inspired by classical results of Tchakaloff and Carathéodory about measure reduction.
We adaptively select the descent steps where the measure reduction is carried out.
We combine this with Block Coordinate Descent so that measure reduction can be done very cheaply.
arXiv Detail & Related papers (2020-06-02T17:52:59Z)