Dual Stochastic Natural Gradient Descent and convergence of interior
half-space gradient approximations
- URL: http://arxiv.org/abs/2001.06744v2
- Date: Fri, 30 Apr 2021 16:45:51 GMT
- Title: Dual Stochastic Natural Gradient Descent and convergence of interior
half-space gradient approximations
- Authors: Borja Sánchez-López and Jesus Cerquides
- Abstract summary: Multinomial logistic regression (MLR) is widely used in statistics and machine learning.
Stochastic gradient descent (SGD) is the most common approach for determining the parameters of an MLR model in big data scenarios.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The multinomial logistic regression (MLR) model is widely used in statistics
and machine learning. Stochastic gradient descent (SGD) is the most common
approach for determining the parameters of an MLR model in big data scenarios.
However, SGD has slow sub-linear rates of convergence. A way to improve these
rates of convergence is to use manifold optimization. Along this line,
stochastic natural gradient descent (SNGD), proposed by Amari, was proven to be
Fisher efficient when it converged. However, SNGD is not guaranteed to converge
and it is computationally too expensive for MLR models with a large number of
parameters.
Here, we propose a stochastic optimization method for MLR based on manifold
optimization concepts which (i) has per-iteration computational complexity
linear in the number of parameters and (ii) can be proven to converge.
To achieve (i) we establish that the family of joint distributions for MLR is
a dually flat manifold and we use that to speed up calculations.
Sánchez-López and Cerquides have recently introduced convergent stochastic
natural gradient descent (CSNGD), a variant of SNGD whose convergence is
guaranteed. To obtain (ii) our algorithm uses the fundamental idea from CSNGD,
thus relying on an independent sequence to build a bounded approximation of the
natural gradient. We call the resulting algorithm dual stochastic natural
gradient descent (DSNGD). By generalizing a result from Sunehag et al., we
prove that DSNGD converges. Furthermore, we prove that the computational
complexity of DSNGD iterations is linear in the number of variables of the
model.
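For orientation, here is a minimal NumPy sketch of the baseline the abstract starts from: plain SGD for multinomial logistic regression, whose per-iteration cost is linear in the number of parameters. This is only a generic reference point (cross-entropy loss, constant step size, toy data); it does not implement the dually flat geometry or the bounded natural-gradient approximation that define DSNGD.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                       # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def mlr_sgd(X, y, n_classes, lr=0.1, epochs=5, seed=0):
    """Plain SGD for multinomial logistic regression (cross-entropy loss).

    W has shape (n_classes, n_features), so one update touches every
    parameter exactly once: per-iteration cost is linear in the number
    of parameters, the complexity target discussed above.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((n_classes, d))
    for _ in range(epochs):
        for i in rng.permutation(n):
            p = softmax(W @ X[i])                                    # class probabilities
            W -= lr * np.outer(p - np.eye(n_classes)[y[i]], X[i])    # per-sample gradient step
    return W

# Toy problem: three well-separated Gaussian clusters in 2D (plus a bias feature).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2)) + np.repeat([[0, 0], [4, 0], [0, 4]], 100, axis=0)
X = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias feature
y = np.repeat(np.arange(3), 100)
W = mlr_sgd(X, y, n_classes=3)
preds = np.array([softmax(W @ x).argmax() for x in X])
print("train accuracy:", (preds == y).mean())
```

The paper's contribution is to keep this linear per-iteration cost while replacing the raw stochastic gradient with a convergent, bounded approximation of the natural gradient on the dually flat manifold of joint distributions.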
Related papers
- Zeroth-Order Fine-Tuning of LLMs in Random Subspaces [66.27334633749734]
As language models grow in size, memory demands for backpropagation increase.
Zeroth-order (ZO) optimization methods offer a memory-efficient alternative.
We show that the proposed SubZero method enhances fine-tuning and achieves results faster than standard ZO approaches.
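As background for this entry, the sketch below shows a generic two-point zeroth-order gradient estimate restricted to a random low-dimensional subspace, the family of methods the summary refers to. It is not the SubZero algorithm itself; the projection, smoothing parameter, and step size are illustrative assumptions.

```python
import numpy as np

def zo_subspace_grad(f, x, k=10, mu=1e-3, seed=0):
    """Two-point zeroth-order gradient estimate in a random k-dim subspace.

    Only function evaluations of f are used (no backpropagation); the
    estimate lies in the span of a random orthonormal basis P.
    """
    rng = np.random.default_rng(seed)
    d = x.size
    P, _ = np.linalg.qr(rng.normal(size=(d, k)))    # random orthonormal directions
    g_low = np.empty(k)
    for j in range(k):
        u = P[:, j]
        g_low[j] = (f(x + mu * u) - f(x - mu * u)) / (2 * mu)   # directional derivative
    return P @ g_low                                # lift estimate back to full space

# Toy example: zeroth-order gradient descent on a convex quadratic.
A = np.diag(np.linspace(1.0, 5.0, 50))
f = lambda x: 0.5 * x @ A @ x
x = np.ones(50)
for t in range(200):
    x -= 0.1 * zo_subspace_grad(f, x, k=10, seed=t)
print("objective after 200 steps:", f(x))
```

Only function values are needed, which is why such estimators avoid the memory cost of backpropagation.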
arXiv Detail & Related papers (2024-10-11T17:01:43Z) - Faster Sampling via Stochastic Gradient Proximal Sampler [28.422547264326468]
Stochastic proximal samplers (SPS) for sampling from non-log-concave distributions are studied.
We show that the convergence to the target distribution can be guaranteed as long as the algorithm trajectory is bounded.
We provide two implementable variants based on stochastic gradient Langevin dynamics (SGLD) and the Metropolis-adjusted Langevin algorithm (MALA), giving rise to SPS-SGLD and SPS-MALA.
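The SGLD building block mentioned here is standard and easy to state. The snippet below is a bare Langevin update targeting a density proportional to exp(-U(x)); it is not the paper's SPS-SGLD sampler, whose outer proximal structure and step-size choices are omitted here.

```python
import numpy as np

def sgld_step(x, grad_U, step, rng):
    """One stochastic gradient Langevin dynamics update.

    Targets p(x) proportional to exp(-U(x)); grad_U may be a noisy
    (mini-batch) estimate of the gradient of U. The injected Gaussian
    noise has variance 2 * step, matching the Langevin discretization.
    """
    noise = rng.normal(size=x.shape)
    return x - step * grad_U(x) + np.sqrt(2.0 * step) * noise

# Example: sample from a standard Gaussian, where U(x) = ||x||^2 / 2.
rng = np.random.default_rng(0)
grad_U = lambda x: x
x = np.zeros(2)
samples = []
for t in range(5000):
    x = sgld_step(x, grad_U, step=0.05, rng=rng)
    samples.append(x.copy())
samples = np.array(samples[1000:])        # discard burn-in
print("empirical covariance:\n", np.cov(samples.T))
```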
arXiv Detail & Related papers (2024-05-27T00:53:18Z) - Smoothing ADMM for Sparse-Penalized Quantile Regression with Non-Convex
Penalties [8.294148737585543]
This paper investigates quantile regression in the presence of non-convex and non-smooth sparse penalties, such as the minimax concave penalty (MCP) and the smoothly clipped absolute deviation (SCAD) penalty.
We introduce a novel single-loop smoothing ADMM algorithm with an increasing penalty parameter, named SIAD, specifically for sparse-penalized quantile regression.
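For concreteness, the two ingredients in the title are simple to write down: the quantile (pinball) loss and a non-convex sparsity penalty. The snippet uses SCAD with a = 3.7 as the example penalty, which is an assumption; the SIAD algorithm itself is not implemented here.

```python
import numpy as np

def pinball_loss(residual, tau):
    """Quantile regression (pinball) loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    u = np.asarray(residual, dtype=float)
    return np.maximum(tau * u, (tau - 1.0) * u)

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty: non-convex, non-smooth at 0, constant for large |beta|."""
    b = np.abs(np.asarray(beta, dtype=float))
    small = b <= lam
    mid = (b > lam) & (b <= a * lam)
    out = np.empty_like(b)
    out[small] = lam * b[small]
    out[mid] = (2 * a * lam * b[mid] - b[mid] ** 2 - lam ** 2) / (2 * (a - 1))
    out[~small & ~mid] = lam ** 2 * (a + 1) / 2
    return out

def objective(beta, X, y, tau, lam):
    """Sparse-penalized quantile regression objective (to be minimized)."""
    return pinball_loss(y - X @ beta, tau).mean() + scad_penalty(beta, lam).sum()

# Tiny example at the median (tau = 0.5) with heavy-tailed noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.standard_cauchy(100)
print("objective at zero:", objective(np.zeros(5), X, y, tau=0.5, lam=0.1))
```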
arXiv Detail & Related papers (2023-09-04T21:48:51Z) - Sparse high-dimensional linear regression with a partitioned empirical
Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Prior assumptions on the parameters are kept minimal through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z) - Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
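To illustrate the mechanics (one gradient sample per iteration plus a running average), here is a generic stochastic Frank-Wolfe sketch over the probability simplex with a SAG-style table of last-seen per-sample gradients. The step-size rule, the simplex constraint, and the full gradient table are assumptions made for the example, not the paper's exact estimator or guarantees.

```python
import numpy as np

def one_sample_fw_simplex(grads, d, n, iters=2000, seed=0):
    """Stochastic Frank-Wolfe over the probability simplex.

    Keeps a SAG-style table of the most recent per-sample gradients and
    feeds their average to the linear minimization oracle; only one
    sample's gradient is recomputed per iteration.
    """
    rng = np.random.default_rng(seed)
    x = np.full(d, 1.0 / d)                     # start at the simplex center
    table = np.zeros((n, d))                    # last seen gradient per sample
    g_avg = np.zeros(d)
    for t in range(iters):
        i = rng.integers(n)
        g_new = grads(i, x)                     # gradient of the i-th term only
        g_avg += (g_new - table[i]) / n         # update the running average
        table[i] = g_new
        s = np.eye(d)[np.argmin(g_avg)]         # LMO: best simplex vertex
        gamma = 2.0 / (t + 2.0)
        x = (1.0 - gamma) * x + gamma * s       # convex combination stays feasible
    return x

# Example: least squares restricted to the simplex, f_i(x) = 0.5 * (a_i.x - b_i)^2.
rng = np.random.default_rng(1)
n, d = 200, 10
A = rng.normal(size=(n, d))
x_true = np.zeros(d); x_true[[2, 5]] = [0.7, 0.3]
b = A @ x_true
grads = lambda i, x: (A[i] @ x - b[i]) * A[i]
print("estimated weights:", np.round(one_sample_fw_simplex(grads, d, n), 2))
```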
arXiv Detail & Related papers (2022-02-26T19:10:48Z) - Improving Computational Complexity in Statistical Models with
Second-Order Information [32.64382185990981]
We study the normalized gradient descent (NormGD) algorithm for solving parameter estimation in parametric statistical models.
We demonstrate that the NormGD algorithm achieves the optimal overall computational complexity $\mathcal{O}(n)$ to reach the final statistical radius.
This computational complexity is cheaper than that of the fixed step-size gradient descent algorithm.
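The core update behind normalized gradient descent is a one-liner; the sketch below shows a generic version with a unit-norm direction and a decaying step size. The paper's normalization draws on second-order information and its step-size schedule may differ, so treat this only as the basic mechanism.

```python
import numpy as np

def normalized_gd(grad, x0, lr=0.5, iters=500, eps=1e-12):
    """Gradient descent with the update direction normalized to unit length.

    Normalization makes the step length depend only on the step-size
    schedule, not on the gradient magnitude (a generic NormGD-style rule).
    """
    x = np.asarray(x0, dtype=float).copy()
    for t in range(iters):
        g = grad(x)
        step = lr / np.sqrt(t + 1.0)               # decaying step size
        x -= step * g / (np.linalg.norm(g) + eps)  # unit-norm direction
    return x

# Example: a badly scaled quadratic where plain GD with lr=0.5 would diverge.
A = np.diag([1.0, 100.0])
grad = lambda x: A @ x
print(normalized_gd(grad, x0=[5.0, 5.0]))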
arXiv Detail & Related papers (2022-02-09T01:32:50Z) - Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and
Beyond [63.59034509960994]
We study shuffling-based variants: minibatch and local Random Reshuffling, which draw gradients without replacement.
For smooth functions satisfying the Polyak-Łojasiewicz condition, we obtain convergence bounds which show that these shuffling-based variants converge faster than their with-replacement counterparts.
We propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
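The without-replacement sampling that distinguishes Random Reshuffling from vanilla SGD is a one-line change; below is a minimal generic sketch (not the paper's minibatch or local variants, nor the synchronized-shuffling modification).

```python
import numpy as np

def sgd_random_reshuffling(grad_i, x0, n, lr=0.05, epochs=20, seed=0):
    """SGD with Random Reshuffling.

    Each epoch visits every sample exactly once in a fresh random order
    (sampling without replacement), instead of drawing indices
    independently with replacement as in vanilla SGD.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(epochs):
        for i in rng.permutation(n):    # without-replacement pass over the data
            x -= lr * grad_i(i, x)
    return x

# Example: least squares, f_i(x) = 0.5 * (a_i.x - b_i)^2.
rng = np.random.default_rng(1)
n, d = 500, 5
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true
grad_i = lambda i, x: (A[i] @ x - b[i]) * A[i]
x_hat = sgd_random_reshuffling(grad_i, np.zeros(d), n)
print("parameter error:", np.linalg.norm(x_hat - x_true))
```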
arXiv Detail & Related papers (2021-10-20T02:25:25Z) - Scalable Variational Gaussian Processes via Harmonic Kernel
Decomposition [54.07797071198249]
We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability.
We demonstrate that, on a range of regression and classification problems, our approach can exploit input space symmetries such as translations and reflections.
Notably, our approach achieves state-of-the-art results on CIFAR-10 among pure GP models.
arXiv Detail & Related papers (2021-06-10T18:17:57Z) - Laplace Matching for fast Approximate Inference in Generalized Linear
Models [27.70274403550477]
We propose an approximate inference framework primarily designed to be computationally cheap while still achieving high approximation quality.
The concept, which we call Laplace Matching, involves closed-form, approximate, bi-directional transformations between the parameter spaces of exponential families.
This effectively turns inference in GLMs into conjugate inference (with small approximation errors).
arXiv Detail & Related papers (2021-05-07T08:25:17Z) - Convergence Analysis of Homotopy-SGD for non-convex optimization [43.71213126039448]
We present a first-order algorithm based on a combination of homotopy methods and SGD, called Homotopy Stochastic Gradient Descent (H-SGD).
Under some assumptions, we conduct a theoretical analysis of the proposed algorithm.
Experimental results show that H-SGD can outperform SGD.
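The homotopy idea is to solve a sequence of objectives that morph from an easy surrogate into the hard target while running SGD on each. Below is a generic continuation-plus-SGD sketch; the linear interpolation schedule, surrogate, and step size are assumptions for illustration, not the H-SGD algorithm analyzed in the paper.

```python
import numpy as np

def homotopy_sgd(grad_easy, grad_hard, x0, n, lr=0.05, stages=10, steps=200, seed=0):
    """Homotopy continuation combined with SGD.

    At stage s the surrogate gradient is (1 - t) * easy + t * hard with
    t = s / (stages - 1): early stages optimize a well-behaved problem
    and later stages gradually morph it into the target objective.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for s in range(stages):
        t = s / max(stages - 1, 1)          # homotopy parameter goes 0 -> 1
        for _ in range(steps):
            i = rng.integers(n)
            g = (1 - t) * grad_easy(i, x) + t * grad_hard(i, x)
            x -= lr * g
    return x

# Example: the hard objective adds a non-convex ripple to a least-squares term.
rng = np.random.default_rng(1)
n, d = 200, 3
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true
grad_easy = lambda i, x: (A[i] @ x - b[i]) * A[i]               # convex part only
grad_hard = lambda i, x: grad_easy(i, x) + 0.5 * np.cos(5 * x)  # adds ripple gradient
print("final iterate:", np.round(homotopy_sgd(grad_easy, grad_hard, np.zeros(d), n), 2))
```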
arXiv Detail & Related papers (2020-11-20T09:50:40Z) - Convergence of Meta-Learning with Task-Specific Adaptation over Partial
Parameters [152.03852111442114]
Although model-agnostic meta-learning (MAML) is a very successful algorithm in meta-learning practice, it can have high computational complexity.
Our paper shows that such complexity can significantly affect the overall convergence performance of ANIL (Almost No Inner Loop), a MAML variant that adapts only partial parameters in its inner loop.
arXiv Detail & Related papers (2020-06-16T19:57:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.