An Exponentially Increasing Step-size for Parameter Estimation in
Statistical Models
- URL: http://arxiv.org/abs/2205.07999v1
- Date: Mon, 16 May 2022 21:36:22 GMT
- Title: An Exponentially Increasing Step-size for Parameter Estimation in
Statistical Models
- Authors: Nhat Ho and Tongzheng Ren and Sujay Sanghavi and Purnamrita Sarkar and
Rachel Ward
- Abstract summary: We propose to exponentially increase the step-size of the Gaussian descent (GD) algorithm.
We then consider using the EGD algorithm for solving parameter estimation under non-regular statistical models.
The total computational complexity of the EGD algorithm is emphoptimal and exponentially cheaper than that of the GD for solving parameter estimation in non-regular statistical models.
- Score: 37.63410634069547
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Using gradient descent (GD) with fixed or decaying step-size is standard
practice in unconstrained optimization problems. However, when the loss
function is only locally convex, such a step-size schedule artificially slows
GD down as it cannot explore the flat curvature of the loss function. To
overcome that issue, we propose to exponentially increase the step-size of the
GD algorithm. Under homogeneous assumptions on the loss function, we
demonstrate that the iterates of the proposed \emph{exponential step size
gradient descent} (EGD) algorithm converge linearly to the optimal solution.
Leveraging that optimization insight, we then consider using the EGD algorithm
for solving parameter estimation under non-regular statistical models whose the
loss function becomes locally convex when the sample size goes to infinity. We
demonstrate that the EGD iterates reach the final statistical radius within the
true parameter after a logarithmic number of iterations, which is in stark
contrast to a \emph{polynomial} number of iterations of the GD algorithm.
Therefore, the total computational complexity of the EGD algorithm is
\emph{optimal} and exponentially cheaper than that of the GD for solving
parameter estimation in non-regular statistical models. To the best of our
knowledge, it resolves a long-standing gap between statistical and algorithmic
computational complexities of parameter estimation in non-regular statistical
models. Finally, we provide targeted applications of the general theory to
several classes of statistical models, including generalized linear models with
polynomial link functions and location Gaussian mixture models.
Related papers
- Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z) - Differentially Private Optimization with Sparse Gradients [60.853074897282625]
We study differentially private (DP) optimization problems under sparsity of individual gradients.
Building on this, we obtain pure- and approximate-DP algorithms with almost optimal rates for convex optimization with sparse gradients.
arXiv Detail & Related papers (2024-04-16T20:01:10Z) - Joint Graph Learning and Model Fitting in Laplacian Regularized
Stratified Models [5.933030735757292]
Laplacian regularized stratified models (LRSM) are models that utilize the explicit or implicit network structure of the sub-problems.
This paper shows the importance and sensitivity of graph weights in LRSM, and provably show that the sensitivity can be arbitrarily large.
We propose a generic approach to jointly learn the graph while fitting the model parameters by solving a single optimization problem.
arXiv Detail & Related papers (2023-05-04T06:06:29Z) - Gaussian process regression and conditional Karhunen-Lo\'{e}ve models
for data assimilation in inverse problems [68.8204255655161]
We present a model inversion algorithm, CKLEMAP, for data assimilation and parameter estimation in partial differential equation models.
The CKLEMAP method provides better scalability compared to the standard MAP method.
arXiv Detail & Related papers (2023-01-26T18:14:12Z) - Stochastic Mirror Descent for Large-Scale Sparse Recovery [13.500750042707407]
We discuss an application of quadratic Approximation to statistical estimation of high-dimensional sparse parameters.
We show that the proposed algorithm attains the optimal convergence of the estimation error under weak assumptions on the regressor distribution.
arXiv Detail & Related papers (2022-10-23T23:23:23Z) - Improving Computational Complexity in Statistical Models with
Second-Order Information [32.64382185990981]
We study the normalized gradient descent (NormGD) algorithm for solving parameter estimation in parametric statistical models.
We demonstrate that the NormGD algorithm achieves the optimal overall computational complexity $mathcalO(n)$ to reach the final statistical radius.
This computational complexity is cheaper than that of the fixed step-size gradient descent algorithm.
arXiv Detail & Related papers (2022-02-09T01:32:50Z) - Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector
Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the emphStochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z) - Last iterate convergence of SGD for Least-Squares in the Interpolation
regime [19.05750582096579]
We study the noiseless model in the fundamental least-squares setup.
We assume that an optimum predictor fits perfectly inputs and outputs $langle theta_*, phi(X) rangle = Y$, where $phi(X)$ stands for a possibly infinite dimensional non-linear feature map.
arXiv Detail & Related papers (2021-02-05T14:02:20Z) - Balancing Rates and Variance via Adaptive Batch-Size for Stochastic
Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that attenuating step-size is required for exact convergence with the fact that constant step-size learns faster in time up to an error.
Rather than fixing the minibatch the step-size at the outset, we propose to allow parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z) - Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on interplay between the deterministic convergence rate of the algorithm at the population level, and its degree of (instability) when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z) - Statistical Inference for Model Parameters in Stochastic Gradient
Descent [45.29532403359099]
gradient descent coefficients (SGD) has been widely used in statistical estimation for large-scale data due to its computational and memory efficiency.
We investigate the problem of statistical inference of true model parameters based on SGD when the population loss function is strongly convex and satisfies certain conditions.
arXiv Detail & Related papers (2016-10-27T07:04:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.