Improving Computational Complexity in Statistical Models with
Second-Order Information
- URL: http://arxiv.org/abs/2202.04219v2
- Date: Thu, 10 Feb 2022 03:09:31 GMT
- Title: Improving Computational Complexity in Statistical Models with
Second-Order Information
- Authors: Tongzheng Ren and Jiacheng Zhuo and Sujay Sanghavi and Nhat Ho
- Abstract summary: We study the normalized gradient descent (NormGD) algorithm for solving parameter estimation in parametric statistical models.
We demonstrate that the NormGD algorithm achieves the optimal overall computational complexity $\mathcal{O}(n)$ to reach the final statistical radius.
This computational complexity is cheaper than that of the fixed step-size gradient descent algorithm.
- Score: 32.64382185990981
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is known that when the statistical models are singular, i.e., the Fisher
information matrix at the true parameter is degenerate, the fixed step-size
gradient descent algorithm takes a polynomial number of steps in terms of the
sample size $n$ to converge to a final statistical radius around the true
parameter, which can be unsatisfactory for applications. To further improve
that computational complexity, we consider the utilization of the second-order
information in the design of optimization algorithms. Specifically, we study
the normalized gradient descent (NormGD) algorithm for solving parameter
estimation in parametric statistical models, which is a variant of the gradient
descent algorithm whose step size is scaled by the maximum eigenvalue of the
Hessian matrix of the empirical loss function of statistical models. When the
population loss function, i.e., the limit of the empirical loss function when
$n$ goes to infinity, is homogeneous in all directions, we demonstrate that the
NormGD iterates reach a final statistical radius around the true parameter
after a logarithmic number of iterations in terms of $n$. Therefore, for fixed
dimension $d$, the NormGD algorithm achieves the optimal overall computational
complexity $\mathcal{O}(n)$ to reach the final statistical radius. This
computational complexity is cheaper than that of the fixed step-size gradient
descent algorithm, which is of the order $\mathcal{O}(n^{\tau})$ for some $\tau
> 1$, to reach the same statistical radius. We illustrate our general theory
under two statistical models: generalized linear models and mixture models, and
experimental results are consistent with our general theory.
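To make the update rule concrete, here is a minimal NumPy sketch of a NormGD-style iteration, interpreting "scaled by the maximum eigenvalue" as dividing the step size by the top eigenvalue of the empirical Hessian; the quartic toy loss and all names below are illustrative assumptions, not the paper's experimental setup.
```python
# Minimal NormGD sketch: gradient descent whose step size is divided by the
# largest eigenvalue of the Hessian of the empirical loss, as described in the
# abstract. The quartic toy loss is an illustrative homogeneous loss only.
import numpy as np

def norm_gd(grad, hess, theta0, eta=1.0, n_iters=100):
    """theta <- theta - eta / lambda_max(H(theta)) * grad(theta)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iters):
        lam_max = np.linalg.eigvalsh(hess(theta)).max()  # top Hessian eigenvalue
        theta = theta - (eta / lam_max) * grad(theta)
    return theta

# Toy homogeneous loss L(theta) = ||theta||^4, whose Hessian degenerates at 0.
grad = lambda th: 4.0 * np.dot(th, th) * th
hess = lambda th: 8.0 * np.outer(th, th) + 4.0 * np.dot(th, th) * np.eye(th.size)

print(norm_gd(grad, hess, theta0=np.array([1.0, -0.5])))
```
On this toy loss each normalized step contracts the iterate by a constant factor, which gives a rough sense of how a logarithmic number of iterations can suffice to reach a target radius.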
Related papers
- Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) frameworks, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood
Estimation for Latent Gaussian Models [69.22568644711113]
We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversions.
Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation.
In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance.
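The sketch below is not the probabilistic-unrolling method itself; it only illustrates the inverse-free building block mentioned above, solving a linear system with SciPy's conjugate gradient so the matrix is accessed solely through matrix-vector products (all names and sizes are illustrative).
```python
# Illustration of the inverse-free idea only (not the paper's algorithm):
# solve A x = b with conjugate gradient instead of forming A^{-1}.
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
d = 500
M = rng.standard_normal((d, d))
A = M @ M.T + d * np.eye(d)      # symmetric positive definite, covariance-like
b = rng.standard_normal(d)

# Only matrix-vector products with A are needed; A could be implicit.
A_op = LinearOperator((d, d), matvec=lambda v: A @ v)
x, info = cg(A_op, b)            # info == 0 signals successful convergence
print(info, np.linalg.norm(A @ x - b))
```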
arXiv Detail & Related papers (2023-06-05T21:08:34Z) - An Exponentially Increasing Step-size for Parameter Estimation in
Statistical Models [37.63410634069547]
We propose to exponentially increase the step-size of the gradient descent (GD) algorithm.
We then consider using the EGD algorithm for solving parameter estimation under non-regular statistical models.
The total computational complexity of the EGD algorithm is optimal and exponentially cheaper than that of GD for solving parameter estimation in non-regular statistical models.
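As a rough illustration of the exponentially increasing step-size idea, here is a minimal sketch; the geometric growth factor, initial step-size, and quartic toy loss are assumptions for illustration, not the schedule analyzed in the paper.
```python
# Sketch of gradient descent with a geometrically growing step-size.
# The schedule eta_t = eta0 * alpha**t and the toy loss are illustrative only.
import numpy as np

def egd(grad, theta0, eta0=0.01, alpha=1.1, n_iters=100):
    theta = np.asarray(theta0, dtype=float)
    eta = eta0
    for _ in range(n_iters):
        theta = theta - eta * grad(theta)
        eta *= alpha  # step-size increases exponentially across iterations
    return theta

grad = lambda th: 4.0 * np.dot(th, th) * th  # gradient of the toy loss ||theta||^4
print(egd(grad, theta0=np.array([0.5, 0.5])))
```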
arXiv Detail & Related papers (2022-05-16T21:36:22Z) - Balancing Rates and Variance via Adaptive Batch-Size for Stochastic
Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence with the fact that a constant step-size learns faster in finite time, up to an error.
Rather than fixing the minibatch size and the step-size at the outset, we propose to allow these parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z) - Breaking the Sample Size Barrier in Model-Based Reinforcement Learning
with a Generative Model [50.38446482252857]
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator).
We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$.
We prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level.
arXiv Detail & Related papers (2020-05-26T17:53:18Z) - Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on the interplay between the deterministic convergence rate of the algorithm at the population level and its degree of (in)stability when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z) - Dual Stochastic Natural Gradient Descent and convergence of interior
half-space gradient approximations [0.0]
Multinomial logistic regression (MLR) is widely used in statistics and machine learning.
Stochastic gradient descent (SGD) is the most common approach for determining the parameters of an MLR model in big data scenarios.
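For concreteness, a generic SGD loop for multinomial logistic regression (softmax cross-entropy) looks roughly as follows; the synthetic data, constant step-size, and function name are illustrative assumptions, not the paper's dual natural-gradient method.
```python
# Plain SGD for multinomial logistic regression (softmax cross-entropy),
# shown as background for the summary above; not the paper's algorithm.
import numpy as np

def sgd_mlr(X, y, n_classes, step=0.1, epochs=5, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((d, n_classes))
    for _ in range(epochs):
        for i in rng.permutation(n):
            logits = X[i] @ W
            p = np.exp(logits - logits.max())
            p /= p.sum()                      # softmax probabilities
            p[y[i]] -= 1.0                    # gradient of cross-entropy wrt logits
            W -= step * np.outer(X[i], p)     # single-sample gradient step
    return W

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 5))
y = rng.integers(0, 3, size=1000)
W = sgd_mlr(X, y, n_classes=3)
```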
arXiv Detail & Related papers (2020-01-19T00:53:49Z) - Statistical Inference for Model Parameters in Stochastic Gradient
Descent [45.29532403359099]
Stochastic gradient descent (SGD) has been widely used in statistical estimation for large-scale data due to its computational and memory efficiency.
We investigate the problem of statistical inference of true model parameters based on SGD when the population loss function is strongly convex and satisfies certain conditions.
arXiv Detail & Related papers (2016-10-27T07:04:21Z)