Improving Computational Complexity in Statistical Models with
Second-Order Information
- URL: http://arxiv.org/abs/2202.04219v2
- Date: Thu, 10 Feb 2022 03:09:31 GMT
- Title: Improving Computational Complexity in Statistical Models with
Second-Order Information
- Authors: Tongzheng Ren and Jiacheng Zhuo and Sujay Sanghavi and Nhat Ho
- Abstract summary: We study the normalized gradient descent (NormGD) algorithm for solving parameter estimation in parametric statistical models.
We demonstrate that the NormGD algorithm achieves the optimal overall computational complexity $\mathcal{O}(n)$ to reach the final statistical radius.
This computational complexity is cheaper than that of the fixed step-size gradient descent algorithm.
- Score: 32.64382185990981
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is known that when the statistical models are singular, i.e., the Fisher
information matrix at the true parameter is degenerate, the fixed step-size
gradient descent algorithm takes a polynomial number of steps in terms of the
sample size $n$ to converge to a final statistical radius around the true
parameter, which can be unsatisfactory for applications. To further improve
that computational complexity, we consider the utilization of the second-order
information in the design of optimization algorithms. Specifically, we study
the normalized gradient descent (NormGD) algorithm for solving parameter
estimation in parametric statistical models, which is a variant of the gradient
descent algorithm whose step size is scaled by the maximum eigenvalue of the
Hessian matrix of the empirical loss function of statistical models. When the
population loss function, i.e., the limit of the empirical loss function when
$n$ goes to infinity, is homogeneous in all directions, we demonstrate that the
NormGD iterates reach a final statistical radius around the true parameter
after a logarithmic number of iterations in terms of $n$. Therefore, for fixed
dimension $d$, the NormGD algorithm achieves the optimal overall computational
complexity $\mathcal{O}(n)$ to reach the final statistical radius. This
computational complexity is cheaper than that of the fixed step-size gradient
descent algorithm, which is of the order $\mathcal{O}(n^{\tau})$ for some $\tau
> 1$, to reach the same statistical radius. We illustrate our general theory
under two statistical models: generalized linear models and mixture models, and
experimental results are consistent with our general theory.
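To make the update rule concrete, here is a minimal NumPy sketch of a NormGD-style iteration, interpreting "scaled by the maximum eigenvalue" as dividing the step size by the top eigenvalue of the empirical Hessian; the quartic toy loss and all names below are illustrative assumptions, not the paper's experimental setup.
```python
# Minimal NormGD sketch: gradient descent whose step size is divided by the
# largest eigenvalue of the Hessian of the empirical loss, as described in the
# abstract. The quartic toy loss is an illustrative homogeneous loss only.
import numpy as np

def norm_gd(grad, hess, theta0, eta=1.0, n_iters=100):
    """theta <- theta - eta / lambda_max(H(theta)) * grad(theta)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iters):
        lam_max = np.linalg.eigvalsh(hess(theta)).max()  # top Hessian eigenvalue
        theta = theta - (eta / lam_max) * grad(theta)
    return theta

# Toy homogeneous loss L(theta) = ||theta||^4, whose Hessian degenerates at 0.
grad = lambda th: 4.0 * np.dot(th, th) * th
hess = lambda th: 8.0 * np.outer(th, th) + 4.0 * np.dot(th, th) * np.eye(th.size)

print(norm_gd(grad, hess, theta0=np.array([1.0, -0.5])))
```
On this toy loss each normalized step contracts the iterate by a constant factor, which gives a rough sense of how a logarithmic number of iterations can suffice to reach a target radius.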
Related papers
- Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure.
We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) frameworks, necessarily require $\Omega(d^{k^\star/2})$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z) - Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood
Estimation for Latent Gaussian Models [69.22568644711113]
We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversions.
Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation.
In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance.
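The sketch below is not the probabilistic-unrolling method itself; it only illustrates the inverse-free building block mentioned above, solving a linear system with SciPy's conjugate gradient so the matrix is accessed solely through matrix-vector products (all names and sizes are illustrative).
```python
# Illustration of the inverse-free idea only (not the paper's algorithm):
# solve A x = b with conjugate gradient instead of forming A^{-1}.
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
d = 500
M = rng.standard_normal((d, d))
A = M @ M.T + d * np.eye(d)      # symmetric positive definite, covariance-like
b = rng.standard_normal(d)

# Only matrix-vector products with A are needed; A could be implicit.
A_op = LinearOperator((d, d), matvec=lambda v: A @ v)
x, info = cg(A_op, b)            # info == 0 signals successful convergence
print(info, np.linalg.norm(A @ x - b))
```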
arXiv Detail & Related papers (2023-06-05T21:08:34Z) - An Exponentially Increasing Step-size for Parameter Estimation in
Statistical Models [37.63410634069547]
We propose to exponentially increase the step-size of the gradient descent (GD) algorithm.
We then consider using the EGD algorithm for solving parameter estimation under non-regular statistical models.
The total computational complexity of the EGD algorithm is optimal and exponentially cheaper than that of GD for solving parameter estimation in non-regular statistical models.
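As a rough illustration of the exponentially increasing step-size idea, here is a minimal sketch; the geometric growth factor, initial step-size, and quartic toy loss are assumptions for illustration, not the schedule analyzed in the paper.
```python
# Sketch of gradient descent with a geometrically growing step-size.
# The schedule eta_t = eta0 * alpha**t and the toy loss are illustrative only.
import numpy as np

def egd(grad, theta0, eta0=0.01, alpha=1.1, n_iters=100):
    theta = np.asarray(theta0, dtype=float)
    eta = eta0
    for _ in range(n_iters):
        theta = theta - eta * grad(theta)
        eta *= alpha  # step-size increases exponentially across iterations
    return theta

grad = lambda th: 4.0 * np.dot(th, th) * th  # gradient of the toy loss ||theta||^4
print(egd(grad, theta0=np.array([0.5, 0.5])))
```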
arXiv Detail & Related papers (2022-05-16T21:36:22Z) - Balancing Rates and Variance via Adaptive Batch-Size for Stochastic
Optimization Problems [120.21685755278509]
In this work, we seek to balance the fact that an attenuating step-size is required for exact convergence with the fact that a constant step-size learns faster in finite time, up to an error.
Rather than fixing the minibatch size and the step-size at the outset, we propose to allow these parameters to evolve adaptively.
arXiv Detail & Related papers (2020-07-02T16:02:02Z) - Breaking the Sample Size Barrier in Model-Based Reinforcement Learning
with a Generative Model [50.38446482252857]
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator).
We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$.
We prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level.
arXiv Detail & Related papers (2020-05-26T17:53:18Z) - Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy based on the interplay between the deterministic convergence rate of the algorithm at the population level and its degree of (in)stability when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z) - Dual Stochastic Natural Gradient Descent and convergence of interior
half-space gradient approximations [0.0]
Multinomial logistic regression (MLR) is widely used in statistics and machine learning.
Stochastic gradient descent (SGD) is the most common approach for determining the parameters of an MLR model in big data scenarios.
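For concreteness, a generic SGD loop for multinomial logistic regression (softmax cross-entropy) looks roughly as follows; the synthetic data, constant step-size, and function name are illustrative assumptions, not the paper's dual natural-gradient method.
```python
# Plain SGD for multinomial logistic regression (softmax cross-entropy),
# shown as background for the summary above; not the paper's algorithm.
import numpy as np

def sgd_mlr(X, y, n_classes, step=0.1, epochs=5, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((d, n_classes))
    for _ in range(epochs):
        for i in rng.permutation(n):
            logits = X[i] @ W
            p = np.exp(logits - logits.max())
            p /= p.sum()                      # softmax probabilities
            p[y[i]] -= 1.0                    # gradient of cross-entropy wrt logits
            W -= step * np.outer(X[i], p)     # single-sample gradient step
    return W

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 5))
y = rng.integers(0, 3, size=1000)
W = sgd_mlr(X, y, n_classes=3)
```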
arXiv Detail & Related papers (2020-01-19T00:53:49Z) - Statistical Inference for Model Parameters in Stochastic Gradient
Descent [45.29532403359099]
Stochastic gradient descent (SGD) has been widely used in statistical estimation for large-scale data due to its computational and memory efficiency.
We investigate the problem of statistical inference of true model parameters based on SGD when the population loss function is strongly convex and satisfies certain conditions.
arXiv Detail & Related papers (2016-10-27T07:04:21Z)