Modified Gauss-Newton Algorithms under Noise
- URL: http://arxiv.org/abs/2305.10634v1
- Date: Thu, 18 May 2023 01:10:42 GMT
- Title: Modified Gauss-Newton Algorithms under Noise
- Authors: Krishna Pillutla, Vincent Roulet, Sham Kakade, Zaid Harchaoui
- Abstract summary: Modified Gauss-Newton or prox-linear algorithms can lead to contrasting outcomes when compared to gradient descent in large-scale statistical settings.
We explore the contrasting performance of these two classes of algorithms in theory on a stylized statistical example, and experimentally on learning problems including structured prediction.
- Score: 2.0454959820861727
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gauss-Newton methods and their stochastic version have been widely used in
machine learning and signal processing. Their nonsmooth counterparts, modified
Gauss-Newton or prox-linear algorithms, can lead to contrasting outcomes when
compared to gradient descent in large-scale statistical settings. We explore
the contrasting performance of these two classes of algorithms in theory on a
stylized statistical example, and experimentally on learning problems including
structured prediction. In theory, we delineate the regime where the quadratic
convergence of the modified Gauss-Newton method is active under statistical
noise. In the experiments, we underline the versatility of stochastic
(sub)-gradient descent to minimize nonsmooth composite objectives.
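To make the comparison concrete, here is a minimal sketch (not the authors' implementation) of one prox-linear (modified Gauss-Newton) step and one subgradient step for a toy nonsmooth composite objective F(x) = ||c(x)||_1. The nonlinear map c(x) = A tanh(x) - b, the step sizes, and the inner solver (projected gradient ascent on the subproblem's dual) are all illustrative assumptions.

```python
import numpy as np

# Toy nonsmooth composite objective F(x) = ||c(x)||_1 with a hypothetical
# nonlinear map c(x) = A @ tanh(x) - b, chosen only for illustration.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

def c(x):
    return A @ np.tanh(x) - b

def jac(x):
    # Jacobian of c at x: A @ diag(1 - tanh(x)**2).
    return A * (1.0 - np.tanh(x) ** 2)

def prox_linear_step(x, t=0.05, inner_iters=200):
    """One prox-linear (modified Gauss-Newton) step: minimize
    ||c(x) + J(x) d||_1 + ||d||^2 / (2 t) over d and return x + d.
    The subproblem is solved by projected gradient ascent on its dual,
    max over ||u||_inf <= 1 of u @ c(x) - (t / 2) * ||J(x).T @ u||^2.
    """
    ck, Jk = c(x), jac(x)
    u = np.zeros_like(ck)
    step = 1.0 / (t * np.linalg.norm(Jk, 2) ** 2 + 1e-12)
    for _ in range(inner_iters):
        u = np.clip(u + step * (ck - t * Jk @ (Jk.T @ u)), -1.0, 1.0)
    return x - t * (Jk.T @ u)     # d = -t * J.T @ u recovers the primal step

def subgradient_step(x, eta=0.01):
    # One subgradient step on F: a subgradient is J(x).T @ sign(c(x)).
    return x - eta * jac(x).T @ np.sign(c(x))

x_pl = x_sg = np.zeros(5)
for _ in range(100):
    x_pl = prox_linear_step(x_pl)
    x_sg = subgradient_step(x_sg)
print("prox-linear objective :", np.abs(c(x_pl)).sum())
print("subgradient objective :", np.abs(c(x_sg)).sum())
```

The contrast mirrors the abstract: each prox-linear step solves a regularized linearized subproblem, while the subgradient step only needs a single Jacobian-transpose product.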
Related papers
- Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from the optimisation and kernel communities -- gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
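As a rough illustration of the dual-space viewpoint this entry refers to, the sketch below runs randomized block-coordinate descent on the dual of kernel ridge regression (the GP posterior-mean problem). The RBF kernel, step size, and batch size are illustrative assumptions; the paper's stochastic dual descent adds momentum, iterate averaging, and specific sampling choices that are not reproduced here.

```python
import numpy as np

def rbf_kernel(X, Z, lengthscale=1.0):
    # Squared-exponential kernel k(x, z) = exp(-||x - z||^2 / (2 * l^2)).
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale ** 2)

def dual_block_descent(X, y, noise=0.1, lr=0.01, batch=32, iters=3000, seed=0):
    """Randomized block-coordinate descent on the dual objective
    0.5 * a @ (K + noise**2 * I) @ a - a @ y, whose minimizer gives the
    GP posterior mean via f(x_new) = k(x_new, X) @ a."""
    rng = np.random.default_rng(seed)
    n = len(y)
    K = rbf_kernel(X, X)
    alpha = np.zeros(n)
    for _ in range(iters):
        idx = rng.choice(n, size=min(batch, n), replace=False)
        # Block of the dual gradient (K + noise**2 * I) @ alpha - y.
        g = K[idx] @ alpha + noise ** 2 * alpha[idx] - y[idx]
        alpha[idx] -= lr * g
    return alpha

# Tiny synthetic regression example (illustrative data, not from the paper).
rng = np.random.default_rng(1)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
alpha = dual_block_descent(X, y)
X_test = np.linspace(-3.0, 3.0, 5)[:, None]
print(rbf_kernel(X_test, X) @ alpha)   # approximate posterior mean at 5 points
```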
arXiv Detail & Related papers (2023-10-31T16:15:13Z)
- Amortized Implicit Differentiation for Stochastic Bilevel Optimization [53.12363770169761]
We study a class of algorithms for solving bilevel optimization problems in both deterministic and stochastic settings.
We exploit a warm-start strategy to amortize the estimation of the exact gradient.
By using this framework, our analysis shows these algorithms to match the computational complexity of methods that have access to an unbiased estimate of the gradient.
arXiv Detail & Related papers (2021-11-29T15:10:09Z)
- Bayes-Newton Methods for Approximate Bayesian Inference with PSD Guarantees [18.419390913544504]
This viewpoint explicitly casts inference algorithms under the framework of numerical optimisation.
We show that common approximations to Newton's method from the optimisation literature are still valid under this 'Bayes-Newton' framework.
Our unifying viewpoint provides new insights into the connections between various inference schemes.
arXiv Detail & Related papers (2021-11-02T16:39:29Z)
- Adaptive Sampling Quasi-Newton Methods for Zeroth-Order Stochastic Optimization [1.7513645771137178]
We consider unconstrained optimization problems with no available gradient information.
We propose an adaptive sampling quasi-Newton method where we estimate the gradients of a simulation function using finite differences within a common random number framework.
We develop modified versions of a norm test and an inner product quasi-Newton test to control the sample sizes used in the approximations and provide global convergence results to the neighborhood of the optimal solution.
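A bare-bones sketch of the gradient-estimation idea described above: central finite differences of a noisy simulation evaluated under common random numbers, so each +h/-h pair of evaluations sees the same noise realization. The simulation function, sample count, and step h are hypothetical; the paper's adaptive sample-size tests and quasi-Newton updates are not reproduced here.

```python
import numpy as np

def noisy_simulation(x, rng):
    # Hypothetical stochastic simulation: quadratic ground truth plus noise.
    return 0.5 * np.dot(x, x) + rng.standard_normal()

def fd_gradient_crn(x, n_samples=20, h=1e-3, seed=0):
    """Central finite-difference gradient estimate with common random numbers:
    each +h / -h pair of evaluations reuses the same seed, so the additive
    noise in this toy simulation cancels in the difference."""
    d = len(x)
    grad = np.zeros(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = h
        diffs = []
        for s in range(n_samples):
            rng_plus = np.random.default_rng(seed + s)   # same stream ...
            rng_minus = np.random.default_rng(seed + s)  # ... for both points
            diffs.append(
                (noisy_simulation(x + e, rng_plus)
                 - noisy_simulation(x - e, rng_minus)) / (2 * h)
            )
        grad[i] = np.mean(diffs)
    return grad

x = np.array([1.0, -2.0, 0.5])
print(fd_gradient_crn(x))   # matches the true gradient x for this quadratic toy
```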
arXiv Detail & Related papers (2021-09-24T21:49:25Z)
- SMG: A Shuffling Gradient-Based Method with Momentum [25.389545522794172]
We combine two advanced ideas widely used in optimization for machine learning.
We develop a novel shuffling-based momentum technique.
Our tests have shown encouraging performance of the new algorithms.
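A minimal sketch of the generic shuffling-plus-momentum pattern this entry refers to, on a least-squares toy problem: each epoch draws one random permutation of the data and sweeps through it while maintaining a momentum buffer. The data, hyperparameters, and the particular momentum rule are illustrative assumptions and differ from the SMG update analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((256, 10))
b = A @ rng.standard_normal(10) + 0.01 * rng.standard_normal(256)

def shuffled_sgd_momentum(A, b, lr=0.01, beta=0.9, epochs=30, seed=1):
    """Reshuffling SGD with a momentum buffer on f(x) = ||A x - b||^2 / (2 n)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    m = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)            # one random shuffle per epoch
        for i in order:                       # single pass in shuffled order
            g = (A[i] @ x - b[i]) * A[i]      # gradient of the i-th term
            m = beta * m + (1 - beta) * g
            x = x - lr * m
    return x

x_hat = shuffled_sgd_momentum(A, b)
print(np.linalg.norm(A @ x_hat - b) / np.sqrt(len(b)))  # root-mean-square residual
```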
arXiv Detail & Related papers (2020-11-24T04:12:35Z)
- Learning Mixtures of Low-Rank Models [89.39877968115833]
We study the problem of learning mixtures of low-rank models.
We develop an algorithm that is guaranteed to recover the unknown matrices with near-optimal sample complexity.
In addition, the proposed algorithm is provably stable against random noise.
arXiv Detail & Related papers (2020-09-23T17:53:48Z)
- Disentangling the Gauss-Newton Method and Approximate Inference for Neural Networks [96.87076679064499]
We disentangle the generalized Gauss-Newton method and approximate inference for Bayesian deep learning.
We find that the Gauss-Newton method simplifies the underlying probabilistic model significantly.
The connection to Gaussian processes enables new function-space inference algorithms.
arXiv Detail & Related papers (2020-07-21T17:42:58Z)
- Instability, Computational Efficiency and Statistical Accuracy [101.32305022521024]
We develop a framework that yields statistical accuracy guarantees based on the interplay between the deterministic convergence rate of the algorithm at the population level, and its degree of (in)stability when applied to an empirical object based on $n$ samples.
We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models.
arXiv Detail & Related papers (2020-05-22T22:30:52Z)
- Deep Neural Network Learning with Second-Order Optimizers -- a Practical Study with a Stochastic Quasi-Gauss-Newton Method [0.0]
We introduce and study a second-order quasi-Gauss-Newton (SQGN) optimization method that combines ideas from quasi-Newton methods, Gauss-Newton methods, and variance reduction to address this problem.
We discuss the implementation of SQGN, and we compare its convergence and computational performance to selected first-order methods on benchmark problems.
arXiv Detail & Related papers (2020-04-06T23:41:41Z)
- Learning Gaussian Graphical Models via Multiplicative Weights [54.252053139374205]
We adapt an algorithm of Klivans and Meka based on the method of multiplicative weight updates.
The algorithm enjoys a sample complexity bound that is qualitatively similar to others in the literature.
It has a low runtime $O(mp^2)$ in the case of $m$ samples and $p$ nodes, and can trivially be implemented in an online manner.
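For reference, a generic multiplicative-weights (Hedge) update of the kind this entry builds on, shown on a toy experts problem; the losses and learning rate are made up, and the paper's reduction to neighborhood learning for graphical models is not reproduced here.

```python
import numpy as np

def multiplicative_weights(loss_rounds, eta=0.1):
    """Hedge / multiplicative-weights update: after each round, every expert's
    weight is scaled by exp(-eta * loss) and the weights are renormalized."""
    n_experts = loss_rounds.shape[1]
    w = np.full(n_experts, 1.0 / n_experts)
    for losses in loss_rounds:              # losses assumed to lie in [0, 1]
        w = w * np.exp(-eta * losses)
        w = w / w.sum()
    return w

# Toy run: expert 2 consistently incurs the lowest loss and gains most weight.
rng = np.random.default_rng(0)
losses = rng.uniform(0.4, 1.0, size=(200, 5))
losses[:, 2] = rng.uniform(0.0, 0.2, size=200)
print(multiplicative_weights(losses))
```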
arXiv Detail & Related papers (2020-02-20T10:50:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.