Generalisations and improvements of New Q-Newton's method Backtracking
- URL: http://arxiv.org/abs/2109.11395v1
- Date: Thu, 23 Sep 2021 14:28:15 GMT
- Title: Generalisations and improvements of New Q-Newton's method Backtracking
- Authors: Tuyen Trung Truong
- Abstract summary: We propose a general framework for the algorithm New Q-Newton's method Backtracking.
In this paper, we allow more flexibility and generality; for example, $\tau$ can be chosen to be $<1$, or the $e_1(x),\ldots ,e_m(x)$'s are not necessarily eigenvectors of $\nabla ^2f(x)$.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a general framework for the algorithm New
Q-Newton's method Backtracking, developed in the author's previous work. For a
symmetric, square real matrix $A$, we define $minsp(A):=\min _{||e||=1}
||Ae||$. Given a $C^2$ cost function $f:\mathbb{R}^m\rightarrow \mathbb{R}$ and
a real number $0<\tau $, as well as $m+1$ fixed real numbers $\delta _0,\ldots
,\delta _m$, we define for each $x\in \mathbb{R}^m$ with $\nabla f(x)\not= 0$
the following quantities:
$\kappa :=\min _{i\not= j}|\delta _i-\delta _j|$;
$A(x):=\nabla ^2f(x)+\delta ||\nabla f(x)||^{\tau}Id$, where $\delta$ is the
first element in the sequence $\{\delta _0,\ldots ,\delta _m\}$ for which
$minsp(A(x))\geq \kappa ||\nabla f(x)||^{\tau}$;
$e_1(x),\ldots ,e_m(x)$ are an orthonormal basis of $\mathbb{R}^m$, chosen
appropriately;
$w(x)=$ the step direction, given by the formula: $$w(x)=\sum
_{i=1}^m\frac{<\nabla f(x),e_i(x)>}{||A(x)e_i(x)||}e_i(x);$$ (we can also
normalise by $w(x)/\max \{1,||w(x)||\}$ when needed)
$\gamma (x)>0$ learning rate chosen by Backtracking line search so that
Armijo's condition is satisfied: $$f(x-\gamma (x)w(x))-f(x)\leq
-\frac{1}{3}\gamma (x)<\nabla f(x),w(x)>.$$
The update rule for our algorithm is $x\mapsto H(x)=x-\gamma (x)w(x)$.
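To make the update concrete, here is a minimal Python sketch of one step of the framework just described. It is an illustration, not the author's reference implementation: the callables f, grad, hess and the default parameter values are placeholders, and the orthonormal basis is taken from the eigendecomposition of the Hessian (the New Q-Newton's method Backtracking choice), although the general framework allows any appropriately chosen orthonormal basis.

import numpy as np

def minsp(A):
    # minsp(A) = min_{||e||=1} ||A e||, i.e. the smallest singular value of the symmetric matrix A
    return np.linalg.svd(A, compute_uv=False).min()

def general_step(f, grad, hess, x, deltas, tau, armijo_c=1/3, beta=0.5):
    # One update x -> H(x) = x - gamma(x) w(x) of the general framework (illustrative sketch)
    g = grad(x)
    gnorm = np.linalg.norm(g)
    if gnorm == 0.0:                          # x is already a critical point
        return x
    H = hess(x)
    kappa = min(abs(a - b) for i, a in enumerate(deltas) for b in deltas[i + 1:])
    # delta = first element of delta_0, ..., delta_m with minsp(A(x)) >= kappa ||grad f(x)||^tau
    for delta in deltas:
        A = H + delta * gnorm**tau * np.eye(x.size)
        if minsp(A) >= kappa * gnorm**tau:
            break
    # Orthonormal basis e_1(x), ..., e_m(x): here the eigenvectors of the Hessian,
    # but any appropriately chosen orthonormal basis may be substituted
    _, E = np.linalg.eigh(H)
    # Step direction w(x) = sum_i <grad f(x), e_i(x)> / ||A(x) e_i(x)|| * e_i(x)
    w = sum((g @ E[:, i]) / np.linalg.norm(A @ E[:, i]) * E[:, i] for i in range(x.size))
    w = w / max(1.0, np.linalg.norm(w))       # optional normalisation
    # Backtracking line search: shrink gamma until Armijo's condition holds
    gamma, gw = 1.0, g @ w
    while f(x - gamma * w) - f(x) > -armijo_c * gamma * gw:
        gamma *= beta
    return x - gamma * w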
In New Q-Newton's method Backtracking, the choices are $\tau =1+\alpha >1$
and $e_1(x),\ldots ,e_m(x)$'s are eigenvectors of $\nabla ^2f(x)$. In this
paper, we allow more flexibility and generality, for example $\tau$ can be
chosen to be $<1$ or $e_1(x),\ldots ,e_m(x)$'s are not necessarily eigenvectors
of $\nabla ^2f(x)$.
New Q-Newton's method Backtracking (as well as Backtracking gradient descent)
is a special case, and some versions have flavours of quasi-Newton's methods.
Several versions allow good theoretical guarantees. An application to solving
systems of polynomial equations is given.
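As a hedged illustration of the polynomial-equations application (the system, starting point and parameter values below are made up for the example, not taken from the paper), one can run the sketch above on the least-squares cost of a small polynomial system; a cost value near zero indicates an approximate solution of the system.

# Hypothetical system: P1(x, y) = x^2 + y^2 - 4 = 0 and P2(x, y) = x*y - 1 = 0,
# solved by minimising the cost f = (P1^2 + P2^2) / 2 with the step defined above
def f(z):
    x, y = z
    return ((x**2 + y**2 - 4)**2 + (x*y - 1)**2) / 2

def grad(z):
    x, y = z
    p1, p2 = x**2 + y**2 - 4, x*y - 1
    return np.array([2*x*p1 + y*p2, 2*y*p1 + x*p2])

def hess(z):
    x, y = z
    p1, p2 = x**2 + y**2 - 4, x*y - 1
    return np.array([[2*p1 + 4*x**2 + y**2, 4*x*y + p2 + x*y],
                     [4*x*y + p2 + x*y,     2*p1 + 4*y**2 + x**2]])

z = np.array([2.0, 1.0])
for _ in range(100):
    z = general_step(f, grad, hess, z, deltas=[0.0, 1.0, 2.0], tau=1.5)
print(z, f(z))   # f(z) close to 0 means z approximately solves the system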
Related papers
- Efficient Continual Finite-Sum Minimization [52.5238287567572]
We propose a key twist on finite-sum minimization, dubbed continual finite-sum minimization.
Our approach significantly improves upon the $\mathcal{O}(n/\epsilon)$ FOs (first-order oracle calls) that $\mathrm{StochasticGradientDescent}$ requires.
We also prove that there is no natural first-order method with $\mathcal{O}\left(n/\epsilon^{\alpha}\right)$ gradient complexity for $\alpha < 1/4$, establishing that the first-order complexity of our method is nearly tight.
arXiv Detail & Related papers (2024-06-07T08:26:31Z) - Distribution-Independent Regression for Generalized Linear Models with
Oblivious Corruptions [49.69852011882769]
We show the first algorithms for the problem of regression for generalized linear models (GLMs) in the presence of additive oblivious noise.
We present an algorithm that tackles this problem in its most general distribution-independent setting.
This is the first algorithmic result for GLM regression with oblivious noise which can handle more than half the samples being arbitrarily corrupted.
arXiv Detail & Related papers (2023-09-20T21:41:59Z) - An Over-parameterized Exponential Regression [18.57735939471469]
Recent developments in the field of Large Language Models (LLMs) have sparked interest in the use of exponential activation functions.
We define the neural function $F: \mathbb{R}^{d\times m}\times \mathbb{R}^d \rightarrow \mathbb{R}$ using an exponential activation function.
arXiv Detail & Related papers (2023-03-29T07:29:07Z) - A Nearly-Optimal Bound for Fast Regression with $\ell_\infty$ Guarantee [16.409210914237086]
Given a matrix $A\in \mathbb{R}^{n\times d}$ and a vector $b\in \mathbb{R}^n$, we consider the regression problem with $\ell_\infty$ guarantees.
We show that in order to obtain such an $\ell_\infty$ guarantee for $\ell_2$ regression, one has to use sketching matrices that are dense.
We also develop a novel analytical framework for $\ell_\infty$-guarantee regression that utilizes the Oblivious Coordinate-wise Embedding (OCE) property.
arXiv Detail & Related papers (2023-02-01T05:22:40Z) - An Optimal Algorithm for Strongly Convex Min-min Optimization [79.11017157526815]
Existing optimal first-order methods require $\mathcal{O}(\sqrt{\max\{\kappa_x,\kappa_y\}}\log 1/\epsilon)$ computations of both $\nabla_x f(x,y)$ and $\nabla_y f(x,y)$.
We propose a new algorithm that only requires $\mathcal{O}(\sqrt{\kappa_x}\log 1/\epsilon)$ computations of $\nabla_x f(x,y)$.
arXiv Detail & Related papers (2022-12-29T19:26:12Z) - Algorithms for Discrepancy, Matchings, and Approximations: Fast, Simple,
and Practical [1.2183405753834562]
Given a finite set system $(X,\mathcal{S})$, the discrepancy of a two-coloring $\chi:X\to \{-1,1\}$ is defined as $\max_{S\in \mathcal{S}}|\chi(S)|$, where $\chi(S)=\sum_{x\in S}\chi(x)$.
We propose a randomized algorithm which, for any $d>0$ and any $(X,\mathcal{S})$ with dual shatter function $\pi^*(k)=O(k^d)$, returns a coloring with low expected discrepancy.
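As a small illustration of the quoted definition (the set system and coloring below are made up for the example), the discrepancy can be computed directly:

def discrepancy(sets, chi):
    # max over S of |chi(S)|, with chi(S) = sum of the +/-1 colors of the points in S
    return max(abs(sum(chi[x] for x in S)) for S in sets)

chi = {0: 1, 1: -1, 2: 1, 3: 1, 4: -1, 5: -1}               # a two-coloring of X = {0, ..., 5}
print(discrepancy([{0, 1, 2}, {2, 3, 4}, {1, 3, 5}], chi))  # -> 1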
arXiv Detail & Related papers (2022-09-02T15:59:09Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x}\mapsto \sigma(\mathbf{w}\cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w})=C\cdot \mathrm{opt}+\epsilon$ with high probability.
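A minimal sketch of this single-neuron setup, with a made-up monotone activation (ReLU), a synthetic sample, and plain gradient-descent steps on the squared loss; this only illustrates the model, not the paper's algorithm or its guarantees.

import numpy as np

def neuron(w, X):
    # The model x -> sigma(w . x) with the monotone activation sigma = ReLU, applied row-wise
    return np.maximum(X @ w, 0.0)

def squared_loss(w, X, y):
    return 0.5 * np.mean((neuron(w, X) - y) ** 2)

def gd_step(w, X, y, lr=0.1):
    # Gradient of the squared loss for ReLU: sigma'(t) = 1 for t > 0, else 0
    z = X @ w
    resid = (np.maximum(z, 0.0) - y) * (z > 0)
    return w - lr * (X.T @ resid) / len(y)

X, y = np.random.randn(200, 5), np.random.rand(200)   # synthetic sample
w = np.random.randn(5)
for _ in range(50):
    w = gd_step(w, X, y)
print(squared_loss(w, X, y))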
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Low-Rank Approximation with $1/\epsilon^{1/3}$ Matrix-Vector Products [58.05771390012827]
We study iterative methods based on Krylov subspaces for low-rank approximation under any Schatten-$p$ norm.
Our main result is an algorithm that uses only $\tilde{O}(k/\sqrt{\epsilon})$ matrix-vector products.
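A rough sketch of the kind of Krylov-subspace approach referred to here, in the form of a generic block Krylov iteration for low-rank approximation; the block size, depth and accounting below are illustrative assumptions and do not reproduce the paper's algorithm or its matrix-vector-product bound.

import numpy as np

def block_krylov_lowrank(matvec, rmatvec, d, k, q):
    # Rank-k approximation of an n x d matrix A accessed only through
    # matvec(V) = A V and rmatvec(V) = A^T V (each call costs k matrix-vector products)
    G = np.random.randn(d, k)
    Y = matvec(G)
    blocks = [Y]
    for _ in range(q):                        # Krylov block: [A G, (A A^T) A G, ..., (A A^T)^q A G]
        Y = matvec(rmatvec(Y))
        blocks.append(Y)
    Q, _ = np.linalg.qr(np.hstack(blocks))    # orthonormal basis of the block Krylov space
    B = rmatvec(Q).T                          # B = Q^T A, formed via A^T Q
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Q @ U[:, :k], s[:k], Vt[:k]        # factors of the rank-k approximation

A = np.random.randn(500, 200)
U, s, Vt = block_krylov_lowrank(lambda V: A @ V, lambda V: A.T @ V, A.shape[1], k=10, q=4)
print(np.linalg.norm(A - (U * s) @ Vt))       # Frobenius error of the rank-10 approximation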
arXiv Detail & Related papers (2022-02-10T16:10:41Z) - On the Self-Penalization Phenomenon in Feature Selection [69.16452769334367]
We describe an implicit sparsity-inducing mechanism based on minimization over a family of kernels.
As an application, we use this sparsity-inducing mechanism to build algorithms that are consistent for feature selection.
arXiv Detail & Related papers (2021-10-12T09:36:41Z) - Spiked Covariance Estimation from Modulo-Reduced Measurements [14.569322713960494]
We develop and analyze an algorithm that, for most directions $\mathbf{u}$ and $\nu=\mathrm{poly}(k)$, estimates $\mathbf{u}$ to high accuracy using $n=\mathrm{poly}(k)$ measurements.
Numerical experiments show that the developed algorithm performs well even in a non-asymptotic setting.
arXiv Detail & Related papers (2021-10-04T02:10:47Z) - A fast and simple modification of Newton's method helping to avoid
saddle points [0.0]
This paper roughly says that if $f$ is $C^3$ and a sequence $\{x_n\}$ constructed by the method converges, then the limit point is a critical point and is not a saddle point, and the convergence rate is the same as that of Newton's method.
arXiv Detail & Related papers (2020-06-02T10:38:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.