Some convergent results for Backtracking Gradient Descent method on
Banach spaces
- URL: http://arxiv.org/abs/2001.05768v2
- Date: Wed, 22 Jan 2020 13:40:10 GMT
- Title: Some convergent results for Backtracking Gradient Descent method on
Banach spaces
- Authors: Tuyen Trung Truong
- Abstract summary: {\bf Theorem.} Let $X$ be a Banach space and $f:X\rightarrow \mathbb{R}$ be a $C^2$ function.
Let $\mathcal{C}$ be the set of critical points of $f$.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Our main result concerns the following condition:
{\bf Condition C.} Let $X$ be a Banach space. A $C^1$ function
$f:X\rightarrow \mathbb{R}$ satisfies Condition C if whenever $\{x_n\}$ weakly
converges to $x$ and $\lim _{n\rightarrow\infty}||\nabla f(x_n)||=0$, then
$\nabla f(x)=0$.
We assume that a canonical isomorphism between $X$ and its dual $X^*$ is
given, for example when $X$ is a Hilbert space.
{\bf Theorem.} Let $X$ be a reflexive, complete Banach space and
$f:X\rightarrow \mathbb{R}$ be a $C^2$ function which satisfies Condition C.
Moreover, we assume that for every bounded set $S\subset X$ we have
$\sup _{x\in S}||\nabla ^2f(x)||<\infty$. We choose a random point $x_0\in X$ and construct
by the Local Backtracking GD procedure (which depends on $3$ hyper-parameters
$\alpha ,\beta ,\delta _0$, see later for details) the sequence
$x_{n+1}=x_n-\delta (x_n)\nabla f(x_n)$. Then we have:
1) Every cluster point of $\{x_n\}$, in the {\bf weak} topology, is a
critical point of $f$.
2) Either $\lim _{n\rightarrow\infty}f(x_n)=-\infty$ or $\lim
_{n\rightarrow\infty}||x_{n+1}-x_n||=0$.
3) Here we work with the weak topology. Let $\mathcal{C}$ be the set of
critical points of $f$. Assume that $\mathcal{C}$ has a bounded component $A$.
Let $\mathcal{B}$ be the set of cluster points of $\{x_n\}$. If
$\mathcal{B}\cap A\not= \emptyset$, then $\mathcal{B}\subset A$ and
$\mathcal{B}$ is connected.
4) Assume that $X$ is separable. Then for generic choices of $\alpha ,\beta
,\delta _0$ and the initial point $x_0$, if the sequence $\{x_n\}$ converges
in the {\bf weak} topology, then the limit point cannot be a saddle point.
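The abstract does not spell out the Local Backtracking GD procedure, so the following is only a minimal sketch of classical (Armijo) backtracking gradient descent on $\mathbb{R}^n$, not the paper's local variant. It assumes that $\alpha$ is the sufficient-decrease constant, $\beta$ the shrink factor, and $\delta_0$ the initial trial learning rate; the function name `backtracking_gd` and the parameters `tol` and `max_iter` are hypothetical and chosen for illustration.

```python
import math


def backtracking_gd(f, grad, x0, alpha=0.5, beta=0.5, delta0=1.0,
                    tol=1e-8, max_iter=10_000):
    """Armijo backtracking gradient descent on R^n (points are lists of floats).

    Illustrative sketch only: alpha, beta, delta0 are ASSUMED to play the
    roles of the three hyper-parameters named in the abstract.
    """
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        g_norm_sq = sum(gi * gi for gi in g)
        # Stop once the gradient norm is below the tolerance.
        if math.sqrt(g_norm_sq) < tol:
            break
        fx = f(x)
        delta = delta0
        # Shrink delta by the factor beta until the Armijo condition holds:
        #   f(x - delta * g) <= f(x) - alpha * delta * ||g||^2
        while f([xi - delta * gi for xi, gi in zip(x, g)]) > fx - alpha * delta * g_norm_sq:
            delta *= beta
        # Update step: x_{n+1} = x_n - delta(x_n) * grad f(x_n)
        x = [xi - delta * gi for xi, gi in zip(x, g)]
    return x


if __name__ == "__main__":
    # Simple convex quadratic with its unique critical point at (1, -3).
    f = lambda v: (v[0] - 1) ** 2 + 2 * (v[1] + 3) ** 2
    grad = lambda v: [2 * (v[0] - 1), 4 * (v[1] + 3)]
    print(backtracking_gd(f, grad, [5.0, 5.0]))
```

In finite dimensions weak and norm convergence coincide, so this sketch cannot exhibit the weak-topology phenomena the theorem addresses; it only illustrates how the learning rate $\delta(x_n)$ is selected at each step.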
Related papers
- Sparsifying Suprema of Gaussian Processes [6.638504164134713]
We show that there is an $O_\varepsilon(1)$-size subset $S \subseteq T$ and a set of real values $\{c_s\}_{s \in S}$.
We also use our sparsification result for suprema of centered Gaussian processes to give a sparsification lemma for convex sets of bounded geometric width.
arXiv Detail & Related papers (2024-11-22T01:43:58Z)
- The Communication Complexity of Approximating Matrix Rank [50.6867896228563]
We show that this problem has randomized communication complexity $\Omega(\frac{1}{k}\cdot n^2\log|\mathbb{F}|)$.
As an application, we obtain an $\Omega(\frac{1}{k}\cdot n^2\log|\mathbb{F}|)$ space lower bound for any streaming algorithm with $k$ passes.
arXiv Detail & Related papers (2024-10-26T06:21:42Z)
- Efficient Continual Finite-Sum Minimization [52.5238287567572]
We propose a key twist on finite-sum minimization, dubbed continual finite-sum minimization.
Our approach significantly improves upon the $\mathcal{O}(n/\epsilon)$ FOs that $\mathrm{StochasticGradientDescent}$ requires.
We also prove that there is no natural first-order method with $\mathcal{O}\left(n/\epsilon^\alpha\right)$ gradient complexity for $\alpha < 1/4$, establishing that the first-order complexity of our method is nearly tight.
arXiv Detail & Related papers (2024-06-07T08:26:31Z)
- Online Learning of Smooth Functions [0.35534933448684125]
We study the online learning of real-valued functions where the hidden function is known to have certain smoothness properties.
We find new bounds for $\text{opt}_p(\mathcal{F}_q)$ that are sharp up to a constant factor.
In the multi-variable setup, we establish inequalities relating $\text{opt}_p(\mathcal{F}_{q,d})$ to $\text{opt}_p(\mathcal{F}_{q,d})$ and show that $\text{opt}_p(\mathcal{F}
arXiv Detail & Related papers (2023-01-04T04:05:58Z)
- Low-Rank Approximation with $1/\epsilon^{1/3}$ Matrix-Vector Products [58.05771390012827]
We study iterative methods based on Krylov subspaces for low-rank approximation under any Schatten-$p$ norm.
Our main result is an algorithm that uses only $\tilde{O}(k/\sqrt{\epsilon})$ matrix-vector products.
arXiv Detail & Related papers (2022-02-10T16:10:41Z)
- Learning low-degree functions from a logarithmic number of random queries [77.34726150561087]
We prove that for any integer $n\in\mathbb{N}$, $d\in\{1,\ldots,n\}$ and any $\varepsilon,\delta\in(0,1)$, a bounded function $f:\{-1,1\}^n\to[-1,1]$ of degree at most $d$ can be learned.
arXiv Detail & Related papers (2021-09-21T13:19:04Z)
- Approximate Maximum Halfspace Discrepancy [6.35821487778241]
We consider the range space $(X, \mathcal{H}_d)$ where $X \subset \mathbb{R}^d$ and $\mathcal{H}_d$ is the set of ranges defined by $d$ halfspaces.
For each halfspace $h \in \mathcal{H}_d$ define a function $\Phi(h)$ that measures the "difference" between the fraction of red and fraction of blue points which fall in the range $h$.
arXiv Detail & Related papers (2021-06-25T19:14:45Z)
- The planted matching problem: Sharp threshold and infinite-order phase transition [25.41713098167692]
We study the problem of reconstructing a perfect matching $M^*$ hidden in a randomly weighted $n\times n$ bipartite graph.
We show that if $\sqrt{d} B(\mathcal{P},\mathcal{Q}) \ge 1+\epsilon$ for an arbitrarily small constant $\epsilon>0$, the reconstruction error of any estimator is bounded away from $0$.
arXiv Detail & Related papers (2021-03-17T00:59:33Z)
- Linear Bandits on Uniformly Convex Sets [88.3673525964507]
Linear bandit algorithms yield $\tilde{\mathcal{O}}(n\sqrt{T})$ pseudo-regret bounds on compact convex action sets.
Two types of structural assumptions lead to better pseudo-regret bounds.
arXiv Detail & Related papers (2021-03-10T07:33:03Z)
- On the Complexity of Minimizing Convex Finite Sums Without Using the Indices of the Individual Functions [62.01594253618911]
We exploit the finite noise structure of finite sums to derive a matching $O(n^2)$-upper bound under the global oracle model.
Following a similar approach, we propose a novel adaptation of SVRG which is both \emph{compatible} with oracles, and achieves complexity bounds of $\tilde{O}(n^2+n\sqrt{L/\mu})\log (1/\epsilon)$ and $O(n\sqrt{L/\epsilon})$, for $\mu>0$ and $\mu=0$
arXiv Detail & Related papers (2020-02-09T03:39:46Z)
- Backtracking Gradient Descent allowing unbounded learning rates [0.0]
In unconstrained optimisation on a Euclidean space, to prove convergence in Gradient Descent processes (GD) $x_{n+1}=x_n-\delta_n \nabla f(x_n)$ it is usually required that the learning rates $\delta_n$ are bounded.
In this paper, we allow the learning rates $\delta_n$ to be unbounded.
It will be shown that this growth rate of $h$ is best possible if one wants convergence of the sequence $x_n$.
arXiv Detail & Related papers (2020-01-07T12:52:00Z)