Coordinate-wise Armijo's condition: General case
- URL: http://arxiv.org/abs/2003.05252v1
- Date: Wed, 11 Mar 2020 12:17:05 GMT
- Title: Coordinate-wise Armijo's condition: General case
- Authors: Tuyen Trung Truong
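- Abstract summary: We propose a Backtracking-like algorithm for finding step sizes $\delta_1,\delta_2$ that satisfy the coordinate-wise Armijo's condition for a general $C^1$ function, and prove convergence results for it.
We then analyse and present experimental results for some functions such as $f(x,y)=a|x|+y$ and Rosenbrock's function.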
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   Let $z=(x,y)$ be coordinates for the product space $\mathbb{R}^{m_1}\times
\mathbb{R}^{m_2}$. Let $f:\mathbb{R}^{m_1}\times \mathbb{R}^{m_2}\rightarrow
\mathbb{R}$ be a $C^1$ function, and $\nabla f=(\partial _xf,\partial _yf)$ its
gradient. Fix $0<\alpha <1$. For a point $(x,y) \in \mathbb{R}^{m_1}\times
\mathbb{R}^{m_2}$, a number $\delta >0$ satisfies Armijo's condition at $(x,y)$
if the following inequality holds: \begin{eqnarray*} f(x-\delta \partial
_xf,y-\delta \partial _yf)-f(x,y)\leq -\alpha \delta (||\partial
_xf||^2+||\partial _yf||^2). \end{eqnarray*}
  In a previous paper, we proposed the following {\bf coordinate-wise}
Armijo's condition. Fix again $0<\alpha <1$. A pair of positive numbers $\delta
_1,\delta _2>0$ satisfies the coordinate-wise variant of Armijo's condition at
$(x,y)$ if the following inequality holds: \begin{eqnarray*} [f(x-\delta
_1\partial _xf(x,y), y-\delta _2\partial _y f(x,y))]-[f(x,y)]\leq -\alpha
(\delta _1||\partial _xf(x,y)||^2+\delta _2||\partial _yf(x,y)||^2).
\end{eqnarray*} Previously we applied this condition to functions of the form
$f(x,y)=f(x)+g(y)$, and proved various convergence results for them. For a
general function, it is crucial, in order to do actual computations, to
have a systematic algorithm for obtaining $\delta _1$ and $\delta _2$
satisfying the coordinate-wise version of Armijo's condition, much like
Backtracking for the usual Armijo's condition. In this paper we propose such an
algorithm, and prove corresponding convergence results.
  We then analyse and present experimental results for some functions such as
$f(x,y)=a|x|+y$ (given by Asl and Overton in connection with Wolfe's method),
$f(x,y)=x^3 \sin(1/x) + y^3 \sin(1/y)$ and Rosenbrock's function.
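Both inequalities in the abstract can be checked directly. The Python sketch below is an illustration, not code from the paper: it tests the usual and the coordinate-wise Armijo conditions for blocks $x$ and $y$ stored as plain lists, and pairs them with a deliberately naive search that multiplies $\delta_1$ and $\delta_2$ by the same factor $\beta$ until the coordinate-wise inequality holds. The abstract does not describe the paper's own algorithm, so this shrinking rule, the names `d1`, `d2`, `beta`, `max_iter`, the values $\alpha=\beta=0.5$, and the quadratic test function $x^2+10y^2$ are all assumptions made for illustration.

```python
def sq_norm(v):
    """Squared Euclidean norm of a vector stored as a list of floats."""
    return sum(t * t for t in v)


def armijo_holds(f, grad_x, grad_y, x, y, delta, alpha):
    """Usual Armijo condition: one step size delta for both blocks."""
    gx, gy = grad_x(x, y), grad_y(x, y)
    x_new = [xi - delta * g for xi, g in zip(x, gx)]
    y_new = [yi - delta * g for yi, g in zip(y, gy)]
    return f(x_new, y_new) - f(x, y) <= -alpha * delta * (sq_norm(gx) + sq_norm(gy))


def coordinatewise_armijo_holds(f, grad_x, grad_y, x, y, d1, d2, alpha):
    """Coordinate-wise variant: step d1 on the x-block and d2 on the y-block."""
    gx, gy = grad_x(x, y), grad_y(x, y)
    x_new = [xi - d1 * g for xi, g in zip(x, gx)]
    y_new = [yi - d2 * g for yi, g in zip(y, gy)]
    return f(x_new, y_new) - f(x, y) <= -alpha * (d1 * sq_norm(gx) + d2 * sq_norm(gy))


def naive_coordinatewise_backtracking(f, grad_x, grad_y, x, y, d1=1.0, d2=1.0,
                                      alpha=0.5, beta=0.5, max_iter=60):
    """Hypothetical search (NOT the paper's algorithm): multiply d1 and d2 by the
    same factor beta until the coordinate-wise Armijo condition holds."""
    for _ in range(max_iter):
        if coordinatewise_armijo_holds(f, grad_x, grad_y, x, y, d1, d2, alpha):
            return d1, d2
        d1, d2 = d1 * beta, d2 * beta
    raise RuntimeError("no acceptable (d1, d2) found within max_iter shrinkings")


# Hypothetical test problem with m1 = m2 = 1: f(x, y) = x^2 + 10 y^2.
def quad(x, y):
    return x[0] ** 2 + 10 * y[0] ** 2


def quad_grad_x(x, y):
    return [2 * x[0]]


def quad_grad_y(x, y):
    return [20 * y[0]]


# Unequal initial guesses let the x-block keep a larger step.
print(naive_coordinatewise_backtracking(quad, quad_grad_x, quad_grad_y, [1.0], [1.0], d1=0.5, d2=0.0625))
# (0.25, 0.03125)
# With d1 == d2 the search is ordinary Backtracking on the usual Armijo condition.
print(naive_coordinatewise_backtracking(quad, quad_grad_x, quad_grad_y, [1.0], [1.0]))
# (0.03125, 0.03125)
```

With $\delta_1=\delta_2$ the coordinate-wise inequality reduces to the usual one, so the second call is plain Backtracking; starting from unequal guesses lets the $x$-block keep a step of $0.25$ instead of $0.03125$ on this anisotropic quadratic. A common shrinking factor is simply the easiest rule that is sure to terminate for a $C^1$ function with nonzero gradient: along the ray $t(\delta_1,\delta_2)$ the change in $f$ is $-t(\delta_1||\partial_xf||^2+\delta_2||\partial_yf||^2)+o(t)$, and $\alpha<1$.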
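For readers who want to experiment with the three test functions named in the abstract, the following hypothetical Python sketch codes each with its gradient and runs gradient descent with a coordinate-wise step-size search. It does not reproduce the paper's experiments. Assumptions: $m_1=m_2=1$, $a=2$ in $a|x|+y$, the standard Rosenbrock parameters $(1-x)^2+100(y-x^2)^2$ (the abstract does not state them), the starting points shown, and a naive search that shrinks both step sizes together rather than the paper's algorithm.

```python
import math


def asl_overton(x, y, a=2.0):
    """f(x, y) = a|x| + y: nonsmooth on the line x = 0, unbounded below."""
    return a * abs(x) + y


def asl_overton_grad(x, y, a=2.0):
    # Off x = 0 the gradient is (a*sign(x), 1); at x = 0 we return 0 for the
    # x-part, an arbitrary choice since f is not differentiable there.
    gx = a * math.copysign(1.0, x) if x != 0 else 0.0
    return gx, 1.0


def _h(t):
    # t^3 sin(1/t), extended by continuity with h(0) = 0.
    return t ** 3 * math.sin(1.0 / t) if t != 0 else 0.0


def _dh(t):
    # h'(t) = 3 t^2 sin(1/t) - t cos(1/t) for t != 0, and h'(0) = 0.
    return 3 * t ** 2 * math.sin(1.0 / t) - t * math.cos(1.0 / t) if t != 0 else 0.0


def cubic_sin(x, y):
    return _h(x) + _h(y)


def cubic_sin_grad(x, y):
    return _dh(x), _dh(y)


def rosenbrock(x, y):
    # Assumed standard parameters: (1 - x)^2 + 100 (y - x^2)^2.
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2


def rosenbrock_grad(x, y):
    return -2 * (1 - x) - 400 * x * (y - x ** 2), 200 * (y - x ** 2)


def descend(f, grad, x, y, d0=1.0, alpha=0.5, beta=0.5, steps=200, max_backtracks=100):
    """Gradient descent; each step uses a naive coordinate-wise search in which
    both step sizes start at d0 and are multiplied by beta together."""
    values = [f(x, y)]
    for _ in range(steps):
        gx, gy = grad(x, y)
        if gx == 0 and gy == 0:
            break  # exact critical point reached
        d1 = d2 = d0
        for _ in range(max_backtracks):
            # Coordinate-wise Armijo: f(new) - f(old) <= -alpha (d1 gx^2 + d2 gy^2).
            if f(x - d1 * gx, y - d2 * gy) - values[-1] <= -alpha * (d1 * gx ** 2 + d2 * gy ** 2):
                break
            d1, d2 = d1 * beta, d2 * beta
        else:
            break  # no acceptable (d1, d2) found; stop without moving
        x, y = x - d1 * gx, y - d2 * gy
        values.append(f(x, y))
    return x, y, values


for name, fun, grad, start in [("a|x| + y", asl_overton, asl_overton_grad, (1.0, 0.0)),
                               ("x^3 sin(1/x) + y^3 sin(1/y)", cubic_sin, cubic_sin_grad, (0.5, 0.5)),
                               ("Rosenbrock", rosenbrock, rosenbrock_grad, (-1.2, 1.0))]:
    _, _, vals = descend(fun, grad, *start)
    print(f"{name}: f went from {vals[0]:.4g} to {vals[-1]:.4g} in {len(vals) - 1} steps")
```

Because every accepted step satisfies the coordinate-wise Armijo inequality, the recorded values are non-increasing, and the loop stops early rather than taking an unverified step. Since $a|x|+y$ has no minimizer, on that function the run can only show decrease, not convergence.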
 
      
Related papers
- The Communication Complexity of Approximating Matrix Rank [50.6867896228563]
  We show that this problem has randomized communication complexity $\Omega(\frac{1}{k}\cdot n^2\log|\mathbb{F}|)$.
  As an application, we obtain an $\Omega(\frac{1}{k}\cdot n^2\log|\mathbb{F}|)$ space lower bound for any streaming algorithm with $k$ passes.
  arXiv (2024-10-26T06:21:42Z)
- Efficient Continual Finite-Sum Minimization [52.5238287567572]
  We propose a key twist on finite-sum minimization, dubbed continual finite-sum minimization.
  Our approach significantly improves upon the $\mathcal{O}(n/\epsilon)$ FOs that Stochastic Gradient Descent requires.
  We also prove that there is no natural first-order method with $\mathcal{O}\left(n/\epsilon^\alpha\right)$ complexity gradient for $\alpha < 1/4$, establishing that the first-order complexity of our method is nearly tight.
  arXiv (2024-06-07T08:26:31Z)
- Online Learning of Smooth Functions [0.35534933448684125]
  We study the online learning of real-valued functions where the hidden function is known to have certain smoothness properties.
  We find new bounds for $\text{opt}_p(\mathcal F_q)$ that are sharp up to a constant factor.
  In the multi-variable setup, we establish inequalities relating $\text{opt}_p(\mathcal F_q,d)$ to $\text{opt}_p(\mathcal F_q,d)$ and show that $\text{opt}_p(\mathcal F
  arXiv (2023-01-04T04:05:58Z)
- Convergence Rates of Stochastic Zeroth-order Gradient Descent for Łojasiewicz Functions [6.137707924685666]
  We prove convergence rates of Stochastic Zeroth-order Gradient Descent (SZGD) algorithms for Łojasiewicz functions.
  Our results show that $\{f(\mathbf{x}_t) - f(\mathbf{x}_\infty)\}_{t \in \mathbb{N}}$ can converge faster than $\{\|\mathbf{x}_t - \mathbf{x}_\infty\|\}_{t \in \mathbb{N}}$.
  arXiv (2022-10-31T00:53:17Z)
- On Outer Bi-Lipschitz Extensions of Linear Johnson-Lindenstrauss Embeddings of Low-Dimensional Submanifolds of $\mathbb{R}^N$ [0.24366811507669117]
  Let $\mathcal{M}$ be a compact $d$-dimensional submanifold of $\mathbb{R}^N$ with reach $\tau$ and volume $V_{\mathcal M}$.
  We prove that a nonlinear function $f: \mathbb{R}^N \rightarrow \mathbb{R}^m$ exists with $m \leq C \left(d / \epsilon^2\right) \log \left(\frac{\sqrt[d]{V_{\mathcal M}}
  arXiv (2022-06-07T15:10:46Z)
- Low-Rank Approximation with $1/\epsilon^{1/3}$ Matrix-Vector Products [58.05771390012827]
  We study iterative methods based on Krylov subspaces for low-rank approximation under any Schatten-$p$ norm.
  Our main result is an algorithm that uses only $\tilde{O}(k/\sqrt{\epsilon})$ matrix-vector products.
  arXiv (2022-02-10T16:10:41Z)
- On the Self-Penalization Phenomenon in Feature Selection [69.16452769334367]
  We describe an implicit sparsity-inducing mechanism based on minimization over a family of kernels.
  As an application, we use this sparsity-inducing mechanism to build algorithms consistent for feature selection.
  arXiv (2021-10-12T09:36:41Z)
- Generalisations and improvements of New Q-Newton's method Backtracking [0.0]
  We propose a general framework for the algorithm New Q-Newton's method Backtracking.
  In this paper, we allow more flexibility and generality; for example, $e_1(x),\ldots,e_m(x)$ are not necessarily eigenvectors of $\nabla^2 f(x)$.
  arXiv (2021-09-23T14:28:15Z)
- Learning low-degree functions from a logarithmic number of random queries [77.34726150561087]
  We prove that for any integer $n\in\mathbb{N}$, $d\in\{1,\ldots,n\}$ and any $\varepsilon,\delta\in(0,1)$, a bounded function $f:\{-1,1\}^n\to[-1,1]$ of degree at most $d$ can be learned.
  arXiv (2021-09-21T13:19:04Z)
- Linear Bandits on Uniformly Convex Sets [88.3673525964507]
  Linear bandit algorithms yield $\tilde{\mathcal{O}}(n\sqrt{T})$ pseudo-regret bounds on compact convex action sets.
  Two types of structural assumptions lead to better pseudo-regret bounds.
  arXiv (2021-03-10T07:33:03Z)
- Optimal Mean Estimation without a Variance [103.26777953032537]
  We study the problem of heavy-tailed mean estimation in settings where the variance of the data-generating distribution does not exist.
  We design an estimator which attains the smallest possible confidence interval as a function of $n,d,\delta$.
  arXiv (2020-11-24T22:39:21Z)
- Asymptotic behaviour of learning rates in Armijo's condition [0.0]
  We show that if $x_n$ converges to a non-degenerate critical point, then $\delta_n$ must be bounded.
  This complements the first author's results on Unbounded Backtracking GD.
  arXiv (2020-07-07T16:49:25Z)
- Some convergent results for Backtracking Gradient Descent method on Banach spaces [0.0]
  {\bf Theorem.} Let $X$ be a Banach space and $f:X\rightarrow \mathbb{R}$ be a $C^2$ function.
  Let $\mathcal{C}$ be the set of critical points of $f$.
  arXiv (2020-01-16T12:49:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     