Linear Asymptotic Convergence of Anderson Acceleration: Fixed-Point
Analysis
- URL: http://arxiv.org/abs/2109.14176v1
- Date: Wed, 29 Sep 2021 03:42:41 GMT
- Title: Linear Asymptotic Convergence of Anderson Acceleration: Fixed-Point
Analysis
- Authors: Hans De Sterck and Yunhui He
- Abstract summary: We study the convergence of AA($m$), i.e., Anderson acceleration with window size $m$ for accelerating fixed-point methods.
We analyze the continuity and differentiability properties of $\Psi(z)$ and $\beta(z)$.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the asymptotic convergence of AA($m$), i.e., Anderson acceleration
with window size $m$ for accelerating fixed-point methods $x_{k+1}=q(x_{k})$,
$x_k \in R^n$. Convergence acceleration by AA($m$) has been widely observed but
is not well understood. We consider the case where the fixed-point iteration
function $q(x)$ is differentiable and the convergence of the fixed-point method
itself is root-linear. We identify numerically several conspicuous properties
of AA($m$) convergence: First, AA($m$) sequences $\{x_k\}$ converge
root-linearly but the root-linear convergence factor depends strongly on the
initial condition. Second, the AA($m$) acceleration coefficients $\beta^{(k)}$
do not converge but oscillate as $\{x_k\}$ converges to $x^*$. To shed light on
these observations, we write the AA($m$) iteration as an augmented fixed-point
iteration $z_{k+1} =\Psi(z_k)$, $z_k \in R^{n(m+1)}$ and analyze the continuity
and differentiability properties of $\Psi(z)$ and $\beta(z)$. We find that the
vector of acceleration coefficients $\beta(z)$ is not continuous at the fixed
point $z^*$. However, we show that, despite the discontinuity of $\beta(z)$,
the iteration function $\Psi(z)$ is Lipschitz continuous and directionally
differentiable at $z^*$ for AA(1), and we generalize this to AA($m$) with $m>1$
for most cases. Furthermore, we find that $\Psi(z)$ is not differentiable at
$z^*$. We then discuss how these theoretical findings relate to the observed
convergence behaviour of AA($m$). The discontinuity of $\beta(z)$ at $z^*$
allows $\beta^{(k)}$ to oscillate as $\{x_k\}$ converges to $x^*$, and the
non-differentiability of $\Psi(z)$ allows AA($m$) sequences to converge with
root-linear convergence factors that strongly depend on the initial condition.
Additional numerical results illustrate our findings.
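To make the iteration concrete, the following minimal NumPy sketch implements AA($m$) in its usual least-squares form and applies it to an illustrative contractive map $q$; the specific map, window size, starting point, and iteration count are assumptions made for demonstration only, not taken from the paper. Running it with $m=1$ qualitatively reproduces the two phenomena noted above: the estimate $\|x_k - x^*\|^{1/k}$ settles to a root-linear factor that varies with the initial condition, while the coefficients $\beta^{(k)}$ keep oscillating rather than converging.

```python
import numpy as np


def q(x):
    # Illustrative differentiable, contractive fixed-point map with x* = 0
    # (an assumption for this sketch, not an example taken from the paper).
    return np.array([0.5 * x[0] + 0.1 * x[1] ** 2,
                     0.3 * np.sin(x[0]) + 0.4 * x[1]])


def aa(q, x0, m=1, iters=40):
    """Anderson acceleration AA(m) for x_{k+1} = q(x_k), in the standard
    least-squares formulation; returns the iterates and the acceleration
    coefficients beta^{(k)}."""
    xs = [np.asarray(x0, dtype=float)]
    qs = [q(xs[0])]
    rs = [qs[0] - xs[0]]                 # residuals r_k = q(x_k) - x_k
    betas = []
    for k in range(iters):
        mk = min(m, k)
        if mk == 0:
            x_next = qs[-1]              # plain fixed-point step
            betas.append(np.array([1.0]))
        else:
            # Differences of the last mk residuals and q-evaluations.
            dR = np.column_stack([rs[-mk + j] - rs[-mk + j - 1] for j in range(mk)])
            dQ = np.column_stack([qs[-mk + j] - qs[-mk + j - 1] for j in range(mk)])
            gamma, *_ = np.linalg.lstsq(dR, rs[-1], rcond=None)
            x_next = qs[-1] - dQ @ gamma
            # Recover the affine-combination coefficients beta^{(k)}
            # (they sum to one by construction).
            beta = np.empty(mk + 1)
            beta[0] = gamma[0]
            beta[1:mk] = gamma[1:] - gamma[:-1]
            beta[mk] = 1.0 - gamma[mk - 1]
            betas.append(beta)
        xs.append(x_next)
        qs.append(q(x_next))
        rs.append(qs[-1] - xs[-1])
    return xs, betas


if __name__ == "__main__":
    xs, betas = aa(q, x0=[1.0, -0.7], m=1)
    errs = [np.linalg.norm(x) for x in xs]   # x* = 0 for this map
    # Estimated root-linear factor errs[k]**(1/k) and the oscillating beta^{(k)}.
    for k in (10, 20, 30, 40):
        print(k, errs[k] ** (1.0 / k), betas[k - 1])
```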
Related papers
- On the $O(\frac{\sqrt{d}}{K^{1/4}})$ Convergence Rate of AdamW Measured by $\ell_1$ Norm [54.28350823319057]
This paper establishes the convergence rate $\frac{1}{K}\sum_{k=1}^K E\left[\|\nabla f(x^k)\|_1\right] \leq O(\frac{\sqrt{dC}}{K^{1/4}})$ for AdamW measured by the $\ell_1$ norm, where $K$ represents the iteration number, $d$ denotes the model dimension, and $C$ matches the constant in the optimal convergence rate of SGD.
arXiv Detail & Related papers (2025-05-17T05:02:52Z) - Variance-Dependent Regret Lower Bounds for Contextual Bandits [65.93854043353328]
Variance-dependent regret bounds for linear contextual bandits improve upon the classical $\tilde{O}(d\sqrt{K})$ regret bound to $\tilde{O}(d\sqrt{\sum_{k=1}^K \sigma_k^2})$.
arXiv Detail & Related papers (2025-03-15T07:09:36Z) - Evidence for simple "arrow of time functions" in closed chaotic quantum systems [0.0]
The construction of $\alpha_n(t)$ from $C(t)$ requires the first $2n$ temporal derivatives of $C(t)$ at times $0$ and $t$.
Our focus is on $\alpha_n(t)$ that (almost) monotonically decrease; we call these "arrow of time functions" (AOTFs).
arXiv Detail & Related papers (2024-08-15T08:17:10Z) - Revisiting Step-Size Assumptions in Stochastic Approximation [1.3654846342364308]
It is shown for the first time that this assumption is not required for convergence or for finer results.
Rates of convergence are obtained for the standard algorithm and for estimates obtained via the averaging technique of Polyak and Ruppert.
Results from numerical experiments illustrate that $\beta_\theta$ may be large due to a combination of multiplicative noise and Markovian memory.
arXiv Detail & Related papers (2024-05-28T05:11:05Z) - Convergence of Gradient Descent with Small Initialization for
Unregularized Matrix Completion [21.846732043706318]
We show that vanilla gradient descent provably converges to the ground truth $\mathrm{X}^\star$ without requiring any explicit regularization.
Surprisingly, neither the convergence rate nor the final accuracy depends on the over-parameterized search rank $r'$, and they are only governed by the true rank $r$.
arXiv Detail & Related papers (2024-02-09T19:39:23Z) - On the $O(\frac{\sqrt{d}}{T^{1/4}})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ Norm [59.65871549878937]
This paper considers RMSProp and its momentum extension and establishes the convergence rate $\frac{1}{T}\sum_{k=1}^T E\left[\|\nabla f(x^k)\|_1\right] \leq O(\frac{\sqrt{d}}{T^{1/4}})$.
Our convergence rate matches the lower bound with respect to all the coefficients except the dimension $d$.
Our convergence rate can be considered analogous to the $\frac{1}{T}\sum_{k=1}^T E\left[\|\nabla f(x^k)\|_2^2\right] \leq O(\frac{1}{\sqrt{T}})$ rate of SGD.
arXiv Detail & Related papers (2024-02-01T07:21:32Z) - How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing:
The Curses of Symmetry and Initialization [46.55524654398093]
We show how over-parameterization changes the convergence behavior of gradient descent.
The goal is to recover an unknown low-rank ground-truth matrix from near-isotropic linear measurements.
We propose a novel method that only modifies one step of GD and obtains a convergence rate independent of $\alpha$.
arXiv Detail & Related papers (2023-10-03T03:34:22Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - A first-order primal-dual method with adaptivity to local smoothness [64.62056765216386]
We consider the problem of finding a saddle point for the convex-concave objective $\min_x \max_y f(x) + \langle Ax, y\rangle - g^*(y)$, where $f$ is a convex function with locally Lipschitz gradient and $g$ is convex and possibly non-smooth.
We propose an adaptive version of the Condat-Vu algorithm, which alternates between primal gradient steps and dual steps.
arXiv Detail & Related papers (2021-10-28T14:19:30Z) - Anderson Acceleration as a Krylov Method with Application to Asymptotic
Convergence Analysis [0.0]
Anderson acceleration is widely used for accelerating the convergence of fixed-point methods.
We study the case of linear fixed-point methods $x_{k+1}=q(x_k)$, $x_k \in R^n$.
arXiv Detail & Related papers (2021-09-29T03:53:15Z) - Spectral properties of sample covariance matrices arising from random
matrices with independent non identically distributed columns [50.053491972003656]
It was previously shown that the functionals $\text{tr}(AR(z))$, for $R(z) = (\frac{1}{n}XX^T - zI_p)^{-1}$ and $A \in \mathcal{M}_p$ deterministic, have a standard deviation of order $O(\|A\|_* / \sqrt{n})$.
Here, a bound on $\|\mathbb{E}[R(z)] - \tilde{R}(z)\|_F$ is established.
arXiv Detail & Related papers (2021-09-06T14:21:43Z) - Convergence Rate of the (1+1)-Evolution Strategy with Success-Based
Step-Size Adaptation on Convex Quadratic Functions [20.666734673282498]
The (1+1)-evolution strategy (ES) with success-based step-size adaptation is analyzed on a general convex quadratic function.
The convergence rate of the (1+1)-ES is derived explicitly and rigorously on a general convex quadratic function.
arXiv Detail & Related papers (2021-03-02T09:03:44Z) - Asymptotic behaviour of learning rates in Armijo's condition [0.0]
We show that if $x_n$ converges to a non-degenerate critical point, then $\delta_n$ must be bounded.
This complements the first author's results on Unbounded Backtracking GD.
arXiv Detail & Related papers (2020-07-07T16:49:25Z) - A Simple Convergence Proof of Adam and Adagrad [74.24716715922759]
We give a proof of convergence for both Adam and Adagrad with rate $O(d\ln(N)/\sqrt{N})$.
Adam converges at the same $O(d\ln(N)/\sqrt{N})$ rate when used with its default parameters.
arXiv Detail & Related papers (2020-03-05T01:56:17Z)