The Role of Momentum Parameters in the Optimal Convergence of Adaptive
Polyak's Heavy-ball Methods
- URL: http://arxiv.org/abs/2102.07314v1
- Date: Mon, 15 Feb 2021 02:57:14 GMT
- Title: The Role of Momentum Parameters in the Optimal Convergence of Adaptive
Polyak's Heavy-ball Methods
- Authors: Wei Tao, Sheng Long, Gaowei Wu, Qing Tao
- Abstract summary: We prove that the adaptive Polyak's Heavy-ball (HB) method attains an optimal individual convergence rate of $O(frac1sqrtt)$.
Our new analysis shows how the HB momentum and its time-varying weight help us to achieve the acceleration in convex optimization.
- Score: 12.93796690939018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The adaptive stochastic gradient descent (SGD) with momentum has been widely
adopted in deep learning as well as convex optimization. In practice, the last
iterate is commonly used as the final solution to make decisions. However, the
available regret analysis and the setting of constant momentum parameters only
guarantee the optimal convergence of the averaged solution. In this paper, we
fill this theory-practice gap by investigating the convergence of the last
iterate (referred to as individual convergence), which is a more difficult task
than convergence analysis of the averaged solution. Specifically, in the
constrained convex cases, we prove that the adaptive Polyak's Heavy-ball (HB)
method, in which only the step size is updated using the exponential moving
average strategy, attains an optimal individual convergence rate of
$O(\frac{1}{\sqrt{t}})$, as opposed to the optimality of $O(\frac{\log t}{\sqrt
{t}})$ of SGD, where $t$ is the number of iterations. Our new analysis not only
shows how the HB momentum and its time-varying weight help us to achieve the
acceleration in convex optimization but also gives valuable hints how the
momentum parameters should be scheduled in deep learning. Empirical results on
optimizing convex functions and training deep networks validate the correctness
of our convergence analysis and demonstrate the improved performance of the
adaptive HB methods.
Related papers
Err
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.