Acceleration of the kernel herding algorithm by improved gradient
approximation
- URL: http://arxiv.org/abs/2105.07900v1
- Date: Mon, 17 May 2021 14:32:45 GMT
- Title: Acceleration of the kernel herding algorithm by improved gradient
approximation
- Authors: Kazuma Tsuji and Ken'ichiro Tanaka
- Abstract summary: kernel herding is a method used to construct quadrature formulas in a reproducing kernel Hilbert space.
We propose two improved versions of the kernel herding algorithm.
- Score: 6.802401545890963
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Kernel herding is a method used to construct quadrature formulas in a
reproducing kernel Hilbert space. Although there are some advantages of kernel
herding, such as numerical stability of quadrature and effective outputs of
nodes and weights, the convergence speed of worst-case integration error is
slow in comparison to other quadrature methods. To address this problem, we
propose two improved versions of the kernel herding algorithm. The fundamental
concept of both algorithms involves approximating negative gradients with a
positive linear combination of vertex directions. We analyzed the convergence
and validity of both algorithms theoretically; in particular, we showed that
the approximation of negative gradients directly influences the convergence
speed. In addition, we confirmed the accelerated convergence of the worst-case
integration error with respect to the number of nodes and computational time
through numerical experiments.
Related papers
- Distributed Optimization via Energy Conservation Laws in Dilated Coordinates [5.35599092568615]
This paper introduces an energy conservation approach for analyzing continuous-time dynamical systems in dilated coordinates.
convergence rates can be explicitly expressed in terms of the inverse time-dilation factor.
Its accelerated convergence behavior is benchmarked against various state-of-the-art distributed optimization algorithms on practical, large-scale problems.
arXiv Detail & Related papers (2024-09-28T08:02:43Z) - Incremental Quasi-Newton Methods with Faster Superlinear Convergence
Rates [50.36933471975506]
We consider the finite-sum optimization problem, where each component function is strongly convex and has Lipschitz continuous gradient and Hessian.
The recently proposed incremental quasi-Newton method is based on BFGS update and achieves a local superlinear convergence rate.
This paper proposes a more efficient quasi-Newton method by incorporating the symmetric rank-1 update into the incremental framework.
arXiv Detail & Related papers (2024-02-04T05:54:51Z) - A Globally Convergent Algorithm for Neural Network Parameter
Optimization Based on Difference-of-Convex Functions [29.58728073957055]
We propose an algorithm for optimizing parameters of hidden layer networks.
Specifically, we derive a blockwise (DC-of-the-art) difference function.
arXiv Detail & Related papers (2024-01-15T19:53:35Z) - Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously provide inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - Fast Computation of Optimal Transport via Entropy-Regularized Extragradient Methods [75.34939761152587]
Efficient computation of the optimal transport distance between two distributions serves as an algorithm that empowers various applications.
This paper develops a scalable first-order optimization-based method that computes optimal transport to within $varepsilon$ additive accuracy.
arXiv Detail & Related papers (2023-01-30T15:46:39Z) - Nesterov Accelerated Shuffling Gradient Method for Convex Optimization [15.908060383231371]
We show that our algorithm has an improved rate of $mathcalO (1/T)$ using unified shuffling schemes.
Our convergence analysis does not require an assumption on bounded domain or a bounded gradient condition.
Numerical simulations demonstrate the efficiency of our algorithm.
arXiv Detail & Related papers (2022-02-07T21:23:17Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work is on the iteration of zero-th-order (ZO) optimization which does not require first-order information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity as well as as function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - A Unified Analysis of First-Order Methods for Smooth Games via Integral
Quadratic Constraints [10.578409461429626]
In this work, we adapt the integral quadratic constraints theory to first-order methods for smooth and strongly-varying games and iteration.
We provide emphfor the first time a global convergence rate for the negative momentum method(NM) with an complexity $mathcalO(kappa1.5)$, which matches its known lower bound.
We show that it is impossible for an algorithm with one step of memory to achieve acceleration if it only queries the gradient once per batch.
arXiv Detail & Related papers (2020-09-23T20:02:00Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z) - IDEAL: Inexact DEcentralized Accelerated Augmented Lagrangian Method [64.15649345392822]
We introduce a framework for designing primal methods under the decentralized optimization setting where local functions are smooth and strongly convex.
Our approach consists of approximately solving a sequence of sub-problems induced by the accelerated augmented Lagrangian method.
When coupled with accelerated gradient descent, our framework yields a novel primal algorithm whose convergence rate is optimal and matched by recently derived lower bounds.
arXiv Detail & Related papers (2020-06-11T18:49:06Z) - A Hybrid-Order Distributed SGD Method for Non-Convex Optimization to
Balance Communication Overhead, Computational Complexity, and Convergence
Rate [28.167294398293297]
We propose a method of distributed gradient descent (SGD) with low communication load and computational complexity, and still fast.
To reduce the computational complexity in each iteration, the worker nodes approximate the directional derivatives with zeroth-order gradient estimation.
arXiv Detail & Related papers (2020-03-27T14:02:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.