Breaking the Linear Iteration Cost Barrier for Some Well-known
Conditional Gradient Methods Using MaxIP Data-structures
- URL: http://arxiv.org/abs/2111.15139v1
- Date: Tue, 30 Nov 2021 05:40:14 GMT
- Authors: Anshumali Shrivastava, Zhao Song and Zhaozhuo Xu
- Abstract summary: Conditional gradient methods (CGM) are widely used in modern machine learning.
Most efforts focus on reducing the number of iterations as a means to reduce the overall running time.
We give the first algorithms whose per-iteration cost is sublinear in the number of parameters for several fundamental optimization methods.
- Score: 49.73889315176884
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conditional gradient methods (CGM) are widely used in modern machine
learning. CGM's overall running time usually consists of two parts: the number
of iterations and the cost of each iteration. Most efforts focus on reducing
the number of iterations as a means to reduce the overall running time. In this
work, we focus on improving the per-iteration cost of CGM. The bottleneck step
in most CGM variants is maximum inner product search (MaxIP), which requires a
linear scan over the parameters. In practice, approximate MaxIP data-structures
are found to be helpful heuristics; theoretically, however, nothing was known
about combining approximate MaxIP data-structures with CGM. In this work, we
answer this open question positively by providing a formal framework that
combines locality sensitive hashing (LSH) type approximate MaxIP
data-structures with CGM algorithms. As a result, we obtain the first
algorithms whose per-iteration cost is sublinear in the number of parameters
for several fundamental optimization methods, e.g., Frank-Wolfe, the Herding
algorithm, and policy gradient.
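To make the bottleneck concrete, below is a minimal Python sketch (an illustration, not the paper's construction): Frank-Wolfe over the convex hull of n atoms, where the linear minimization oracle is exactly a MaxIP over the atoms. The exact oracle is the O(n*d) linear scan the abstract refers to; SimHashMaxIP is a toy signed-random-projection index standing in for the paper's LSH-type data structure, which uses different transformations and comes with guarantees this sketch does not.

```python
import numpy as np

# Sketch: Frank-Wolfe over conv(A), where the rows of A are the n "parameters"
# (atoms). Each iteration's bottleneck is a maximum inner product search:
#   s_t = argmax_{a in A} <a, -grad f(x_t)>
# The exact oracle below costs O(n*d) per iteration; the paper's point is that
# an approximate LSH-type MaxIP index makes this step sublinear in n.

def exact_maxip(A, q):
    """Exact MaxIP by linear scan: O(n*d)."""
    return int(np.argmax(A @ q))

class SimHashMaxIP:
    """Toy signed-random-projection index (a hypothetical stand-in, not the
    paper's exact construction). Buckets atoms by the sign pattern of random
    projections and scans only the bucket that the query falls into."""

    def __init__(self, A, num_bits=8, seed=0):
        rng = np.random.default_rng(seed)
        self.A = A
        self.planes = rng.standard_normal((num_bits, A.shape[1]))
        codes = A @ self.planes.T > 0
        self.buckets = {}
        for i, code in enumerate(map(tuple, codes)):
            self.buckets.setdefault(code, []).append(i)

    def query(self, q):
        code = tuple(self.planes @ q > 0)
        cand = self.buckets.get(code)
        if not cand:                      # fall back to a full scan on a miss
            return exact_maxip(self.A, q)
        cand = np.array(cand)
        return int(cand[np.argmax(self.A[cand] @ q)])

def frank_wolfe(A, grad_f, x0, T=100, oracle=None):
    """Frank-Wolfe with a pluggable MaxIP oracle and step size 2/(t+2)."""
    x = x0.copy()
    for t in range(T):
        q = -grad_f(x)                         # search direction
        i = oracle.query(q) if oracle else exact_maxip(A, q)
        gamma = 2.0 / (t + 2.0)
        x = (1 - gamma) * x + gamma * A[i]     # move toward the chosen atom
    return x

# Usage: minimize f(x) = 0.5*||x - b||^2 over the convex hull of 10^4 atoms.
n, d = 10_000, 32
rng = np.random.default_rng(1)
A = rng.standard_normal((n, d))
b = rng.standard_normal(d)
x = frank_wolfe(A, lambda x: x - b, A[0], T=200, oracle=SimHashMaxIP(A))
```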
Related papers
- Approximating Metric Magnitude of Point Sets [4.522729058300309]
Metric magnitude is a measure of the "size" of point clouds with many desirable geometric properties.
It has been adapted to various mathematical contexts and recent work suggests that it can enhance machine learning and optimization algorithms.
In this paper, we study the magnitude problem and show efficient ways of approximating it. We show that it can be cast as a convex optimization problem, but not as a submodular one.
The paper describes two new algorithms: an iterative approximation algorithm that converges quickly and accurately, and a subset-selection method that makes the computation even faster (a direct computation from the definition is sketched after this entry).
arXiv Detail & Related papers (2024-09-06T17:15:28Z) - A General Online Algorithm for Optimizing Complex Performance Metrics [5.726378955570775]
- A General Online Algorithm for Optimizing Complex Performance Metrics [5.726378955570775]
We introduce and analyze a general online algorithm that can be used in a straightforward way with a variety of complex performance metrics in binary, multi-class, and multi-label classification problems.
The algorithm's update and prediction rules are appealingly simple and computationally efficient without the need to store any past data.
arXiv Detail & Related papers (2024-06-20T21:24:47Z) - Accelerating Cutting-Plane Algorithms via Reinforcement Learning
Surrogates [49.84541884653309]
A current standard approach to solving convex discrete optimization problems is the use of cutting-plane algorithms.
Despite the existence of a number of general-purpose cut-generating algorithms, large-scale discrete optimization problems continue to suffer from intractability.
We propose a method for accelerating cutting-plane algorithms via reinforcement learning.
arXiv Detail & Related papers (2023-07-17T20:11:56Z) - Improving Time and Memory Efficiency of Genetic Algorithms by Storing
Populations as Minimum Spanning Trees of Patches [0.0]
In evolutionary algorithms the computational cost of applying operators and storing populations is comparable to the cost of fitness evaluation.
We show that storing the population as a minimum spanning tree, where vertices correspond to individuals but only contain meta-information about them, is a viable alternative to the straightforward implementation.
Our experiments suggest that significant improvements, even in the execution of crossover operators, can be achieved in terms of both memory usage and computational cost (a toy sketch of the patch-based storage idea follows this entry).
arXiv Detail & Related papers (2023-06-29T05:12:14Z) - Efficient Approximate Kernel Based Spike Sequence Classification [56.2938724367661]
- Efficient Approximate Kernel Based Spike Sequence Classification [56.2938724367661]
Machine learning models, such as SVM, require a definition of distance/similarity between pairs of sequences.
Exact methods yield better classification performance, but they pose high computational costs.
We propose a series of improvements to the approximate kernel in order to enhance its predictive performance.
arXiv Detail & Related papers (2022-09-11T22:44:19Z) - Alternating Mahalanobis Distance Minimization for Stable and Accurate CP
Decomposition [4.847980206213335]
We introduce a new formulation for deriving singular values and vectors of a tensor by considering the critical points of a function different from that used in previous work.
We show that a subsweep of this algorithm can achieve a superlinear convergence rate for exact CPD with known rank.
We then view the algorithm as optimizing a Mahalanobis distance with respect to each factor, with a ground metric dependent on the other factors (a baseline CP-ALS sketch follows this entry).
arXiv Detail & Related papers (2022-04-14T19:56:36Z) - Finite-Time Complexity of Online Primal-Dual Natural Actor-Critic Algorithm for Constrained Markov Decision Processes [13.908826484332282]
- Finite-Time Complexity of Online Primal-Dual Natural Actor-Critic Algorithm for Constrained Markov Decision Processes [13.908826484332282]
We study an online primal-dual actor-critic method to solve a discounted cost constrained Markov decision process problem.
This paper is the first to study the finite-time complexity of an online primal-dual actor-critic method for solving a CMDP problem.
arXiv Detail & Related papers (2021-10-21T18:05:40Z) - Provably Faster Algorithms for Bilevel Optimization [54.83583213812667]
Bilevel optimization has been widely applied in many important machine learning applications.
We propose two new algorithms for bilevel optimization.
We show that both algorithms achieve a complexity of $\mathcal{O}(\epsilon^{-1.5})$, which outperforms all existing algorithms by an order of magnitude.
arXiv Detail & Related papers (2021-06-08T21:05:30Z) - Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem, which estimates integer signals from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm that employs a deep neural network to estimate the optimal heuristic for the underlying simplified memory-bounded A* algorithm (a generic best-first search sketch follows this entry).
arXiv Detail & Related papers (2021-01-07T08:00:02Z) - Refined bounds for algorithm configuration: The knife-edge of dual class
- Refined bounds for algorithm configuration: The knife-edge of dual class approximability [94.83809668933021]
We investigate how large a training set should be to ensure that a parameter's average performance over the training set is close to its expected future performance.
We show that if this approximation holds under the L-infinity norm, we can provide strong sample complexity bounds.
We empirically evaluate our bounds in the context of integer programming, one of the most powerful tools in computer science.
arXiv Detail & Related papers (2020-06-21T15:32:21Z)