Faster Algorithms for Structured Linear and Kernel Support Vector Machines
- URL: http://arxiv.org/abs/2307.07735v3
- Date: Tue, 11 Feb 2025 21:37:03 GMT
- Title: Faster Algorithms for Structured Linear and Kernel Support Vector Machines
- Authors: Yuzhou Gu, Zhao Song, Lichen Zhang
- Abstract summary: We design the first nearly-linear time algorithm for solving quadratic programs when the quadratic objective admits a low-rank factorization. We prove that when the squared dataset radius is at least $\Omega(\log^2 n)$, then $\Omega(n^{2-o(1)})$ time is required.
- Score: 9.62296035593105
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Quadratic programming is a ubiquitous prototype in convex programming. Many machine learning problems can be formulated as quadratic programming, including the famous Support Vector Machines (SVMs). Linear and kernel SVMs have been among the most popular models in machine learning over the past three decades, prior to the deep learning era.
Generally, a quadratic program has an input size of $\Theta(n^2)$, where $n$ is the number of variables. Assuming the Strong Exponential Time Hypothesis ($\textsf{SETH}$), it is known that no $O(n^{2-o(1)})$ time algorithm exists when the quadratic objective matrix is positive semidefinite (Backurs, Indyk, and Schmidt, NeurIPS'17). However, problems such as SVMs usually admit much smaller input sizes: one is given $n$ data points, each of dimension $d$, and $d$ is oftentimes much smaller than $n$. Furthermore, the SVM program has only $O(1)$ linear equality constraints. This suggests that faster algorithms are feasible, provided the program exhibits certain structures.
In this work, we design the first nearly-linear time algorithm for solving quadratic programs whenever the quadratic objective admits a low-rank factorization and the number of linear constraints is small. Consequently, we obtain results for SVMs:
* For linear SVM, when the input data is $d$-dimensional, our algorithm runs in time $\widetilde O(nd^{(\omega+1)/2}\log(1/\epsilon))$, where $\omega\approx 2.37$ is the fast matrix multiplication exponent;
* For Gaussian kernel SVM, when the data dimension $d = O(\log n)$ and the squared dataset radius is sub-logarithmic in $n$, our algorithm runs in time $O(n^{1+o(1)}\log(1/\epsilon))$.
We also prove that when the squared dataset radius is at least $\Omega(\log^2 n)$, then $\Omega(n^{2-o(1)})$ time is required. This improves upon the prior best lower bound in both the dimension $d$ and the squared dataset radius.
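The low-rank structure the abstract refers to can be made concrete with the standard soft-margin SVM dual, stated here as a textbook formulation rather than quoted from the paper:

$$\max_{\alpha\in\mathbb{R}^n}\ \sum_{i=1}^n \alpha_i \;-\; \frac{1}{2}\sum_{i=1}^n\sum_{j=1}^n \alpha_i\alpha_j\, y_i y_j\, K(x_i,x_j) \quad\text{s.t.}\quad 0\le\alpha_i\le C,\quad \sum_{i=1}^n y_i\alpha_i = 0.$$

For the linear kernel $K(x_i,x_j)=\langle x_i,x_j\rangle$, the quadratic objective matrix is $Q=\mathrm{diag}(y)\,XX^\top\,\mathrm{diag}(y)$ with $X\in\mathbb{R}^{n\times d}$, so $Q$ has rank at most $d$ and admits a low-rank factorization, while the program carries a single linear equality constraint; this is the kind of structure the paper's algorithm exploits.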
 
      
Related papers
- Improved Algorithms for Kernel Matrix-Vector Multiplication Under Sparsity Assumptions [23.539428616884035]
 We study fast algorithms for computing matrix-vector products for asymmetric Gaussian kernel matrices $K\in\mathbb{R}^{n\times n}$. Our algorithms rely on the following modelling assumption about $K$: the sum of the entries of $K$ scales linearly in $n$, as opposed to the worst-case growth. We obtain the first subquadratic-time algorithm that works under this assumption, for unrestricted computation.
 arXiv  Detail & Related papers  (2025-07-31T13:29:43Z)
- Approaching Optimality for Solving Dense Linear Systems with Low-Rank Structure [16.324043075920564]
 We provide new high-accuracy randomized algorithms for solving linear systems and regression problems. Our algorithms nearly match a natural complexity limit under dense inputs for these problems. We show how to obtain these running times even under the weaker assumption that all but $k$ of the singular values have a bounded generalized mean.
 arXiv  Detail & Related papers  (2025-07-15T20:48:30Z)
- Quantum Algorithms for Projection-Free Sparse Convex Optimization [32.34794896079469]
 For the vector domain, we propose two quantum algorithms for sparse constraints that find an $\varepsilon$-optimal solution with query complexity $O(\sqrt{d}/\varepsilon)$. For the matrix domain, we propose two quantum algorithms for nuclear norm constraints that improve the time complexity to $\widetilde O(rd/\varepsilon^2)$ and $\widetilde O(\sqrt{r}d/\varepsilon^3)$.
 arXiv  Detail & Related papers  (2025-07-11T12:43:58Z)
- Optimal Sketching for Residual Error Estimation for Matrix and Vector   Norms [50.15964512954274]
 We study the problem of residual error estimation for matrix and vector norms using a linear sketch.
We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work.
We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
 arXiv  Detail & Related papers  (2024-08-16T02:33:07Z)
- Solving Dense Linear Systems Faster Than via Preconditioning [1.8854491183340518]
 We show that our algorithm has an $\widetilde O(n^2)$ running time when $k=O(n^{0.729})$.
Our main algorithm can be viewed as a randomized block coordinate descent method.
 arXiv  Detail & Related papers  (2023-12-14T12:53:34Z)
- Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization [54.29685789885059]
 We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem.
The goal is to approximate $\mathbf{A}$ as a product of low-rank factors.
Our techniques generalize to other common variants of the BMF problem.
 arXiv  Detail & Related papers  (2023-06-02T18:55:27Z)
- Randomized and Deterministic Attention Sparsification Algorithms for Over-parameterized Feature Dimension [18.57735939471469]
 We consider the sparsification of the attention problem.
For any super large feature dimension, we can reduce it down to a size nearly linear in the length of the sentence.
 arXiv  Detail & Related papers  (2023-04-10T05:52:38Z)
- Sketching Algorithms and Lower Bounds for Ridge Regression [65.0720777731368]
 We give a sketching-based iterative algorithm that computes $(1+\varepsilon)$-approximate solutions for the ridge regression problem.
We also show that this algorithm can be used to give faster algorithms for kernel ridge regression.
 arXiv  Detail & Related papers  (2022-04-13T22:18:47Z)
- Distribution Compression in Near-linear Time [27.18971095426405]
 We introduce Compress++, a simple meta-procedure for speeding up any thinning algorithm.
It delivers $\sqrt{n}$ points with $\mathcal{O}(\sqrt{\log n / n})$ integration error and better-than-Monte-Carlo maximum mean discrepancy.
 arXiv  Detail & Related papers  (2021-11-15T17:42:57Z)
- The Fine-Grained Hardness of Sparse Linear Regression [12.83354999540079]
 We show that there are no better-than-brute-force algorithms for the problem.
We also show the impossibility of better-than-brute-force algorithms when the prediction error is measured.
 arXiv  Detail & Related papers  (2021-06-06T14:19:43Z)
- Learning a Latent Simplex in Input-Sparsity Time [58.30321592603066]
 We consider the problem of learning a latent $k$-vertex simplex $K\subset\mathbb{R}^d$, given access to $A\in\mathbb{R}^{d\times n}$.
We show that the dependence on $k$ in the running time is unnecessary given a natural assumption about the mass of the top $k$ singular values of $A$.
 arXiv  Detail & Related papers  (2021-05-17T16:40:48Z)
- On Efficient Low Distortion Ultrametric Embedding [18.227854382422112]
 A widely-used method to preserve the underlying hierarchical structure of the data is to find an embedding of the data into a tree or an ultrametric.
In this paper, we provide a new algorithm which takes as input a set of points in $\mathbb{R}^d$ and, for a universal constant $\rho>1$, outputs an ultrametric $\Delta$.
We show that the output of our algorithm is comparable to the output of the linkage algorithms while achieving a much faster running time.
 arXiv  Detail & Related papers  (2020-08-15T11:06:45Z)
- Streaming Complexity of SVMs [110.63976030971106]
 We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that, for both problems, for dimensions of $\frac{1}{\lambda\epsilon}$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
 arXiv  Detail & Related papers  (2020-07-07T17:10:00Z)
- Maximizing Determinants under Matroid Constraints [69.25768526213689]
 We study the problem of finding a basis $S$ of a matroid $M$ such that $\det(\sum_{i \in S} v_i v_i^\top)$ is maximized.
This problem appears in a diverse set of areas such as experimental design, fair allocation of goods, network design, and machine learning.
 arXiv  Detail & Related papers  (2020-04-16T19:16:38Z)