A Smoothing Algorithm for l1 Support Vector Machines
- URL: http://arxiv.org/abs/2401.09431v1
- Date: Sun, 17 Dec 2023 00:54:56 GMT
- Title: A Smoothing Algorithm for l1 Support Vector Machines
- Authors: Ibrahim Emirahmetoglu, Jeffrey Hajewski, Suely Oliveira, and David E.
Stewart
- Abstract summary: A smoothing algorithm is presented for solving the soft-margin Support Vector Machine (SVM) optimization problem with an $\ell^{1}$ penalty.
The algorithm uses smoothing for the hinge-loss function, and an active set approach for the $\ell^{1}$ penalty.
Experiments show that our algorithm is capable of strong test accuracy without sacrificing training speed.
- Score: 0.07499722271664144
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: A smoothing algorithm is presented for solving the soft-margin Support Vector
Machine (SVM) optimization problem with an $\ell^{1}$ penalty. This algorithm
is designed to require a modest number of passes over the data, which is an
important measure of its cost for very large datasets. The algorithm uses
smoothing for the hinge-loss function, and an active set approach for the
$\ell^{1}$ penalty. The smoothing parameter $\alpha$ is initially large, but
typically halved when the smoothed problem is solved to sufficient accuracy.
Convergence theory is presented that shows
$\mathcal{O}(1+\log(1+\log_+(1/\alpha)))$ guarded Newton steps for each value
of $\alpha$ except for asymptotic bands $\alpha=\Theta(1)$ and
$\alpha=\Theta(1/N)$, with only one Newton step provided $\eta\alpha\gg1/N$,
where $N$ is the number of data points and the stopping criterion is that the
predicted reduction is less than $\eta\alpha$. The experimental results show
that our algorithm is capable of strong test accuracy without sacrificing
training speed.
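The abstract describes the method only at a high level: a smoothed hinge loss governed by a parameter $\alpha$ that is repeatedly halved, an active-set treatment of the $\ell^{1}$ penalty, guarded Newton steps, and a stopping rule based on the predicted reduction falling below $\eta\alpha$. The sketch below is a minimal illustration of that structure, not the authors' implementation: the quadratic smoothing of the hinge loss, the minimal-norm-subgradient active-set rule, the ridge term added to the Hessian, and the Armijo backtracking used as the guard are all assumed choices made for concreteness; only the outer $\alpha$-halving loop and the predicted-reduction test are taken directly from the abstract.

```python
import numpy as np


def smoothed_hinge(z, alpha):
    """Quadratically smoothed hinge loss: equals max(0, 1 - z) outside the
    band [1 - alpha, 1] and is C^1 everywhere.  (An assumed smoothing; the
    abstract only states that the hinge loss is smoothed with parameter alpha.)"""
    out = np.where(z >= 1.0, 0.0, 1.0 - z - alpha / 2.0)
    mid = (z > 1.0 - alpha) & (z < 1.0)
    return np.where(mid, (1.0 - z) ** 2 / (2.0 * alpha), out)


def smoothed_hinge_d1(z, alpha):
    """First derivative of smoothed_hinge with respect to z."""
    g = np.where(z >= 1.0, 0.0, -1.0)
    mid = (z > 1.0 - alpha) & (z < 1.0)
    return np.where(mid, -(1.0 - z) / alpha, g)


def smoothed_hinge_d2(z, alpha):
    """Second derivative of smoothed_hinge with respect to z."""
    mid = (z > 1.0 - alpha) & (z < 1.0)
    return np.where(mid, 1.0 / alpha, 0.0)


def l1_svm_smoothing(X, y, lam=0.01, alpha0=1.0, alpha_min=1e-6,
                     eta=0.1, max_newton=50):
    """Outer alpha-halving loop with guarded (damped) Newton inner steps on the
    active coordinates.  The active-set rule and the backtracking line search
    are illustrative; the paper's exact guarding strategy may differ."""
    N, d = X.shape
    w = np.zeros(d)
    alpha = alpha0

    def objective(w, alpha):
        z = y * (X @ w)
        return smoothed_hinge(z, alpha).mean() + lam * np.abs(w).sum()

    while alpha > alpha_min:
        for _ in range(max_newton):
            z = y * (X @ w)
            g = X.T @ (y * smoothed_hinge_d1(z, alpha)) / N

            # Minimal-norm subgradient of the l1-penalized smoothed objective.
            s = g + lam * np.sign(w)
            zero = (w == 0)
            s[zero] = np.sign(g[zero]) * np.maximum(np.abs(g[zero]) - lam, 0.0)

            # Active set: nonzero weights plus zero weights violating |g_j| <= lam.
            active = (w != 0) | (np.abs(s) > 0)
            if not active.any():
                break

            # Ridge-guarded Newton direction restricted to the active set.
            h = smoothed_hinge_d2(z, alpha)
            XA = X[:, active]
            H = (XA.T * h) @ XA / N + 1e-8 * np.eye(int(active.sum()))
            p = np.linalg.solve(H, -s[active])

            # Stop this alpha once the predicted reduction drops below eta * alpha.
            pred = -s[active] @ p
            if pred < eta * alpha:
                break

            # Guarded step: Armijo backtracking on the true (nonsmooth) objective.
            f0, t = objective(w, alpha), 1.0
            while t > 1e-10:
                w_trial = w.copy()
                w_trial[active] += t * p
                if objective(w_trial, alpha) <= f0 - 1e-4 * t * pred:
                    w = w_trial
                    break
                t *= 0.5

        alpha *= 0.5  # halve the smoothing parameter and resolve

    return w
```

A typical call would be `w = l1_svm_smoothing(X, y, lam=0.01)` with labels $y_i \in \{-1, +1\}$; the returned weight vector tends to be sparse because coordinates whose smooth gradient stays inside the $\ell^{1}$ threshold are never activated.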
Related papers
- Fixed Point Computation: Beating Brute Force with Smoothed Analysis [28.978340288565118]
We propose a new algorithm that finds an $\varepsilon$-approximate fixed point of a smooth function from the $n$-dimensional $\ell$ unit ball to itself.
The algorithm's runtime is bounded by $e^{O(n)}/\varepsilon$, under the smoothed-analysis framework.
arXiv Detail & Related papers (2025-01-18T21:32:26Z) - A Near-optimal Algorithm for Learning Margin Halfspaces with Massart Noise [36.29182619215646]
We study the problem of PAC learning $\gamma$-margin halfspaces in the presence of Massart noise.
Our algorithm is simple and practical, relying on online SGD on a carefully selected sequence of convex losses.
arXiv Detail & Related papers (2025-01-16T17:44:18Z) - Mini-Batch Kernel $k$-means [4.604003661048267]
A single iteration of our algorithm takes $\widetilde{O}(kb^2)$ time, significantly faster than the $O(n^2)$ time required by the full batch kernel $k$-means.
Experiments demonstrate that our algorithm consistently achieves a 10-100x speedup with minimal loss in quality.
arXiv Detail & Related papers (2024-10-08T10:59:14Z) - Inverting the Leverage Score Gradient: An Efficient Approximate Newton Method [10.742859956268655]
This paper aims to recover the intrinsic model parameters given the leverage scores gradient.
We specifically scrutinize the inversion of the leverage score gradient, denoted as $g(x)$.
arXiv Detail & Related papers (2024-08-21T01:39:42Z) - Optimal Query Complexities for Dynamic Trace Estimation [59.032228008383484]
We consider the problem of minimizing the number of matrix-vector queries needed for accurate trace estimation in the dynamic setting where our underlying matrix is changing slowly.
We provide a novel binary tree summation procedure that simultaneously estimates all $m$ traces up to $\epsilon$ error with $\delta$ failure probability.
Our lower bounds (1) give the first tight bounds for Hutchinson's estimator in the matrix-vector product model with Frobenius norm error even in the static setting, and (2) are the first unconditional lower bounds for dynamic trace estimation.
arXiv Detail & Related papers (2022-09-30T04:15:44Z) - List-Decodable Sparse Mean Estimation via Difference-of-Pairs Filtering [42.526664955704746]
We develop a novel, conceptually simpler technique for list-decodable sparse mean estimation.
In particular, for distributions with "certifiably bounded" $t$-th moments in $k$-sparse directions, our algorithm achieves error of $(1/\alpha)^{O(1/t)}$ with sample complexity $m = (k\log(n))^{O(t)}/\alpha$ and running time $\mathrm{poly}(mn^t)$.
For the special case of Gaussian inliers, our algorithm achieves the optimal error guarantee of $\Theta(\sqrt{\log \ldots})$
arXiv Detail & Related papers (2022-06-10T17:38:18Z) - Sketching Algorithms and Lower Bounds for Ridge Regression [65.0720777731368]
We give a sketching-based iterative algorithm that computes $1+\varepsilon$ approximate solutions for the ridge regression problem.
We also show that this algorithm can be used to give faster algorithms for kernel ridge regression.
arXiv Detail & Related papers (2022-04-13T22:18:47Z) - Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean
Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z) - On Efficient Low Distortion Ultrametric Embedding [18.227854382422112]
A widely-used method to preserve the underlying hierarchical structure of the data is to find an embedding of the data into a tree or an ultrametric.
In this paper, we provide a new algorithm which takes as input a set of points in $\mathbb{R}^d$ and (for a universal constant $\rho>1$) outputs an ultrametric $\Delta$.
We show that the output of our algorithm is comparable to the output of the linkage algorithms while achieving a much faster running time.
arXiv Detail & Related papers (2020-08-15T11:06:45Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that, for both problems, for dimensions of $\frac{1}{\lambda\epsilon}$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample
Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $\epsilon$, we show an algorithm with improved sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z) - Maximizing Determinants under Matroid Constraints [69.25768526213689]
We study the problem of finding a basis $S$ of $M$ such that $\det\left(\sum_{i \in S} v_i v_i^\top\right)$ is maximized.
This problem appears in a diverse set of areas such as experimental design, fair allocation of goods, network design, and machine learning.
arXiv Detail & Related papers (2020-04-16T19:16:38Z)