Training very large scale nonlinear SVMs using Alternating Direction
Method of Multipliers coupled with the Hierarchically Semi-Separable kernel
approximations
- URL: http://arxiv.org/abs/2108.04167v1
- Date: Mon, 9 Aug 2021 16:52:04 GMT
- Title: Training very large scale nonlinear SVMs using Alternating Direction
Method of Multipliers coupled with the Hierarchically Semi-Separable kernel
approximations
- Authors: S. Cipolla, J. Gondzio
- Abstract summary: Nonlinear Support Vector Machines (SVMs) produce significantly higher classification quality than linear ones, but their computational complexity is prohibitive for large-scale datasets.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Typically, nonlinear Support Vector Machines (SVMs) produce significantly
higher classification quality when compared to linear ones but, at the same
time, their computational complexity is prohibitive for large-scale datasets:
this drawback is essentially related to the necessity to store and manipulate
large, dense and unstructured kernel matrices. Despite the fact that at the
core of training an SVM there is a \textit{simple} convex optimization problem,
the presence of kernel matrices is responsible for dramatic performance
reduction, making SVMs unworkably slow for large problems. Aiming at an
efficient solution of large-scale nonlinear SVM problems, we propose the use of
the \textit{Alternating Direction Method of Multipliers} coupled with
\textit{Hierarchically Semi-Separable} (HSS) kernel approximations. As shown in
this work, the detailed analysis of the interaction among their algorithmic
components unveils a particularly efficient framework and indeed, the presented
experimental results demonstrate a significant speed-up when compared to the
\textit{state-of-the-art} nonlinear SVM libraries (without significantly
affecting the classification accuracy).
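The abstract describes the two ingredients only at a high level; the snippet below is a minimal sketch of how they can fit together, assuming a bias-free SVM dual and substituting a simple Nystrom low-rank factor for the paper's HSS approximation (which additionally exploits hierarchical block structure). All function and parameter names (rbf, nystrom_factor, admm_svm_dual, rho, rank) are hypothetical and not taken from the authors' code.

```python
# Hedged sketch (not the authors' implementation): ADMM on the bias-free
# kernel-SVM dual, with a Nystrom low-rank factor standing in for the
# Hierarchically Semi-Separable (HSS) kernel approximation of the paper.
import numpy as np

def rbf(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_factor(X, rank, gamma, rng):
    """Return G with K ~= G @ G.T (low-rank stand-in for an HSS approximation)."""
    idx = rng.choice(len(X), size=rank, replace=False)
    C = rbf(X, X[idx], gamma)                      # n x r cross-kernel block
    W = rbf(X[idx], X[idx], gamma)                 # r x r landmark block
    vals, vecs = np.linalg.eigh(W + 1e-8 * np.eye(rank))
    return C @ (vecs / np.sqrt(vals)) @ vecs.T     # G = C @ W^{-1/2}

def admm_svm_dual(G, y, C=1.0, rho=1.0, iters=300):
    """min_a 0.5 a'Qa - 1'a  s.t. 0 <= a <= C, with Q ~= diag(y) G G' diag(y).
    The a-step uses the Woodbury identity, so only one r x r matrix is
    factorised (r = rank of the kernel approximation)."""
    n, r = G.shape
    W = G * y[:, None]                             # Q ~= W @ W.T
    L = np.linalg.cholesky(rho * np.eye(r) + W.T @ W)
    a, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    for _ in range(iters):
        v = 1.0 + rho * (z - u)
        t = np.linalg.solve(L.T, np.linalg.solve(L, W.T @ v))
        a = (v - W @ t) / rho                      # (rho*I + W W')^{-1} v via Woodbury
        z = np.clip(a + u, 0.0, C)                 # projection onto the box [0, C]
        u = u + a - z                              # scaled dual update
    return z

# Toy usage: predictions follow the usual kernel expansion sum_i a_i y_i K(x_i, x).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=500))
G = nystrom_factor(X, rank=50, gamma=0.5, rng=rng)
alpha = admm_svm_dual(G, y)
scores = rbf(X, X, 0.5) @ (alpha * y)              # decision values on the training set
print("train accuracy:", np.mean(np.sign(scores) == y))
```

The design point the sketch illustrates is the one the paper exploits: once the kernel admits a structured representation (here low-rank, in the paper HSS), the ADMM subproblem involving the kernel reduces to a small factorisation computed once, while the box constraints are handled by a cheap elementwise projection.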
Related papers
- Short-Long Convolutions Help Hardware-Efficient Linear Attention to Focus on Long Sequences [60.489682735061415]
We propose CHELA, which replaces state space models with short-long convolutions and implements linear attention in a divide-and-conquer manner.
Our experiments on the Long Range Arena benchmark and language modeling tasks demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2024-06-12T12:12:38Z)
- Robust optimization for adversarial learning with finite sample complexity guarantees [1.8434042562191815]
In this paper we focus on linear and nonlinear classification problems and propose a novel adversarial training method for robust classifiers.
We view robustness under a data driven lens, and derive finite sample complexity bounds for both linear and non-linear classifiers in binary and multi-class scenarios.
Our algorithm minimizes a worst-case surrogate loss using Linear Programming (LP) and Second Order Cone Programming (SOCP) for linear and non-linear models.
arXiv Detail & Related papers (2024-03-22T13:49:53Z)
- Optimization meets Machine Learning: An Exact Algorithm for Semi-Supervised Support Vector Machines [0.9831489366502302]
Support vector machines (SVMs) are well-studied supervised learning models for binary classification.
We present a new branch-and-cut approach for S3VMs using semidefinite programming (SDP) relaxations.
The SDP relaxation provides bounds that are significantly stronger than those available in the literature.
arXiv Detail & Related papers (2023-12-15T13:44:54Z)
- A Preconditioned Interior Point Method for Support Vector Machines Using an ANOVA-Decomposition and NFFT-Based Matrix-Vector Products [0.6445605125467574]
We propose an NFFT-accelerated matrix-vector product using an ANOVA decomposition for the feature space that is used within an interior point method for the overall optimization problem.
We investigate the performance of the different preconditioners as well as the accuracy of the ANOVA kernel on several large-scale datasets.
arXiv Detail & Related papers (2023-12-01T12:27:11Z)
- Large-scale gradient-based training of Mixtures of Factor Analyzers [67.21722742907981]
This article contributes both a theoretical analysis as well as a new method for efficient high-dimensional training by gradient descent.
We prove that MFA training and inference/sampling can be performed based on precision matrices, which does not require matrix inversions after training is completed.
Besides the theoretical analysis, we apply MFA to typical image datasets such as SVHN and MNIST, and demonstrate the ability to perform sample generation and outlier detection.
arXiv Detail & Related papers (2023-08-26T06:12:33Z)
- An Efficient Method for Sample Adversarial Perturbations against Nonlinear Support Vector Machines [8.000799046379749]
We investigate the sample adversarial perturbations for nonlinear support vector machines (SVMs).
Due to the implicit form of the nonlinear functions mapping data to the feature space, it is difficult to obtain the explicit form of the adversarial perturbations.
By exploring the special property of nonlinear SVMs, we transform the optimization problem of attacking nonlinear SVMs into a nonlinear KKT system.
arXiv Detail & Related papers (2022-06-12T05:21:51Z)
- Handling Imbalanced Classification Problems With Support Vector Machines via Evolutionary Bilevel Optimization [73.17488635491262]
Support vector machines (SVMs) are popular learning algorithms to deal with binary classification problems.
This article introduces EBCS-SVM: evolutionary bilevel cost-sensitive SVMs.
arXiv Detail & Related papers (2022-04-21T16:08:44Z)
- High-Dimensional Sparse Bayesian Learning without Covariance Matrices [66.60078365202867]
We introduce a new inference scheme that avoids explicit construction of the covariance matrix.
Our approach couples a little-known diagonal estimation result from numerical linear algebra with the conjugate gradient algorithm.
On several simulations, our method scales better than existing approaches in computation time and memory.
arXiv Detail & Related papers (2022-02-25T16:35:26Z)
- Optimization on manifolds: A symplectic approach [127.54402681305629]
We propose a dissipative extension of Dirac's theory of constrained Hamiltonian systems as a general framework for solving optimization problems.
Our class of (accelerated) algorithms are not only simple and efficient but also applicable to a broad range of contexts.
arXiv Detail & Related papers (2021-07-23T13:43:34Z)
- Estimating Average Treatment Effects with Support Vector Machines [77.34726150561087]
Support vector machine (SVM) is one of the most popular classification algorithms in the machine learning literature.
We adapt SVM as a kernel-based weighting procedure that minimizes the maximum mean discrepancy between the treatment and control groups.
We characterize the bias of causal effect estimation arising from this trade-off, connecting the proposed SVM procedure to the existing kernel balancing methods.
arXiv Detail & Related papers (2021-02-23T20:22:56Z)
- AML-SVM: Adaptive Multilevel Learning with Support Vector Machines [0.0]
This paper proposes an adaptive multilevel learning framework for the nonlinear SVM.
It improves the classification quality across the refinement process, and leverages multi-threaded parallel processing for better performance.
arXiv Detail & Related papers (2020-11-05T00:17:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.