High-Dimensional Penalized Bernstein Support Vector Machines
- URL: http://arxiv.org/abs/2303.09066v1
- Date: Thu, 16 Mar 2023 03:48:29 GMT
- Title: High-Dimensional Penalized Bernstein Support Vector Machines
- Authors: Rachid Kharoubi, Abdallah Mkhadri and Karim Oualkacha
- Abstract summary: Non-differentiability of the SVM hinge loss function can lead to computational difficulties in high-dimensional settings.
We propose two efficient algorithms for computing the solution of the penalized BernSVM.
Our bound holds with high probability and achieves a rate of order $\sqrt{s\log(p)/n}$, where $s$ is the number of active features.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The support vector machine (SVM) is a powerful classifier used for binary classification to improve prediction accuracy. However, the
non-differentiability of the SVM hinge loss function can lead to computational
difficulties in high-dimensional settings. To overcome this problem, we rely on Bernstein polynomials and propose a new smoothed version of the SVM hinge loss
called the Bernstein support vector machine (BernSVM), which is suitable for
the high-dimensional $p \gg n$ regime. As the BernSVM objective loss function is
of the class $C^2$, we propose two efficient algorithms for computing the
solution of the penalized BernSVM. The first algorithm is based on coordinate descent with the majorization-minimization (MM) principle, and the second is an IRLS-type (iteratively reweighted least squares) algorithm. Under standard
assumptions, we derive a cone condition and a restricted strong convexity property to
establish an upper bound for the weighted Lasso BernSVM estimator. Using a
local linear approximation, we extend the latter result to penalized BernSVM
with the nonconvex penalties SCAD and MCP. Our bound holds with high probability and achieves a rate of order $\sqrt{s\log(p)/n}$, where $s$ is the number of active features. Simulation studies are conducted to compare the prediction accuracy of BernSVM with that of its competitors, and to compare the two algorithms in terms of computational timing and estimation error. The
use of the proposed method is illustrated through analysis of three large-scale
real data examples.
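To make the two solvers described above concrete, the sketch below illustrates the general recipe in Python: a smooth, convex surrogate replaces the hinge loss, and each iteration reduces to a weighted least-squares (Newton) step. This is a hypothetical illustration, not the paper's implementation: the softplus surrogate `smooth_hinge` stands in for the Bernstein-polynomial smoothing, a ridge penalty stands in for the weighted Lasso/SCAD/MCP penalties analyzed in the paper, and the names `irls_svm`, `delta`, and `lam` are invented for this example.

```python
import numpy as np

def smooth_hinge(u, delta=0.5):
    # Smooth, convex surrogate of the hinge loss max(0, 1 - u).
    # NOTE: softplus smoothing is a stand-in chosen for brevity; the paper's
    # BernSVM uses a Bernstein-polynomial smoothing defined there.
    return delta * np.logaddexp(0.0, (1.0 - u) / delta)

def smooth_hinge_grad(u, delta=0.5):
    # First derivative of the surrogate with respect to the margin u.
    return -1.0 / (1.0 + np.exp((u - 1.0) / delta))

def smooth_hinge_hess(u, delta=0.5):
    # Second derivative of the surrogate (nonnegative, so the loss is convex).
    s = 1.0 / (1.0 + np.exp((u - 1.0) / delta))
    return s * (1.0 - s) / delta

def irls_svm(X, y, lam=0.1, delta=0.5, n_iter=50, tol=1e-8):
    # IRLS/Newton-type solver for a ridge-penalized smoothed-hinge SVM:
    #   minimize (1/n) * sum_i loss(y_i x_i^T beta) + lam * ||beta||^2.
    # Each iteration solves one weighted least-squares system.
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        u = y * (X @ beta)                              # margins y_i x_i^T beta
        g = X.T @ (y * smooth_hinge_grad(u, delta)) / n + 2.0 * lam * beta
        w = smooth_hinge_hess(u, delta)                 # IRLS weights
        H = X.T @ (w[:, None] * X) / n + 2.0 * lam * np.eye(p)
        step = np.linalg.solve(H, g)
        beta -= step
        if np.linalg.norm(step) < tol:
            break
    return beta

# Toy usage on simulated data with 3 active features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
beta_true = np.zeros(20)
beta_true[:3] = [2.0, -1.5, 1.0]
y = np.sign(X @ beta_true + 0.1 * rng.normal(size=200))
beta_hat = irls_svm(X, y)
print("training accuracy:", np.mean(np.sign(X @ beta_hat) == y))
```

In the $p \gg n$ regime targeted by the paper, forming and solving the dense $p \times p$ Newton system above would be costly; the coordinate-descent/MM variant described in the abstract avoids this, and sparsity-inducing penalties would replace the ridge term used here for simplicity.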
Related papers
- $p$SVM: Soft-margin SVMs with $p$-norm Hinge Loss [0.0]
Support Vector Machines (SVMs) based on hinge loss have been extensively discussed and applied to various binary classification tasks.
In this paper, we explore the properties, performance, and training algorithms of $p$SVMs.
arXiv Detail & Related papers (2024-08-19T11:30:00Z) - A Novel Loss Function-based Support Vector Machine for Binary Classification [3.773980481058198]
We propose a novel Slide loss function ($\ell_s$) to construct the support vector machine classifier ($\ell_s$-SVM).
By introducing the concept of a proximal stationary point and utilizing the property of Lipschitz continuity, we derive the first-order optimality conditions for $\ell_s$-SVM.
To efficiently handle $ell_s$-SVM, we devise a fast alternating direction method of multipliers with the working set.
arXiv Detail & Related papers (2024-03-25T11:42:01Z) - A Safe Screening Rule with Bi-level Optimization of $\nu$ Support Vector
Machine [15.096652880354199]
We propose a safe screening rule with bi-level optimization for $\nu$-SVM.
Our SRBO-$\nu$-SVM is rigorously derived by integrating the Karush-Kuhn-Tucker conditions.
We also develop an efficient dual coordinate descent method (DCDM) to further improve computational speed.
arXiv Detail & Related papers (2024-03-04T06:55:57Z) - Transformers as Support Vector Machines [54.642793677472724]
We establish a formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem.
We characterize the implicit bias of 1-layer transformers optimized with gradient descent.
We believe these findings inspire the interpretation of transformers as a hierarchy of SVMs that separates and selects optimal tokens.
arXiv Detail & Related papers (2023-08-31T17:57:50Z) - Kernel Support Vector Machine Classifiers with the $\ell_0$-Norm Hinge
Loss [3.007949058551534]
Support Vector Machine (SVM) has been one of the most successful machine learning techniques for binary classification problems.
This paper focuses on the kernel SVM with the $\ell_0$-norm hinge loss (referred to as $\ell_0$-KSVM), which is a composite function of the hinge loss and the $\ell_0$-norm.
Experiments on synthetic and real datasets show that $\ell_0$-KSVM can achieve accuracy comparable to the standard KSVM.
arXiv Detail & Related papers (2023-06-24T14:52:44Z) - Handling Imbalanced Classification Problems With Support Vector Machines
via Evolutionary Bilevel Optimization [73.17488635491262]
Support vector machines (SVMs) are popular learning algorithms to deal with binary classification problems.
This article introduces EBCS-SVM: evolutionary bilevel cost-sensitive SVMs.
arXiv Detail & Related papers (2022-04-21T16:08:44Z) - Estimating Average Treatment Effects with Support Vector Machines [77.34726150561087]
Support vector machine (SVM) is one of the most popular classification algorithms in the machine learning literature.
We adapt SVM as a kernel-based weighting procedure that minimizes the maximum mean discrepancy between the treatment and control groups.
We characterize the bias of causal effect estimation arising from this trade-off, connecting the proposed SVM procedure to the existing kernel balancing methods.
arXiv Detail & Related papers (2021-02-23T20:22:56Z) - A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm
for Bilevel Optimization [112.59170319105971]
We propose a new algorithm -- the Momentum-Assisted Single-Timescale Stochastic Approximation (MSTSA) -- for tackling bilevel optimization problems.
MSTSA allows us to control the error in the iterates caused by an inaccurate solution to the lower-level subproblem.
arXiv Detail & Related papers (2021-02-15T07:10:33Z) - Large-Scale Methods for Distributionally Robust Optimization [53.98643772533416]
We prove that our algorithms require a number of gradient evaluations independent of the training set size and the number of parameters.
Experiments on MNIST and ImageNet confirm the theoretical scaling of our algorithms, which are 9--36 times more efficient than full-batch methods.
arXiv Detail & Related papers (2020-10-12T17:41:44Z) - A Semismooth-Newton's-Method-Based Linearization and Approximation
Approach for Kernel Support Vector Machines [1.177306187948666]
Support Vector Machines (SVMs) are among the most popular and the best performing classification algorithms.
In this paper, we propose a semismooth Newton's method based linearization approximation approach for kernel SVMs.
The advantage of the proposed approach is that it maintains low computational cost and keeps a fast convergence rate.
arXiv Detail & Related papers (2020-07-21T07:44:21Z) - Consistent Structured Prediction with Max-Min Margin Markov Networks [84.60515484036239]
Max-margin methods for binary classification have been extended to the structured prediction setting under the name of max-margin Markov networks ($M^3N$).
We overcome the limitations of $M^3N$ by defining the learning problem in terms of a "max-min" margin formulation, naming the resulting method max-min margin Markov networks ($M^4N$).
Experiments on multi-class classification, ordinal regression, sequence prediction and ranking demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2020-07-02T10:48:42Z)