A Novel Loss Function-based Support Vector Machine for Binary Classification
- URL: http://arxiv.org/abs/2403.16654v1
- Date: Mon, 25 Mar 2024 11:42:01 GMT
- Title: A Novel Loss Function-based Support Vector Machine for Binary Classification
- Authors: Yan Li, Liping Zhang
- Abstract summary: We propose a novel Slide loss function ($\ell_s$) to construct the support vector machine classifier ($\ell_s$-SVM).
By introducing the concept of a proximal stationary point and utilizing the property of Lipschitz continuity, we derive the first-order optimality conditions for $\ell_s$-SVM.
To efficiently handle $\ell_s$-SVM, we devise a fast alternating direction method of multipliers with the working set.
- Score: 3.773980481058198
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Previous support vector machines (SVMs), including the $0/1$ loss SVM, hinge loss SVM, ramp loss SVM, truncated pinball loss SVM, and others, overlooked the degree of penalty for correctly classified samples within the margin. This oversight affects the generalization ability of the SVM classifier to some extent. To address this limitation, from the perspective of the confidence margin, we propose a novel Slide loss function ($\ell_s$) to construct the support vector machine classifier ($\ell_s$-SVM). By introducing the concept of a proximal stationary point and utilizing the property of Lipschitz continuity, we derive the first-order optimality conditions for $\ell_s$-SVM. Based on these, we define the $\ell_s$ support vectors and the working set of $\ell_s$-SVM. To efficiently handle $\ell_s$-SVM, we devise a fast alternating direction method of multipliers with the working set ($\ell_s$-ADMM) and provide a convergence analysis. Numerical experiments on real-world datasets confirm the robustness and effectiveness of the proposed method.
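The abstract states the key property of the Slide loss, namely that correctly classified samples inside the margin still incur a (reduced) penalty, but it does not give the loss in closed form. The Python snippet below is therefore only an illustrative sketch: it contrasts the familiar hinge and ramp losses with a hypothetical slide-like loss built from that property. The function names, the parameters `a` and `c`, and the exact piecewise form are assumptions for illustration, not the authors' definition.

```python
import numpy as np

def hinge_loss(m):
    """Standard hinge loss on the functional margin m = y * f(x)."""
    return np.maximum(0.0, 1.0 - m)

def ramp_loss(m, c=1.0):
    """Ramp loss: hinge loss truncated at c, which bounds the penalty for outliers."""
    return np.minimum(c, np.maximum(0.0, 1.0 - m))

def slide_like_loss(m, a=0.5, c=2.0):
    """HYPOTHETICAL slide-like loss (not the authors' definition).

    Illustrates the idea from the abstract: samples that are correctly
    classified but fall inside the margin (0 <= m < 1) still receive a
    reduced penalty a * (1 - m), while misclassified samples (m < 0)
    receive a larger but bounded penalty, as in a ramp-type loss.
    """
    loss = np.zeros_like(m, dtype=float)
    inside_margin = (m >= 0.0) & (m < 1.0)
    misclassified = m < 0.0
    loss[inside_margin] = a * (1.0 - m[inside_margin])
    loss[misclassified] = np.minimum(c, 1.0 - m[misclassified])
    return loss

# Example margins, ranging from confidently correct to badly wrong.
margins = np.array([1.5, 0.6, 0.1, -0.3, -2.0])
print(hinge_loss(margins))       # ~[0.   0.4  0.9  1.3  3.  ]
print(ramp_loss(margins))        # ~[0.   0.4  0.9  1.   1.  ]
print(slide_like_loss(margins))  # ~[0.   0.2  0.45 1.3  2.  ]
```

In an actual $\ell_s$-SVM such a loss would replace the hinge term in the SVM objective and, per the abstract, be handled through its proximal operator inside the proposed $\ell_s$-ADMM.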
Related papers
- $p$SVM: Soft-margin SVMs with $p$-norm Hinge Loss [0.0]
Support Vector Machines (SVMs) based on hinge loss have been extensively discussed and applied to various binary classification tasks.
In this paper, we explore the properties, performance, and training algorithms of $p$SVMs.
arXiv Detail & Related papers (2024-08-19T11:30:00Z) - A Safe Screening Rule with Bi-level Optimization of $\nu$ Support Vector Machine [15.096652880354199]
We propose a safe screening rule with bi-level optimization for $\nu$-SVM.
Our SRBO-$\nu$-SVM is strictly derived by integrating the Karush-Kuhn-Tucker (KKT) conditions.
We also develop an efficient dual coordinate descent method (DCDM) to further improve computational speed.
arXiv Detail & Related papers (2024-03-04T06:55:57Z) - Transformers as Support Vector Machines [54.642793677472724]
We establish a formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem.
We characterize the implicit bias of 1-layer transformers optimized with gradient descent.
We believe these findings inspire the interpretation of transformers as a hierarchy of SVMs that separates and selects optimal tokens.
arXiv Detail & Related papers (2023-08-31T17:57:50Z) - Kernel Support Vector Machine Classifiers with the $\ell_0$-Norm Hinge Loss [3.007949058551534]
Support Vector Machine (SVM) has been one of the most successful machine learning techniques for binary classification problems.
This paper concentrates on kernel SVMs with the $\ell_0$-norm hinge loss (referred to as $\ell_0$-KSVM), which is a composite function of the hinge loss and the $\ell_0$-norm.
Experiments on synthetic and real datasets show that $\ell_0$-KSVM achieves accuracy comparable to the standard KSVM.
arXiv Detail & Related papers (2023-06-24T14:52:44Z) - High-Dimensional Penalized Bernstein Support Vector Machines [0.0]
Non-differentiability of the SVM hinge loss function can lead to computational difficulties in high dimensional settings.
We propose two efficient algorithms for computing the solution of the penalized BernSVM.
Our bound holds with high probability and achieves a rate of order $\sqrt{s\log(p)/n}$, where $s$ is the number of active features.
arXiv Detail & Related papers (2023-03-16T03:48:29Z) - Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes [62.90204655228324]
We study regret minimization for reinforcement learning (RL) in Latent Markov Decision Processes (LMDPs) with context in hindsight.
We design a novel model-based algorithmic framework which can be instantiated with both a model-optimistic and a value-optimistic solver.
arXiv Detail & Related papers (2022-10-20T21:32:01Z) - Nonlinear Kernel Support Vector Machine with 0-1 Soft Margin Loss [13.803988813225025]
We propose a nonlinear model for the support vector machine with the 0-1 soft margin loss, called $L_{0/1}$-KSVM, which incorporates the kernel technique.
$L_{0/1}$-KSVM has far fewer support vectors (SVs) while maintaining decent prediction accuracy compared with its linear peer.
arXiv Detail & Related papers (2022-03-01T12:53:52Z) - Optimal Spectral Recovery of a Planted Vector in a Subspace [80.02218763267992]
We study efficient estimation and detection of a planted vector $v$ whose $\ell_4$ norm differs from that of a Gaussian vector with the same $\ell_2$ norm.
We show that in the regime $n\rho \gg \sqrt{N}$, any spectral method from a large class (and more generally, any low-degree polynomial of the input) fails to detect the planted vector.
arXiv Detail & Related papers (2021-05-31T16:10:49Z) - Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$ Geometry [69.24618367447101]
Up to logarithmic factors, the optimal excess population loss of any $(\varepsilon,\delta)$-differentially private optimizer is $\sqrt{\log(d)/n} + \sqrt{d}/(\varepsilon n)$.
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $\sqrt{\log(d)/n} + (\log(d)/(\varepsilon n))^{2/3}$.
arXiv Detail & Related papers (2021-03-02T06:53:44Z) - Consistent Structured Prediction with Max-Min Margin Markov Networks [84.60515484036239]
Max-margin methods for binary classification have been extended to the structured prediction setting under the name of max-margin Markov networks ($M^3N$).
We overcome such limitations by defining the learning problem in terms of a "max-min" margin formulation, naming the resulting method max-min margin Markov networks ($M^4N$).
Experiments on multi-class classification, ordinal regression, sequence prediction and ranking demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2020-07-02T10:48:42Z) - Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing an $n\times n$ kernel matrix that is repeatedly applied to a vector.
We propose to use instead ground costs of the form $c(x,y)=-\log\langle\varphi(x),\varphi(y)\rangle$, where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$, with $r\ll n$; a minimal sketch of this factorization appears right after this list.
arXiv Detail & Related papers (2020-06-12T10:21:40Z)
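The last entry above replaces the dense $n\times n$ Gibbs kernel with a factorization through positive features, so each Sinkhorn matrix-vector product costs $O(nr)$ instead of $O(n^2)$. The Python sketch below illustrates that mechanism under simplifying assumptions: regularization $\varepsilon = 1$ (so $\exp(-c(x,y)) = \langle\varphi(x),\varphi(y)\rangle$) and a toy random feature map made positive with an exponential. The feature map and function names are illustrative, not the paper's construction.

```python
import numpy as np

def positive_features(X, r, rng):
    """TOY feature map onto the positive orthant R^r_+ (illustrative only).
    Random projections passed through exp() guarantee strictly positive features."""
    W = rng.normal(size=(X.shape[1], r))
    return np.exp(X @ W) / np.sqrt(r)

def sinkhorn_positive_features(a, b, Phi_x, Phi_y, n_iter=200):
    """Sinkhorn iterations with an implicit Gibbs kernel K = Phi_x @ Phi_y.T.

    With c(x, y) = -log <phi(x), phi(y)> and epsilon = 1, exp(-c) equals
    <phi(x), phi(y)>, so K @ v is computed through the r-dimensional
    factors in O(n r) time instead of O(n^2)."""
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        # K @ v and K.T @ u via the low-rank factors.
        u = a / (Phi_x @ (Phi_y.T @ v))
        v = b / (Phi_y @ (Phi_x.T @ u))
    # Transport plan materialized only for inspection; keep (u, v, factors) for large n.
    return (u[:, None] * (Phi_x @ Phi_y.T)) * v[None, :]

rng = np.random.default_rng(0)
n, d, r = 500, 3, 16
X, Y = rng.normal(size=(n, d)), rng.normal(size=(n, d))
a = np.full(n, 1.0 / n)   # uniform source marginal
b = np.full(n, 1.0 / n)   # uniform target marginal
P = sinkhorn_positive_features(a, b, positive_features(X, r, rng),
                               positive_features(Y, r, rng))
print(P.sum(), P.sum(axis=1)[:3])  # total mass ~1, row sums ~1/n
```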