Learning with Smooth Hinge Losses
- URL: http://arxiv.org/abs/2103.00233v1
- Date: Sat, 27 Feb 2021 14:50:02 GMT
- Title: Learning with Smooth Hinge Losses
- Authors: JunRu Luo, Hong Qiao and Bo Zhang
- Abstract summary: We introduce two smooth Hinge losses $\psi_G(\alpha;\sigma)$ and $\psi_M(\alpha;\sigma)$ which are infinitely differentiable and converge to the Hinge loss uniformly in $\alpha$ as $\sigma$ tends to $0$.
Experiments in text classification tasks show that the proposed SSVMs are effective in real-world applications.
- Score: 15.288802707471792
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Due to the non-smoothness of the Hinge loss in SVM, it is difficult to obtain
a faster convergence rate with modern optimization algorithms. In this paper,
we introduce two smooth Hinge losses $\psi_G(\alpha;\sigma)$ and
$\psi_M(\alpha;\sigma)$ which are infinitely differentiable and converge to the
Hinge loss uniformly in $\alpha$ as $\sigma$ tends to $0$. By replacing the
Hinge loss with these two smooth Hinge losses, we obtain two smooth support
vector machines (SSVMs), respectively. Solving the SSVMs with the Trust Region
Newton method (TRON) leads to two quadratically convergent algorithms.
Experiments in text classification tasks show that the proposed SSVMs are
effective in real-world applications. We also introduce a general smooth convex
loss function to unify several commonly-used convex loss functions in machine
learning. The general framework provides smooth approximation functions to
non-smooth convex loss functions, which can be used to obtain smooth models
that can be solved with faster convergent optimization algorithms.
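As an illustration of how such a smooth surrogate behaves, the sketch below implements one natural Gaussian-smoothed hinge (the convolution of the hinge loss with a Gaussian kernel of width $\sigma$) and checks numerically that it approaches the hinge loss uniformly in $\alpha$ as $\sigma \to 0$. This particular construction, the function name `smooth_hinge_gaussian`, and the bound $\sigma/\sqrt{2\pi}$ are illustrative assumptions for this summary, not necessarily the paper's exact $\psi_G$ or $\psi_M$.

```python
import numpy as np
from scipy.stats import norm

def hinge(alpha):
    """Standard (non-smooth) Hinge loss max(0, 1 - alpha)."""
    return np.maximum(0.0, 1.0 - alpha)

def smooth_hinge_gaussian(alpha, sigma):
    """Gaussian-smoothed hinge: the hinge convolved with N(0, sigma^2).
    Closed form: (1 - alpha) * Phi(z) + sigma * phi(z), with z = (1 - alpha) / sigma.
    It is infinitely differentiable and |smooth - hinge| <= sigma / sqrt(2 * pi),
    so it converges to the hinge uniformly in alpha as sigma -> 0.
    NOTE: illustrative construction, not necessarily the paper's psi_G."""
    z = (1.0 - alpha) / sigma
    return (1.0 - alpha) * norm.cdf(z) + sigma * norm.pdf(z)

if __name__ == "__main__":
    # Numerical check of uniform convergence over a grid of margins alpha.
    alphas = np.linspace(-3.0, 3.0, 2001)
    for sigma in (1.0, 0.1, 0.01):
        gap = np.max(np.abs(smooth_hinge_gaussian(alphas, sigma) - hinge(alphas)))
        print(f"sigma={sigma:5.2f}  max|gap|={gap:.4f}  bound={sigma / np.sqrt(2 * np.pi):.4f}")
```

With a surrogate of this kind the SVM objective becomes twice continuously differentiable, which is what lets a second-order solver such as a Trust Region Newton method deliver the fast local convergence described in the abstract.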
Related papers
- On the Convergence of Multi-objective Optimization under Generalized Smoothness [27.87166415148172]
We study a more general and realistic class of $\ell$-smooth loss functions, where $\ell$ is a general non-decreasing function of the gradient norm.
We develop two novel algorithms for $\ell$-smooth multi-objective optimization (MOO): a generalized smooth multi-objective gradient descent method and its stochastic variant.
Our algorithms can also guarantee a tighter $\mathcal{O}(\epsilon^{-2})$ complexity when using more samples in each iteration.
arXiv Detail & Related papers (2024-05-29T18:36:59Z) - Random Scaling and Momentum for Non-smooth Non-convex Optimization [38.443430569753026]
Training neural networks requires a loss function that may be highly irregular, and in particular neither convex nor smooth.
Popular training algorithms are based on stochastic gradient descent with momentum (SGDM), for which analysis applies only if the loss is either convex or smooth.
arXiv Detail & Related papers (2024-05-16T00:52:03Z) - Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z) - Distributed Extra-gradient with Optimal Complexity and Communication
Guarantees [60.571030754252824]
We consider monotone variational inequality (VI) problems in multi-GPU settings where multiple processors/workers/clients have access to local dual vectors.
Extra-gradient, which is a de facto algorithm for monotone VI problems, has not been designed to be communication-efficient.
We propose a quantized generalized extra-gradient (Q-GenX), which is an unbiased and adaptive compression method tailored to solve VIs.
arXiv Detail & Related papers (2023-08-17T21:15:04Z) - Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach [57.92727189589498]
We propose an online convex optimization approach with two different levels of adaptivity.
We obtain $\mathcal{O}(\log V_T)$, $\mathcal{O}(d \log V_T)$ and $\hat{\mathcal{O}}(\sqrt{V_T})$ regret bounds for strongly convex, exp-concave and convex loss functions.
arXiv Detail & Related papers (2023-07-17T09:55:35Z) - Kernel Support Vector Machine Classifiers with the $\ell_0$-Norm Hinge
Loss [3.007949058551534]
Support Vector Machine (SVM) has been one of the most successful machine learning techniques for binary classification problems.
This paper concentrates on the kernel SVM with the $\ell_0$-norm hinge loss (referred to as $\ell_0$-KSVM), which is a composite of the hinge loss and the $\ell_0$-norm.
Experiments on synthetic and real datasets show that $\ell_0$-KSVM can achieve accuracy comparable with the standard KSVM.
arXiv Detail & Related papers (2023-06-24T14:52:44Z) - On Convergence of Incremental Gradient for Non-Convex Smooth Functions [63.51187646914962]
In machine learning and network optimization, algorithms like shuffled SGD are popular because they minimize the number of cache misses and make good use of the cache.
This paper delves into the convergence properties of SGD algorithms with arbitrary data ordering.
arXiv Detail & Related papers (2023-05-30T17:47:27Z) - Gradient-Free Methods for Deterministic and Stochastic Nonsmooth
Nonconvex Optimization [94.19177623349947]
Non-smooth non-convex optimization problems emerge in machine learning and business decision making.
Two core challenges impede the development of efficient methods with finite-time convergence guarantees.
Two-phase versions of GFM and SGFM are also proposed and proven to achieve improved large-deviation results.
arXiv Detail & Related papers (2022-09-12T06:53:24Z) - Optimal Gradient Sliding and its Application to Distributed Optimization
Under Similarity [121.83085611327654]
We study structured convex optimization problems with additive objective $r := p + q$, where $r$ is $\mu$-strongly convex.
We propose a method to solve this problem that balances communication between the master and the agents against local computation calls.
The proposed method is much sharper than the $\mathcal{O}(\sqrt{L_q/\mu})$ rate of existing methods.
arXiv Detail & Related papers (2022-05-30T14:28:02Z) - An Improved Analysis of Gradient Tracking for Decentralized Machine
Learning [34.144764431505486]
We consider decentralized machine learning over a network where the training data is distributed across $n$ agents.
The agent's common goal is to find a model that minimizes the average of all local loss functions.
We improve the dependency on the network parameter $p$ in the noiseless case.
arXiv Detail & Related papers (2022-02-08T12:58:14Z) - Unified SVM Algorithm Based on LS-DC Loss [0.0]
We propose a unified algorithm, UniSVM, that can train different SVM models.
UniSVM has a dominant advantage over all existing algorithms because it has a closed-form solution.
Experiments show that UniSVM can achieve comparable performance in less training time.
arXiv Detail & Related papers (2020-06-16T12:40:06Z)