The Stochastic Conjugate Subgradient Algorithm For Kernel Support Vector Machines
- URL: http://arxiv.org/abs/2407.21091v1
- Date: Tue, 30 Jul 2024 17:03:19 GMT
- Title: The Stochastic Conjugate Subgradient Algorithm For Kernel Support Vector Machines
- Authors: Di Zhang, Suvrajeet Sen
- Abstract summary: This paper proposes an innovative method specifically designed for kernel support vector machines (SVMs).
It not only achieves faster convergence per iteration but also exhibits enhanced scalability when compared to conventional SFO techniques.
Our experimental results demonstrate that the proposed algorithm not only maintains but potentially exceeds the scalability of SFO methods.
- Score: 1.738375118265695
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic First-Order (SFO) methods have been a cornerstone in addressing a broad spectrum of modern machine learning (ML) challenges. However, their efficacy is increasingly questioned, especially in large-scale applications where empirical evidence indicates potential performance limitations. In response, this paper proposes an innovative method specifically designed for kernel support vector machines (SVMs). This method not only achieves faster convergence per iteration but also exhibits enhanced scalability when compared to conventional SFO techniques. Diverging from traditional sample average approximation strategies that typically frame kernel SVM as an 'all-in-one' Quadratic Program (QP), our approach adopts adaptive sampling. This strategy incrementally refines approximation accuracy on an 'as-needed' basis. Crucially, this approach also inspires a decomposition-based algorithm, effectively decomposing parameter selection from error estimation, with the latter being independently determined for each data point. To exploit the quadratic nature of the kernel matrix, we introduce a stochastic conjugate subgradient method. This method preserves many benefits of first-order approaches while adeptly handling both nonlinearity and non-smooth aspects of the SVM problem. Thus, it extends beyond the capabilities of standard SFO algorithms for non-smooth convex optimization. The convergence rate of this novel method is thoroughly analyzed within this paper. Our experimental results demonstrate that the proposed algorithm not only maintains but potentially exceeds the scalability of SFO methods. Moreover, it significantly enhances both speed and accuracy of the optimization process.
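For orientation, the kind of SFO baseline the abstract compares against can be sketched as a kernelized stochastic subgradient method on the regularized hinge loss (the Pegasos scheme). This is not the authors' stochastic conjugate subgradient algorithm, which the abstract does not specify in enough detail to reproduce. The function names, the RBF kernel choice, and the values of `gamma`, `lam`, and `n_iters` are all illustrative assumptions.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def kernel_pegasos(K, y, lam=0.01, n_iters=2000, seed=0):
    """Stochastic subgradient baseline for kernel SVM (illustrative only).

    K is the n x n training kernel matrix, y holds labels in {-1, +1},
    and lam is the regularization weight. Returns nonnegative dual-style
    coefficients alpha such that f(x) = sum_j alpha[j] * y[j] * k(x_j, x).
    """
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    counts = np.zeros(n)  # how often each sample violated the margin
    for t in range(1, n_iters + 1):
        i = rng.integers(n)  # draw one training point uniformly at random
        score = K[i] @ (counts * y) / (lam * t)
        if y[i] * score < 1.0:
            # Hinge loss is active at sample i: take a subgradient step.
            counts[i] += 1.0
    return counts / (lam * n_iters)


def predict(K_new_train, alpha, y_train):
    """Sign of the decision function; rows of K_new_train index new points."""
    return np.sign(K_new_train @ (alpha * y_train))
```

Each iteration touches a single row of the kernel matrix, which is what makes such first-order schemes cheap per step; the paper's stated contribution is to improve on this class of methods with adaptive sampling and conjugate subgradient directions.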
Related papers
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO)
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth distribution, and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z)
- Resource-Adaptive Newton's Method for Distributed Learning [16.588456212160928]
This paper introduces a novel and efficient algorithm called RANL, which overcomes the limitations of Newton's method.
Unlike traditional first-order methods, RANL exhibits remarkable independence from the condition number of the problem.
arXiv Detail & Related papers (2023-08-20T04:01:30Z)
- An Optimization-based Deep Equilibrium Model for Hyperspectral Image Deconvolution with Convergence Guarantees [71.57324258813675]
We propose a novel methodology for addressing the hyperspectral image deconvolution problem.
A new optimization problem is formulated, leveraging a learnable regularizer in the form of a neural network.
The derived iterative solver is then expressed as a fixed-point calculation problem within the Deep Equilibrium framework.
arXiv Detail & Related papers (2023-06-10T08:25:16Z)
- Exploring the Algorithm-Dependent Generalization of AUPRC Optimization with List Stability [107.65337427333064]
Optimization of the Area Under the Precision-Recall Curve (AUPRC) is a crucial problem for machine learning.
In this work, we present the first trial in the algorithm-dependent generalization of AUPRC optimization.
Experiments on three image retrieval datasets speak to the effectiveness and soundness of our framework.
arXiv Detail & Related papers (2022-09-27T09:06:37Z)
- Adaptive First- and Second-Order Algorithms for Large-Scale Machine Learning [3.0204520109309843]
We consider first- and second-order techniques to address continuous optimization problems in machine learning.
In the first-order case, we propose a framework of transition from semi-deterministic to quadratic regularization methods.
We also propose a novel first-order algorithm with adaptive sampling and adaptive step size.
arXiv Detail & Related papers (2021-11-29T18:10:00Z)
- Continuation Newton methods with deflation techniques for global optimization problems [3.705839280172101]
A global minimum point of an optimization problem is of interest in engineering.
In this article, we consider a new memetic algorithm for this nonlinear large-scale problem.
According to our numerical experiments, the new algorithm works well for unconstrained problems.
arXiv Detail & Related papers (2021-07-29T09:53:49Z)
- High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z)
- Minibatch and Momentum Model-based Methods for Stochastic Non-smooth Non-convex Optimization [3.4809730725241597]
We make two important extensions to model-based methods.
First, we propose a new minibatch which takes a set of samples to approximate the model function in each iteration.
Second, motivated by the success of momentum techniques, we propose a new convex-based model.
arXiv Detail & Related papers (2021-06-06T05:31:57Z)
- A Dynamical Systems Approach for Convergence of the Bayesian EM Algorithm [59.99439951055238]
We show how (discrete-time) Lyapunov stability theory can serve as a powerful tool to aid, or even lead, in the analysis (and potential design) of optimization algorithms that are not necessarily gradient-based.
The particular ML problem that this paper focuses on is that of parameter estimation in an incomplete-data Bayesian framework via the popular optimization algorithm known as maximum a posteriori expectation-maximization (MAP-EM)
We show that fast convergence (linear or quadratic) is achieved, which could have been difficult to unveil without our adopted S&C approach.
arXiv Detail & Related papers (2020-06-23T01:34:18Z)
- Effective Dimension Adaptive Sketching Methods for Faster Regularized Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching.
We consider two of the most popular random embeddings, namely, Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT)
arXiv Detail & Related papers (2020-06-10T15:00:09Z)
- Robust Learning Rate Selection for Stochastic Optimization via Splitting Diagnostic [5.395127324484869]
SplitSGD is a new dynamic learning rate schedule for stochastic optimization.
The method decreases the learning rate for better adaptation to the local geometry of the objective function.
It incurs essentially no additional computational cost compared with standard SGD.
arXiv Detail & Related papers (2019-10-18T19:38:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.