Learning Sign-Constrained Support Vector Machines
- URL: http://arxiv.org/abs/2101.01473v1
- Date: Tue, 5 Jan 2021 12:08:17 GMT
- Title: Learning Sign-Constrained Support Vector Machines
- Authors: Kenya Tajima, Takahiko Henmi, Kohei Tsuchida, Esmeraldo Ronnie R.
Zara, and Tsuyoshi Kato
- Abstract summary: We develop two optimization algorithms for minimizing the empirical risk under the sign constraints.
One of the two algorithms is based on the projected gradient method, in which each iteration of the projected gradient method takes $O(nd)$ computational cost.
We empirically demonstrate that the sign constraints are a promising technique when similarities to the training examples compose the feature vector.
- Score: 0.24466725954625884
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Domain knowledge is useful to improve the generalization performance of
learning machines. Sign constraints are a handy representation to combine
domain knowledge with a learning machine. In this paper, we consider constraining
the signs of the weight coefficients in learning the linear support vector
machine, and develop two optimization algorithms for minimizing the empirical
risk under the sign constraints. One of the two algorithms is based on the
projected gradient method, in which each iteration of the projected gradient
method takes $O(nd)$ computational cost and the sublinear convergence of the
objective error is guaranteed. The second algorithm is based on the Frank-Wolfe
method that also converges sublinearly and possesses a clear termination
criterion. We show that each iteration of the Frank-Wolfe method also requires $O(nd)$
cost. Furthermore, we derive the explicit expression for the minimal iteration
number to ensure an $\epsilon$-accurate solution by analyzing the curvature of
the objective function. Finally, we empirically demonstrate that the sign
constraints are a promising technique when similarities to the training
examples compose the feature vector.
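The abstract does not spell the solvers out, but the projected-gradient variant can be sketched from what is stated: take a gradient step on the regularized empirical risk and then project the weight vector onto the sign-constraint set, at $O(nd)$ cost per iteration. The sketch below is a minimal illustration under assumptions the abstract does not fix: a squared-hinge loss (so the objective is smooth), an L2 regularizer, and a fixed step size. The names `train_sign_constrained_svm` and `project_signs` are illustrative, not the authors' code.

```python
import numpy as np

def project_signs(w, sign):
    """Clip w onto the sign-constraint set.
    sign[j] = +1 -> w[j] >= 0, sign[j] = -1 -> w[j] <= 0, sign[j] = 0 -> unconstrained."""
    w = w.copy()
    w[sign > 0] = np.maximum(w[sign > 0], 0.0)
    w[sign < 0] = np.minimum(w[sign < 0], 0.0)
    return w

def train_sign_constrained_svm(X, y, sign, lam=1e-2, n_iter=500):
    """Projected gradient descent on an L2-regularized squared-hinge risk (a sketch,
    not the paper's exact objective).

    X: (n, d) feature matrix, y: labels in {-1, +1}, sign: (d,) sign pattern.
    Each iteration costs O(nd): one matrix-vector product for the margins,
    one for the gradient, followed by an O(d) projection.
    """
    n, d = X.shape
    w = np.zeros(d)
    # Crude Lipschitz bound on the gradient (Frobenius norm of X) -> fixed step size.
    L = lam + 2.0 * (np.linalg.norm(X) ** 2) / n
    step = 1.0 / L
    for _ in range(n_iter):
        margins = y * (X @ w)                              # O(nd)
        slack = np.maximum(0.0, 1.0 - margins)             # active squared-hinge terms
        grad = lam * w - (2.0 / n) * (X.T @ (y * slack))   # O(nd)
        w = project_signs(w - step * grad, sign)           # O(d) coordinate-wise clipping
    return w

# Example: similarity-based features with all weights constrained nonnegative.
rng = np.random.default_rng(0)
X = rng.random((200, 50))
y = rng.choice([-1.0, 1.0], size=200)
w = train_sign_constrained_svm(X, y, sign=np.ones(50))
```

Because the projection is a coordinate-wise clipping, it adds only $O(d)$ per iteration on top of the two $O(nd)$ matrix-vector products. The paper's second solver, based on the Frank-Wolfe method, additionally provides a clear termination criterion; it is not reproduced in this sketch.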
Related papers
- Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}\big(\ln(T) / T^{1 - \frac{1}{\alpha}}\big)$.
arXiv Detail & Related papers (2024-03-11T09:10:37Z) - Limited Memory Online Gradient Descent for Kernelized Pairwise Learning
with Dynamic Averaging [18.843097436906618]
We introduce a lightweight OGD algorithm that does not require the independence of examples and generalizes to kernel pairwise learning.
Our algorithm builds the gradient based on a random example and a moving average representing the past data, which results in a sub-linear regret bound with a complexity of $O(T)$.
Several experiments with real-world datasets show that the proposed technique outperforms kernel and linear online gradient descent in both offline and online scenarios.
arXiv Detail & Related papers (2024-02-02T05:21:50Z) - Variance Reduced Online Gradient Descent for Kernelized Pairwise
Learning with Limited Memory [19.822215548822882]
Online gradient descent (OGD) algorithms have been proposed to handle online pairwise learning, where data arrives sequentially.
Recent advancements in OGD algorithms have aimed to reduce the complexity of calculating online gradients, achieving complexities less than $O(T)$ and even as low as $O(1)$.
In this study, we propose a limited memory OGD algorithm that extends to kernel online pairwise learning while improving the sublinear regret.
arXiv Detail & Related papers (2023-10-10T09:50:54Z) - Robust Methods for High-Dimensional Linear Learning [0.0]
We propose statistically robust and computationally efficient linear learning methods in the high-dimensional batch setting.
We instantiate our framework on several applications including vanilla sparse, group-sparse and low-rank matrix recovery.
For vanilla $s$-sparsity, we are able to reach the $s\log(d)/n$ rate under heavy-tails and $\eta$-corruption.
arXiv Detail & Related papers (2022-08-10T17:00:41Z) - Federated Learning with a Sampling Algorithm under Isoperimetry [9.990687944474738]
Federated learning uses a set of techniques to efficiently distribute the training of a machine learning algorithm across several devices.
We propose a communication-efficient variant of the Langevin algorithm for sampling a posteriori.
arXiv Detail & Related papers (2022-06-02T08:19:03Z) - A framework for bilevel optimization that enables stochastic and global
variance reduction algorithms [17.12280360174073]
Bilevel optimization is a problem of minimizing a value function which involves the arg-minimum of another function.
We introduce a novel framework, in which the solution of the inner problem, the solution of the linear system, and the main variable evolve at the same time.
We demonstrate that SABA, an adaptation of the celebrated SAGA algorithm in our framework, has an $O(\frac{1}{T})$ convergence rate, and that it achieves linear convergence under the Polyak-Lojasiewicz assumption.
arXiv Detail & Related papers (2022-01-31T18:17:25Z) - A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in decision processes whose complexity is independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z) - Graph Signal Restoration Using Nested Deep Algorithm Unrolling [85.53158261016331]
Graph signal processing is a ubiquitous task in many applications such as sensor, social, transportation, and brain networks, point cloud processing, and graph networks.
We propose two restoration methods based on convex-independent deep ADMM (alternating direction method of multipliers).
The parameters in the proposed restoration methods are trainable in an end-to-end manner.
arXiv Detail & Related papers (2021-06-30T08:57:01Z) - Correcting Momentum with Second-order Information [50.992629498861724]
We develop a new algorithm for non-convex stochastic optimization that finds an $\epsilon$-critical point in the optimal number of stochastic gradient and Hessian-vector product computations.
We validate our results on a variety of large-scale deep learning benchmarks and architectures.
arXiv Detail & Related papers (2021-03-04T19:01:20Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
arXiv Detail & Related papers (2020-06-16T13:41:54Z) - Upper Confidence Primal-Dual Reinforcement Learning for CMDP with
Adversarial Loss [145.54544979467872]
We consider online learning for episodic constrained Markov decision processes (CMDPs).
We propose a new upper confidence primal-dual algorithm, which only requires the trajectories sampled from the transition model.
Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
arXiv Detail & Related papers (2020-03-02T05:02:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.