Sparse Multiple Kernel Learning: Alternating Best Response and Semidefinite Relaxations
- URL: http://arxiv.org/abs/2511.21890v2
- Date: Tue, 02 Dec 2025 04:05:44 GMT
- Title: Sparse Multiple Kernel Learning: Alternating Best Response and Semidefinite Relaxations
- Authors: Dimitris Bertsimas, Caio de Prospero Iglesias, Nicholas A. G. Johnson
- Abstract summary: We consider the problem of selecting a sparse combination of prespecified kernels for support vector binary classification. We derive a hierarchy of semidefinite convex relaxations which can be used to certify near-optimality of the solutions returned by our best response algorithm and also to warm start it.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study Sparse Multiple Kernel Learning (SMKL), which is the problem of selecting a sparse convex combination of prespecified kernels for support vector binary classification. Unlike prevailing $\ell_1$-regularized approaches that approximate a sparsifying penalty, we formulate the problem by imposing an explicit cardinality constraint on the kernel weights and add an $\ell_2$ penalty for robustness. We solve the resulting non-convex minimax problem via an alternating best response algorithm with two subproblems: the alpha subproblem is a standard kernel SVM dual solved via LIBSVM, while the beta subproblem admits an efficient solution via the Greedy Selector and Simplex Projector algorithm. We reformulate SMKL as a mixed integer semidefinite optimization problem and derive a hierarchy of semidefinite convex relaxations which can be used to certify near-optimality of the solutions returned by our best response algorithm and also to warm start it. On ten UCI benchmarks, our method with random initialization outperforms state-of-the-art MKL approaches in out-of-sample prediction accuracy on average by 3.34 percentage points (relative to the best performing benchmark) while selecting a small number of candidate kernels in comparable runtime. With warm starting, our method outperforms the best performing benchmark's out-of-sample prediction accuracy on average by 4.05 percentage points. Our convex relaxations provide a certificate that in several cases, the solution returned by our best response algorithm is the globally optimal solution.
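To make the two subproblems concrete, below is a minimal Python sketch of the alternating best response scheme. The Greedy Selector and Simplex Projector (GSSP) step follows its standard form (keep the k largest entries, then project them onto the probability simplex), and the alpha step uses scikit-learn's LIBSVM-backed `SVC` with a precomputed kernel. The reduction of the beta step to a Euclidean projection of the per-kernel alignment scores q_m = (1/2) a^T K_m a (with a the signed dual variables), scaled by 1/(2*lambda), is an assumption made for illustration rather than the paper's exact update, and names like `gssp` and `alternating_best_response` are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC  # LIBSVM-backed SVM solver


def project_simplex(v):
    """Euclidean projection of v onto the probability simplex
    {x : x >= 0, sum(x) = 1} (Duchi et al.-style sorting algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def gssp(v, k):
    """Greedy Selector and Simplex Projector: project v onto the k-sparse
    simplex {x >= 0, sum(x) = 1, ||x||_0 <= k} by keeping the k largest
    entries of v and projecting them onto the simplex."""
    beta = np.zeros_like(v)
    top_k = np.argsort(v)[-k:]  # indices of the k largest entries
    beta[top_k] = project_simplex(v[top_k])
    return beta


def alternating_best_response(kernels, y, k, lam=1.0, C=1.0, n_iter=20, seed=0):
    """Hypothetical sketch of the alternating best response loop:
    alpha step = kernel SVM dual on the combined kernel (LIBSVM via SVC);
    beta step = closed-form best response on the k-sparse simplex via GSSP."""
    rng = np.random.default_rng(seed)
    beta = gssp(rng.random(len(kernels)), k)  # random feasible start
    for _ in range(n_iter):
        # alpha subproblem: SVM dual with K(beta) = sum_m beta_m * K_m
        K = sum(b * Km for b, Km in zip(beta, kernels))
        svm = SVC(C=C, kernel="precomputed").fit(K, y)
        a = np.zeros(len(y))
        a[svm.support_] = svm.dual_coef_.ravel()  # signed duals alpha_i * y_i
        # beta subproblem: assumed reduction to a projection of the
        # per-kernel alignment scores q_m = 0.5 * a^T K_m a
        q = np.array([0.5 * a @ Km @ a for Km in kernels])
        beta = gssp(q / (2.0 * lam), k)
    return beta, svm
```

Here `kernels` would be a list of precomputed n-by-n Gram matrices on the training set; the warm start described in the abstract would simply replace the random initialization of `beta` with the solution of a semidefinite relaxation.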
Related papers
- Scalable Second-order Riemannian Optimization for $K$-means Clustering [22.839486642997187]
We present a new formulation of the $K$-means clustering problem as optimization over a submanifold. By factorizing the $K$-means manifold into a product manifold, we show how each Newton subproblem can be solved. Our experiments show that the proposed method converges significantly faster than the state-of-the-art first-order non-negative low-rank factorization method.
arXiv Detail & Related papers (2025-09-25T22:49:00Z) - Provable Scaling Laws for the Test-Time Compute of Large Language Models [84.00141420901038]
We propose two algorithms that enjoy provable scaling laws for the test-time compute of large language models. One is a two-stage knockout-style algorithm, where candidates are compared pairwise and losers are eliminated. The other is a two-stage league-style algorithm, where each candidate is evaluated by its average win rate against multiple opponents.
arXiv Detail & Related papers (2024-11-29T05:29:47Z) - Compressed Sensing: A Discrete Optimization Approach [5.877778007271621]
We present a semidefinite relaxation that strengthens the second-order cone relaxation and develop a custom branch-and-bound algorithm.
When used as a component of a multi-label classification algorithm, our approach achieves greater classification accuracy than benchmark compressed sensing methods.
arXiv Detail & Related papers (2023-06-05T01:29:24Z) - Efficient Convex Algorithms for Universal Kernel Learning [46.573275307034336]
An ideal set of kernels should admit a linear parameterization (for tractability) and be dense in the set of all kernels (for accuracy).
Previous algorithms for optimization of kernels were limited to classification and relied on computationally complex Semidefinite Programming (SDP) algorithms.
We propose an SVD-QCQP algorithm which dramatically reduces the computational complexity compared with previous SDP-based approaches.
arXiv Detail & Related papers (2023-04-15T04:57:37Z) - BiAdam: Fast Adaptive Bilevel Optimization Methods [104.96004056928474]
Bilevel optimization has attracted increased interest in machine learning due to its many applications.
We provide a useful analysis framework for both constrained and unconstrained bilevel optimization.
arXiv Detail & Related papers (2021-06-21T20:16:40Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z) - Accelerated Sparse Bayesian Learning via Screening Test and Its Applications [0.9916217495995309]
For a linear system with an over-complete dictionary of features, finding the sparsest solution directly is typically NP-hard.
We adopt sparse Bayesian learning, which uses a parameterized prior to encourage sparsity in the solution.
arXiv Detail & Related papers (2020-07-08T10:21:56Z) - Convergence of adaptive algorithms for weakly convex constrained optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - SimpleMKKM: Simple Multiple Kernel K-means [49.500663154085586]
We propose a simple yet effective multiple kernel clustering algorithm, termed simple multiple kernel k-means (SimpleMKKM).
Our criterion is given by an intractable minimization-maximization problem in the kernel coefficients and the clustering partition matrix.
We theoretically analyze the performance of SimpleMKKM in terms of its clustering generalization error.
arXiv Detail & Related papers (2020-05-11T10:06:40Z) - Optimal Randomized First-Order Methods for Least-Squares Problems [56.05635751529922]
This class of algorithms encompasses several randomized methods among the fastest solvers for least-squares problems.
We focus on two classical embeddings, namely, Gaussian projections and subsampled Hadamard transforms.
Our resulting algorithm yields the best complexity known for solving least-squares problems with no condition number dependence.
arXiv Detail & Related papers (2020-02-21T17:45:32Z)