Binary Kernel Logistic Regression: a sparsity-inducing formulation and a convergent decomposition training algorithm
- URL: http://arxiv.org/abs/2512.19440v1
- Date: Mon, 22 Dec 2025 14:40:30 GMT
- Title: Binary Kernel Logistic Regression: a sparsity-inducing formulation and a convergent decomposition training algorithm
- Authors: Antonio Consolo, Andrea Manno, Edoardo Amaldi
- Abstract summary: Kernel logistic regression (KLR) is a widely used supervised learning method for binary and multi-class classification. Previous attempts to deal with sparsity in KLR include a heuristic method referred to as the Import Vector Machine (IVM). We propose an extension of the training formulation proposed by Keerthi et al., which is able to induce sparsity in the trained model.
- Score: 0.5680416078423551
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Kernel logistic regression (KLR) is a widely used supervised learning method for binary and multi-class classification, which provides estimates of the conditional probabilities of class membership for the data points. Unlike other kernel methods such as Support Vector Machines (SVMs), KLR models are generally not sparse. Previous attempts to deal with sparsity in KLR include a heuristic method referred to as the Import Vector Machine (IVM) and ad hoc regularizations such as the $\ell_{1/2}$-based one. Achieving a good trade-off between prediction accuracy and sparsity is still a challenging issue with potentially significant practical impact. In this work, we revisit binary KLR and propose an extension of the training formulation proposed by Keerthi et al. which is able to induce sparsity in the trained model while maintaining good testing accuracy. To efficiently solve the dual of this formulation, we devise a decomposition algorithm of Sequential Minimal Optimization type which exploits second-order information, and for which we establish global convergence. Numerical experiments conducted on 12 datasets from the literature show that the proposed binary KLR approach achieves a competitive trade-off between accuracy and sparsity with respect to IVM, $\ell_{1/2}$-based regularization for KLR, and SVMs, while retaining the advantage of providing informative estimates of the class membership probabilities.
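For context on the model being made sparse, here is a minimal sketch of plain binary KLR trained by gradient descent on the regularized negative log-likelihood with an RBF kernel. It shows only the standard baseline and its probabilistic outputs, not the paper's sparsity-inducing formulation or its SMO-type decomposition algorithm; the kernel width, step size, regularization constant, and function names are illustrative choices.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def train_klr(X, y, lam=0.1, gamma=1.0, lr=0.1, n_iter=2000):
    """Plain binary KLR with labels y in {-1, +1}: minimize
    (1/n) * sum_i log(1 + exp(-y_i f(x_i))) + (lam/2) * a' K a,
    where f(x) = sum_j a_j K(x, x_j)."""
    K = rbf_kernel(X, X, gamma)
    n = len(y)
    a = np.zeros(n)
    for _ in range(n_iter):
        margins = y * (K @ a)
        p = 0.5 * (1.0 - np.tanh(0.5 * margins))   # sigma(-margins), numerically stable
        a -= lr * (-(K @ (y * p)) / n + lam * (K @ a))
    return a

def predict_proba(a, X_train, X_new, gamma=1.0):
    """Estimated P(y = +1 | x), the probabilistic output KLR provides."""
    f = rbf_kernel(X_new, X_train, gamma) @ a
    return 0.5 * (1.0 + np.tanh(0.5 * f))          # sigma(f), numerically stable
```

The trained coefficient vector `a` is generally dense, so every training point enters each prediction; that lack of sparsity is exactly what the proposed formulation addresses.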
Related papers
- Scalable Analytic Classifiers with Associative Drift Compensation for Class-Incremental Learning of Vision Transformers [26.771319566121708]
Class-incremental learning with Vision Transformers (ViTs) faces a major computational bottleneck during the reconstruction phase. Regularized Gaussian Discriminant Analysis (RGDA) provides a Bayes-optimal alternative with accuracy comparable to SGD-based classifiers. We propose Low-Rank Factorized RGDA (LR-RGDA), a scalable classifier that combines RGDA's expressivity with the efficiency of linear classifiers.
arXiv Detail & Related papers (2026-01-29T06:42:20Z)
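As a rough illustration of the RGDA classifier mentioned above, the sketch below fits a regularized Gaussian discriminant model with per-class means and a shared, ridge-regularized covariance (the LDA special case). The paper's low-rank factorization and drift compensation are not reproduced, and the shrinkage constant and function names are illustrative.

```python
import numpy as np

def fit_rgda(X, y, eps=1e-2):
    """Per-class means plus a pooled covariance shrunk toward the identity."""
    classes = np.unique(y)
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    centered = X - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / len(X) + eps * np.eye(X.shape[1])
    priors = np.array([(y == c).mean() for c in classes])
    return classes, means, np.linalg.inv(cov), priors

def predict_rgda(model, X):
    """Gaussian discriminant: x' P mu_c - mu_c' P mu_c / 2 + log prior_c."""
    classes, means, precision, priors = model
    quad = np.einsum("cd,de,ce->c", means, precision, means)
    scores = X @ precision @ means.T - 0.5 * quad + np.log(priors)
    return classes[scores.argmax(axis=1)]
```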
- Robust low-rank estimation with multiple binary responses using pairwise AUC loss [0.0]
Multiple binary responses arise in many modern data-analytic problems. Low-rank models offer a natural way to encode latent dependence across tasks. Existing methods for binary data are largely likelihood-based and focus on pointwise classification.
arXiv Detail & Related papers (2026-01-13T15:00:10Z)
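The pairwise AUC loss named in the title above can be stated on plain score vectors: the empirical AUC counts correctly ordered (positive, negative) pairs, and replacing the 0-1 indicator with a logistic surrogate makes it differentiable. A minimal sketch, ignoring the paper's low-rank structure (function names are illustrative):

```python
import numpy as np

def pairwise_auc_loss(scores, y):
    """Mean logistic loss log(1 + exp(-(s_i - s_j))) over all pairs
    with y_i = 1 and y_j = 0; a differentiable surrogate for 1 - AUC."""
    diff = scores[y == 1][:, None] - scores[y == 0][None, :]
    return np.logaddexp(0.0, -diff).mean()

def empirical_auc(scores, y):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    diff = scores[y == 1][:, None] - scores[y == 0][None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()
```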
- Scaling Up ROC-Optimizing Support Vector Machines [3.1941554288428198]
The ROC-SVM directly maximizes the area under the ROC curve (AUC) and has become an attractive alternative to conventional binary classification in the presence of class imbalance. We develop a scalable variant of the ROC-SVM that leverages incomplete U-statistics, thereby substantially reducing computational complexity. We extend the framework to nonlinear classification through a low-rank kernel approximation, enabling efficient training in reproducing kernel Hilbert spaces.
arXiv Detail & Related papers (2025-11-07T04:38:25Z)
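The incomplete U-statistic device cited above replaces the sum over all positive-negative pairs, which is quadratic in the sample size, with a random subsample of pairs, giving an unbiased estimate at a fraction of the cost. A generic sketch with a logistic surrogate; the pair budget and function name are illustrative, not the paper's exact objective:

```python
import numpy as np

def incomplete_pairwise_loss(scores, y, n_pairs=1024, seed=0):
    """Unbiased estimate of the full pairwise loss from n_pairs
    randomly drawn (positive, negative) pairs."""
    rng = np.random.default_rng(seed)
    i = rng.choice(np.flatnonzero(y == 1), size=n_pairs)  # with replacement
    j = rng.choice(np.flatnonzero(y == 0), size=n_pairs)
    return np.logaddexp(0.0, -(scores[i] - scores[j])).mean()
```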
- Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update [70.38810219913593]
We study the generalized linear bandit (GLB) problem, a contextual multi-armed bandit framework that extends the classical linear model by incorporating a non-linear link function. GLBs are widely applicable to real-world scenarios, but their non-linear nature introduces significant challenges in achieving both computational and statistical efficiency. We propose a jointly efficient algorithm that attains a nearly optimal regret bound with $\mathcal{O}(1)$ time and space complexities per round.
arXiv Detail & Related papers (2025-07-16T02:24:21Z)
- Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving model performance. This paper addresses the question of how to optimally combine the model's predictions and the provided labels. Our main contribution is the derivation of the Bayes-optimal aggregator function to combine the current model's predictions and the given labels.
arXiv Detail & Related papers (2025-05-21T07:16:44Z)
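The paper above derives its aggregator via approximate message passing; the sketch below shows only the generic retraining template that such an aggregator plugs into: blend the current model's predictions with the given (possibly noisy) labels to form soft targets, then refit. The convex combination and its weight `lam` are placeholders for the derived Bayes-optimal rule, not the paper's method.

```python
import numpy as np

def retraining_targets(pred_prob, labels, lam=0.5):
    """Soft targets mixing current predictions with provided labels.
    The fixed convex combination stands in for the paper's aggregator."""
    return lam * np.asarray(pred_prob) + (1.0 - lam) * np.asarray(labels)

# Usage: targets = retraining_targets(model_probs, noisy_labels, lam=0.7),
# then refit the model with cross-entropy against `targets`.
```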
- Towards Generalizable Trajectory Prediction Using Dual-Level Representation Learning And Adaptive Prompting [107.4034346788744]
Existing vehicle trajectory prediction models struggle with generalizability, prediction uncertainties, and handling complex interactions. We propose Perceiver with Register queries (PerReg+), a novel trajectory prediction framework that introduces: (1) Dual-Level Representation Learning via Self-Distillation (SD) and Masked Reconstruction (MR), capturing global context and fine-grained details; (2) Enhanced Multimodality using register-based queries and pretraining, eliminating the need for clustering and suppression; and (3) Adaptive Prompt Tuning during fine-tuning, freezing the main architecture and optimizing a small number of prompts for efficient adaptation.
arXiv Detail & Related papers (2025-01-08T20:11:09Z)
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) that does not require prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z)
- Cost-sensitive probabilistic predictions for support vector machines [1.743685428161914]
Support vector machines (SVMs) are among the most widely used and thoroughly studied machine learning models.
We propose a novel approach to generate probabilistic outputs for the SVM.
arXiv Detail & Related papers (2023-10-09T11:00:17Z)
- Decoupled Training for Long-Tailed Classification With Stochastic Representations [15.990318581975435]
Decoupling representation learning and classifier learning has been shown to be effective in classification with long-tailed data.
We first apply Stochastic Weight Averaging (SWA), an optimization technique for improving the generalization of deep neural networks, to obtain feature extractors that generalize better for long-tailed classification.
We then propose a novel classifier re-training algorithm based on perturbed representations obtained from SWA-Gaussian (a Gaussian extension of SWA) and a self-distillation strategy.
arXiv Detail & Related papers (2023-04-19T05:35:09Z)
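Stochastic Weight Averaging, the first ingredient above, is simple to state: run SGD with a suitable schedule, keep a running average of weight snapshots, and use the average as the final model. A minimal sketch of that running average (the SWA-Gaussian sampling and self-distillation steps are not shown; the class name is illustrative):

```python
import numpy as np

class RunningWeightAverage:
    """Running mean of parameter snapshots collected along training."""
    def __init__(self):
        self.avg, self.n = None, 0

    def update(self, weights):
        w = np.asarray(weights, dtype=float)
        self.n += 1
        self.avg = w.copy() if self.avg is None else self.avg + (w - self.avg) / self.n

# Usage: call update(current_weights) at the end of selected epochs and
# evaluate the network with avg in place of the last SGD iterate.
```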
- Learning with Multiclass AUC: Theory and Algorithms [141.63211412386283]
Area under the ROC curve (AUC) is a well-known ranking metric for problems such as imbalanced learning and recommender systems.
In this paper, we make an early attempt at the problem of learning multiclass scoring functions by optimizing multiclass AUC metrics.
arXiv Detail & Related papers (2021-07-28T05:18:10Z)
- Label-Imbalanced and Group-Sensitive Classification under Overparameterization [32.923780772605596]
Label-imbalanced and group-sensitive classification seeks to appropriately modify standard training algorithms to optimize relevant metrics.
We show that a logit-adjusted loss modification to standard empirical risk minimization might be ineffective in general.
We show that our results extend naturally to binary classification with sensitive groups, thus treating the two common types of imbalances (label/group) in a unifying way.
arXiv Detail & Related papers (2021-03-02T08:09:43Z)
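The logit-adjusted loss discussed above is commonly written as softmax cross-entropy on prior-shifted logits, z_c + tau * log pi_c, so that frequent classes must win by a larger margin at training time. A minimal sketch of that standard form (the temperature tau and function name are illustrative):

```python
import numpy as np

def logit_adjusted_loss(logits, labels, class_priors, tau=1.0):
    """Cross-entropy on logits shifted by tau * log(prior) per class."""
    z = logits + tau * np.log(class_priors)[None, :]
    z = z - z.max(axis=1, keepdims=True)            # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```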
- Binary Classification from Multiple Unlabeled Datasets via Surrogate Set Classification [94.55805516167369]
We propose a new approach for binary classification from $m$ U-sets for $m \ge 2$.
Our key idea is to consider an auxiliary classification task called surrogate set classification (SSC).
arXiv Detail & Related papers (2021-02-01T07:36:38Z)
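The key idea stated above is directly codeable as data preparation: pool the m unlabeled sets and label each point with the index of the set it came from, turning the U-sets into an ordinary m-class dataset. The recovery of the final binary classifier from the surrogate model is paper-specific and omitted; the function name is illustrative.

```python
import numpy as np

def build_surrogate_task(u_sets):
    """Stack m unlabeled sets; the surrogate label is the set index."""
    X = np.concatenate(u_sets, axis=0)
    y = np.concatenate([np.full(len(u), k) for k, u in enumerate(u_sets)])
    return X, y

# Usage: X, y = build_surrogate_task([U1, U2, U3]); train any m-class
# classifier on (X, y) as the surrogate set classification (SSC) task.
```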
- Memory and Computation-Efficient Kernel SVM via Binary Embedding and Ternary Model Coefficients [18.52747917850984]
Kernel approximation is widely used to scale up kernel SVM training and prediction.
Memory and computation costs of kernel approximation models are still too high if we want to deploy them on memory-limited devices.
We propose a novel memory and computation-efficient kernel SVM model by using both binary embedding and ternary model coefficients.
arXiv Detail & Related papers (2020-10-06T09:41:54Z)
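Binary embedding, the first of the two compression devices in the title above, can be illustrated with sign random projections: each feature vector maps to a short vector of signs, and the fraction of matching signs between two embeddings approximates their angular similarity, so kernel-like evaluations reduce to cheap bit operations. A generic sketch (the projection dimension and function names are illustrative; the ternary coefficient training is not shown):

```python
import numpy as np

def binary_embed(X, d=256, seed=0):
    """Sign random projection: row x maps to sign(x R) in {-1, 0, +1}^d."""
    R = np.random.default_rng(seed).standard_normal((X.shape[1], d))
    return np.sign(X @ R)

def sign_agreement(bx, bz):
    """Fraction of matching signs; approximates 1 - angle(x, z) / pi."""
    return (bx * bz > 0).mean(axis=-1)
```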