Cost-sensitive Feature Selection for Support Vector Machines
- URL: http://arxiv.org/abs/2401.07627v1
- Date: Mon, 15 Jan 2024 12:07:52 GMT
- Title: Cost-sensitive Feature Selection for Support Vector Machines
- Authors: Sandra Benítez-Peña, Rafael Blanquero, Emilio Carrizosa and
Pepa Ramírez-Cobo
- Abstract summary: We propose a mathematical-optimization-based Feature Selection procedure embedded in one of the most popular classification procedures, Support Vector Machines.
We show that a substantial decrease of the number of features is obtained, whilst the desired trade-off between false positive and false negative rates is achieved.
- Score: 1.743685428161914
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Feature Selection is a crucial procedure in Data Science tasks such as
Classification, since it identifies the relevant variables, thus making the
classification procedures more interpretable, cheaper in terms of measurement
and more effective by reducing noise and data overfitting. The relevance of
features in a classification procedure is linked to the fact that
misclassification costs are frequently asymmetric, since false positive and
false negative cases may have very different consequences. However,
off-the-shelf Feature Selection procedures seldom take such cost-sensitivity
of errors into account.
In this paper we propose a mathematical-optimization-based Feature Selection
procedure embedded in one of the most popular classification procedures,
namely, Support Vector Machines, accommodating asymmetric misclassification
costs. The key idea is to replace the traditional margin maximization with
the minimization of the number of selected features, subject to upper bounds
on the false positive and false negative rates. The problem is written as an
integer linear program plus a convex quadratic problem, for Support Vector
Machines with both linear and radial kernels.
The reported numerical experience demonstrates the usefulness of the proposed
Feature Selection procedure. Indeed, our results on benchmark data sets show
that a substantial decrease of the number of features is obtained, whilst the
desired trade-off between false positive and false negative rates is achieved.
Related papers
- Ask for More Than Bayes Optimal: A Theory of Indecisions for Classification [1.8434042562191815]
Selective classification is a powerful tool for automated decision-making in high-risk scenarios.
Our goal is to minimize the number of indecisions, which are observations that we do not automate.
By using indecisions, we are able to control the misclassification rate to any user-specified level, even below the Bayes optimal error rate.
arXiv Detail & Related papers (2024-12-17T11:25:51Z) - A Hybrid Framework for Statistical Feature Selection and Image-Based Noise-Defect Detection [55.2480439325792]
This paper presents a hybrid framework that integrates both statistical feature selection and classification techniques to improve defect detection accuracy.
We present around 55 distinct features extracted from industrial images, which are then analyzed using statistical methods.
By integrating these methods with flexible machine learning applications, the proposed framework improves detection accuracy and reduces false positives and misclassifications.
arXiv Detail & Related papers (2024-12-11T22:12:21Z) - Implicit Regularization for Multi-label Feature Selection [1.5771347525430772]
We address the problem of feature selection in the context of multi-label learning by using a new estimator based on implicit regularization and label embedding.
Experimental results on some known benchmark datasets suggest that the proposed estimator suffers much less from extra bias, and may lead to benign overfitting.
arXiv Detail & Related papers (2024-11-18T10:08:05Z) - Causal Feature Selection via Transfer Entropy [59.999594949050596]
Causal discovery aims to identify causal relationships between features with observational data.
We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures.
We provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases.
arXiv Detail & Related papers (2023-10-17T08:04:45Z) - Nonparametric active learning for cost-sensitive classification [2.1756081703276]
We design a generic nonparametric active learning algorithm for cost-sensitive classification.
We prove the near-optimality of the obtained upper bounds by providing matching (up to a logarithmic factor) lower bounds.
arXiv Detail & Related papers (2023-09-30T22:19:21Z) - Bilevel Optimization for Feature Selection in the Data-Driven Newsvendor
Problem [8.281391209717105]
We study the feature-based newsvendor problem, in which a decision-maker has access to historical data.
In this setting, we investigate feature selection, aiming to derive sparse, explainable models with improved out-of-sample performance.
We present a mixed integer linear program reformulation for the bilevel program, which can be solved to optimality with standard optimization solvers.
arXiv Detail & Related papers (2022-09-12T08:52:26Z) - Optimizing Partial Area Under the Top-k Curve: Theory and Practice [151.5072746015253]
We develop a novel metric named partial Area Under the top-k Curve (AUTKC).
AUTKC has a better discrimination ability, and its Bayes optimal score function could give a correct top-k ranking with respect to the conditional probability.
We present an empirical surrogate risk minimization framework to optimize the proposed metric.
arXiv Detail & Related papers (2022-09-03T11:09:13Z) - Determination of class-specific variables in nonparametric
multiple-class classification [0.0]
We propose a probability-based nonparametric multiple-class classification method, and integrate it with the ability to identify high-impact variables for individual classes.
We report the properties of the proposed method, and use both synthesized and real data sets to illustrate its properties under different classification situations.
arXiv Detail & Related papers (2022-05-07T10:08:58Z) - Scalable Marginal Likelihood Estimation for Model Selection in Deep
Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z) - Gradient Descent in RKHS with Importance Labeling [58.79085525115987]
We study the importance labeling problem, in which we are given a large amount of unlabeled data.
We propose a new importance labeling scheme that can effectively select an informative subset of unlabeled data.
arXiv Detail & Related papers (2020-06-19T01:55:00Z) - A novel embedded min-max approach for feature selection in nonlinear
support vector machine classification [0.0]
We propose an embedded feature selection method based on a min-max optimization problem.
By leveraging duality theory, we equivalently reformulate the min-max problem and solve it without further ado.
The efficiency and usefulness of our approach are tested on several benchmark data sets.
arXiv Detail & Related papers (2020-04-21T09:40:38Z) - Implicit differentiation of Lasso-type models for hyperparameter
optimization [82.73138686390514]
We introduce an efficient implicit differentiation algorithm, without matrix inversion, tailored for Lasso-type problems.
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
arXiv Detail & Related papers (2020-02-20T18:43:42Z) - Supervised Quantile Normalization for Low-rank Matrix Approximation [50.445371939523305]
We learn the parameters of quantile normalization operators that can operate row-wise on the values of $X$ and/or of its factorization $UV$ to improve the quality of the low-rank representation of $X$ itself.
We demonstrate the applicability of these techniques on synthetic and genomics datasets.
arXiv Detail & Related papers (2020-02-08T21:06:02Z) - Naive Feature Selection: a Nearly Tight Convex Relaxation for Sparse Naive Bayes [51.55826927508311]
We propose a sparse version of naive Bayes, which can be used for feature selection.
We prove that our convex relaxation bound becomes tight as the marginal contribution of additional features decreases.
Both binary and multinomial sparse models are solvable in time almost linear in problem size.
arXiv Detail & Related papers (2019-05-23T19:30:51Z)