Naive Feature Selection: a Nearly Tight Convex Relaxation for Sparse Naive Bayes
- URL: http://arxiv.org/abs/1905.09884v3
- Date: Fri, 10 Jan 2025 14:18:34 GMT
- Title: Naive Feature Selection: a Nearly Tight Convex Relaxation for Sparse Naive Bayes
- Authors: Armin Askari, Alexandre d'Aspremont, Laurent El Ghaoui
- Abstract summary: We propose a sparse version of naive Bayes, which can be used for feature selection.
We prove that our convex relaxation bound becomes tight as the marginal contribution of additional features decreases.
Both binary and multinomial sparse models are solvable in time almost linear in problem size.
- Score: 51.55826927508311
- Abstract: Due to its linear complexity, naive Bayes classification remains an attractive supervised learning method, especially in very large-scale settings. We propose a sparse version of naive Bayes, which can be used for feature selection. This leads to a combinatorial maximum-likelihood problem, for which we provide an exact solution in the case of binary data, or a bound in the multinomial case. We prove that our convex relaxation bound becomes tight as the marginal contribution of additional features decreases, using a priori duality gap bounds derived from the Shapley-Folkman theorem. We show how to produce primal solutions satisfying these bounds. Both binary and multinomial sparse models are solvable in time almost linear in problem size, representing a very small extra relative cost compared to the classical naive Bayes. Numerical experiments on text data show that the naive Bayes feature selection method is as statistically effective as state-of-the-art feature selection methods such as recursive feature elimination, $l_1$-penalized logistic regression and LASSO, while being orders of magnitude faster.
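The combinatorial problem the abstract describes can be stated schematically as follows (the notation here is illustrative, not the paper's exact formulation): fit class-conditional parameters by maximum likelihood, while allowing them to differ in at most $k$ coordinates, so that only the selected features carry discriminative information.

```latex
% Schematic sparse naive Bayes problem (notation is illustrative):
% \mathcal{L}^{\pm} are the class-conditional log-likelihoods; the l0
% constraint limits how many features get class-specific parameters.
\max_{\theta^{+},\,\theta^{-}}\;
  \mathcal{L}^{+}(\theta^{+}) + \mathcal{L}^{-}(\theta^{-})
\quad \text{s.t.} \quad
  \|\theta^{+} - \theta^{-}\|_{0} \le k
```

For binary data the abstract claims an exact solution in almost-linear time; this is consistent with the objective decoupling across features, so the best $k$ features are those whose class-specific fit most improves on a pooled fit. The sketch below illustrates that reading for Bernoulli features and binary labels; the function names, scoring rule, and variable names are my own assumptions, not code from the paper.

```python
import numpy as np

def bernoulli_ll(s, n):
    """Maximized Bernoulli log-likelihood of s successes out of n trials."""
    p = s / n
    with np.errstate(divide="ignore", invalid="ignore"):
        ll = s * np.log(p) + (n - s) * np.log(1.0 - p)
    return np.nan_to_num(ll)  # 0 * log(0) -> 0 by convention

def naive_feature_selection(X, y, k):
    """Score each feature by the log-likelihood gain of class-specific
    Bernoulli parameters over a single pooled parameter; keep the top k."""
    Xp, Xn = X[y == 1], X[y == 0]
    sp, sn = Xp.sum(axis=0), Xn.sum(axis=0)    # per-class feature counts
    gain = (bernoulli_ll(sp, len(Xp)) + bernoulli_ll(sn, len(Xn))
            - bernoulli_ll(sp + sn, len(X)))   # pooled fit as baseline
    return np.argpartition(gain, -k)[-k:]      # indices of top-k features
```

Called as `selected = naive_feature_selection(X, y, k=50)` on a binary matrix `X` and labels `y`, this costs one pass over the data plus a linear-time partial sort, in line with the abstract's almost-linear running-time claim.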
Related papers
- Variational empirical Bayes variable selection in high-dimensional logistic regression [2.4032899110671955]
We develop a novel and computationally efficient variational approximation thereof.
One such novelty is that we develop this approximation directly for the marginal distribution on the model space, rather than on the regression coefficients themselves.
We demonstrate the method's strong performance in simulations, and prove that our variational approximation inherits the strong selection consistency property satisfied by the posterior distribution that it is approximating.
arXiv Detail & Related papers (2025-02-14T19:57:13Z)
- Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems.
Such problems are encountered in medicine, physics, and machine learning.
We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z)
- Feature selection in linear SVMs via a hard cardinality constraint: a scalable SDP decomposition approach [3.7876216422538485]
We study the embedded feature selection problem in linear Support Vector Machines (SVMs) in which a cardinality constraint is employed.
The problem is NP-hard due to the presence of the cardinality constraint, even though the original linear SVM amounts to a problem solvable in polynomial time.
To handle the hard problem, we first introduce two mixed-integer formulations for which novel semidefinite relaxations are proposed.
arXiv Detail & Related papers (2024-04-15T19:15:32Z)
- Optimal partition of feature using Bayesian classifier [0.0]
In naive Bayes, features are treated as conditionally independent given the class, i.e., as having no conditional correlation or dependency among them when predicting a label.
We propose a novel technique called the Comonotone-Independence Bayesian classifier (CIBer), which is able to overcome the challenges posed by the naive Bayes method.
arXiv Detail & Related papers (2023-04-27T21:19:06Z)
- Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z)
- Universal and data-adaptive algorithms for model selection in linear contextual bandits [52.47796554359261]
We consider the simplest non-trivial instance of model-selection: distinguishing a simple multi-armed bandit problem from a linear contextual bandit problem.
We introduce new algorithms that explore in a data-adaptive manner and provide guarantees of the form $\mathcal{O}(d^{\alpha} T^{1-\alpha})$.
Our approach extends to model selection among nested linear contextual bandits under some additional assumptions.
arXiv Detail & Related papers (2021-11-08T18:05:35Z)
- Fast Bayesian Variable Selection in Binomial and Negative Binomial Regression [9.774282306558465]
We introduce an efficient MCMC scheme for variable selection in binomial and negative binomial regression, which includes logistic regression as a special case.
In experiments we demonstrate the effectiveness of our approach, including on data with seventeen thousand covariates.
arXiv Detail & Related papers (2021-06-28T20:54:41Z)
- Fair Sparse Regression with Clustering: An Invex Relaxation for a Combinatorial Problem [32.18449686637963]
We show that the inclusion of the debiasing/fairness constraint in our model has no adverse effect on the performance.
We simultaneously solve the clustering problem by recovering the exact value of the hidden attribute for each sample.
arXiv Detail & Related papers (2021-02-19T01:46:34Z)
- Random extrapolation for primal-dual coordinate descent [61.55967255151027]
We introduce a randomly extrapolated primal-dual coordinate descent method that adapts to sparsity of the data matrix and the favorable structures of the objective function.
We show almost sure convergence of the sequence and optimal sublinear convergence rates for the primal-dual gap and objective values, in the general convex-concave case.
arXiv Detail & Related papers (2020-07-13T17:39:35Z)
- MINA: Convex Mixed-Integer Programming for Non-Rigid Shape Alignment [77.38594866794429]
We present a convex mixed-integer programming formulation for non-rigid shape matching.
We propose a novel shape deformation model based on an efficient low-dimensional discrete model.
arXiv Detail & Related papers (2020-02-28T09:54:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.