Variable selection for Na\"ive Bayes classification
- URL: http://arxiv.org/abs/2401.18039v1
- Date: Wed, 31 Jan 2024 18:01:36 GMT
- Title: Variable selection for Naïve Bayes classification
- Authors: Rafael Blanquero, Emilio Carrizosa, Pepa Ramírez-Cobo, M. Remedios Sillero-Denamiel
- Abstract summary: The Na"ive Bayes has proven to be a tractable and efficient method for classification in multivariate analysis.
We propose a sparse version of the Na"ive Bayes that is characterized by three properties.
Our findings show that, when compared against well-referenced feature selection approaches, the proposed sparse Na"ive Bayes obtains competitive results.
- Score: 2.8265531928694116
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The Na\"ive Bayes has proven to be a tractable and efficient method for
classification in multivariate analysis. However, features are usually
correlated, a fact that violates the Na\"ive Bayes' assumption of conditional
independence, and may deteriorate the method's performance. Moreover, datasets
are often characterized by a large number of features, which may complicate the
interpretation of the results as well as slow down the method's execution.
In this paper we propose a sparse version of the Naïve Bayes classifier
that is characterized by three properties. First, sparsity is achieved by
taking into account the correlation structure of the covariates. Second,
different performance measures can be used to guide the selection of features.
Third, performance constraints on groups of higher interest can be included.
Our proposal leads to a smart search that yields competitive running times
while integrating flexibility in the choice of performance measure for
classification. Our findings show that, when compared against well-referenced
feature selection approaches, the proposed sparse Naïve Bayes obtains
competitive results regarding accuracy, sparsity and running times for balanced
datasets. In the case of datasets with unbalanced (or with different
importance) classes, a better compromise between classification rates for the
different classes is achieved.
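To make the role of the performance measure concrete, the sketch below wraps a Gaussian Naïve Bayes classifier in a greedy forward feature search scored on a validation split. It is a minimal, hypothetical illustration assuming scikit-learn (GaussianNB, balanced_accuracy_score) and an invented helper greedy_nb_selection; it does not reproduce the paper's optimization model, which also exploits the correlation structure of the covariates and admits per-group performance constraints.

```python
# Hypothetical illustration (not the authors' model): a greedy forward search
# that adds features to a Gaussian Naive Bayes classifier only while a chosen
# performance measure keeps improving on a validation split.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score


def greedy_nb_selection(X, y, max_features=5, metric=balanced_accuracy_score, seed=0):
    """Select features for Naive Bayes one at a time, guided by `metric`."""
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)
    selected, best_score = [], -np.inf
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        # Score every candidate feature when appended to the current subset.
        scores = []
        for j in remaining:
            cols = selected + [j]
            model = GaussianNB().fit(X_tr[:, cols], y_tr)
            scores.append((metric(y_val, model.predict(X_val[:, cols])), j))
        score, j_best = max(scores)
        if score <= best_score:   # no candidate improves the measure: stop (sparsity)
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best_score = score
    return selected, best_score


if __name__ == "__main__":
    from sklearn.datasets import load_breast_cancer
    X, y = load_breast_cancer(return_X_y=True)
    feats, score = greedy_nb_selection(X, y)
    print("selected features:", feats, "- validation balanced accuracy: %.3f" % score)
```

Passing a different metric (e.g., a recall-oriented score for a class of higher interest) mimics the flexibility in the performance measure, while the early-stopping test is what yields sparsity in this toy version.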
Related papers
- Fractional Naive Bayes (FNB): non-convex optimization for a parsimonious weighted selective naive Bayes classifier [0.0]
We address supervised classification for datasets with a very large number of input variables.
We propose a regularization of the model log-likelihood.
The various proposed algorithms result in an optimization-based weighted naive Bayes scheme.
arXiv Detail & Related papers (2024-09-17T11:54:14Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets, which are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Mitigating Word Bias in Zero-shot Prompt-based Classifiers [55.60306377044225]
We show that matching class priors correlates strongly with the oracle upper bound performance.
We also demonstrate large consistent performance gains for prompt settings over a range of NLP tasks.
arXiv Detail & Related papers (2023-09-10T10:57:41Z) - Bipartite Ranking Fairness through a Model Agnostic Ordering Adjustment [54.179859639868646]
We propose a model agnostic post-processing framework xOrder for achieving fairness in bipartite ranking.
xOrder is compatible with various classification models and ranking fairness metrics, including supervised and unsupervised fairness metrics.
We evaluate our proposed algorithm on four benchmark data sets and two real-world patient electronic health record repositories.
arXiv Detail & Related papers (2023-07-27T07:42:44Z) - Optimal partition of feature using Bayesian classifier [0.0]
In Naive Bayes, certain features are called independent features, as they are assumed to have no conditional correlation or dependency with other features when predicting a classification.
We propose a novel technique, the Comonotone-Independence classifier (CIBer), which is able to overcome the challenges posed by the Naive Bayes method.
arXiv Detail & Related papers (2023-04-27T21:19:06Z) - Robust Outlier Rejection for 3D Registration with Variational Bayes [70.98659381852787]
We develop a novel variational non-local network-based outlier rejection framework for robust alignment.
We propose a voting-based inlier searching strategy to cluster the high-quality hypothetical inliers for transformation estimation.
arXiv Detail & Related papers (2023-04-04T03:48:56Z) - Exploring Category-correlated Feature for Few-shot Image Classification [27.13708881431794]
We present a simple yet effective feature rectification method by exploring the category correlation between novel and base classes as the prior knowledge.
The proposed approach consistently obtains considerable performance gains on three widely used benchmarks.
arXiv Detail & Related papers (2021-12-14T08:25:24Z) - When in Doubt: Improving Classification Performance with Alternating
Normalization [57.39356691967766]
We introduce Classification with Alternating Normalization (CAN), a non-parametric post-processing step for classification.
CAN improves classification accuracy for challenging examples by re-adjusting their predicted class probability distribution.
We empirically demonstrate its effectiveness across a diverse set of classification tasks.
arXiv Detail & Related papers (2021-09-28T02:55:42Z) - How Nonconformity Functions and Difficulty of Datasets Impact the
Efficiency of Conformal Classifiers [0.1611401281366893]
In conformal classification, a system can output multiple class labels instead of a single one.
For a neural-network-based conformal classifier, the inverse-probability nonconformity function allows minimizing the average number of predicted labels (a generic sketch of this score appears after this list).
We propose a successful method to combine the properties of these two nonconformity functions.
arXiv Detail & Related papers (2021-08-12T11:50:12Z) - Dynamic Instance-Wise Classification in Correlated Feature Spaces [15.351282873821935]
In a typical machine learning setting, the predictions on all test instances are based on a common subset of features discovered during model training.
A new method is proposed that sequentially selects the best feature to evaluate for each test instance individually, and stops the selection process to make a prediction once it determines that no further improvement can be achieved with respect to classification accuracy.
The effectiveness, generalizability, and scalability of the proposed method are illustrated on a variety of real-world datasets from diverse application domains.
arXiv Detail & Related papers (2021-06-08T20:20:36Z) - Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking
Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)