A Max-relevance-min-divergence Criterion for Data Discretization with
Applications on Naive Bayes
- URL: http://arxiv.org/abs/2209.10095v2
- Date: Wed, 5 Apr 2023 02:30:11 GMT
- Title: A Max-relevance-min-divergence Criterion for Data Discretization with
Applications on Naive Bayes
- Authors: Shihe Wang, Jianfeng Ren, Ruibin Bai, Yuan Yao, Xudong Jiang
- Abstract summary: In many classification models, data is discretized to better estimate its distribution.
We propose a Max-Dependency-Min-Divergence (MDmD) criterion that maximizes both the discriminant information and generalization ability of the discretized data.
We propose a more practical solution, Max-Relevance-Min-Divergence (MRmD) discretization scheme, where each attribute is discretized separately, by simultaneously maximizing the discriminant information and the generalization ability of the discretized data.
- Score: 22.079025650097932
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In many classification models, data is discretized to better estimate its
distribution. Existing discretization methods often aim at maximizing the
discriminant power of discretized data, while overlooking the fact that the
primary target of data discretization in classification is to improve the
generalization performance. As a result, the data tend to be over-split into
many small bins since the data without discretization retain the maximal
discriminant information. Thus, we propose a Max-Dependency-Min-Divergence
(MDmD) criterion that maximizes both the discriminant information and
generalization ability of the discretized data. More specifically, the
Max-Dependency criterion maximizes the statistical dependency between the
discretized data and the classification variable while the Min-Divergence
criterion explicitly minimizes the JS-divergence between the training data and
the validation data for a given discretization scheme. The proposed MDmD
criterion is technically appealing, but it is difficult to reliably estimate
the high-order joint distributions of attributes and the classification
variable. We hence further propose a more practical solution,
Max-Relevance-Min-Divergence (MRmD) discretization scheme, where each attribute
is discretized separately, by simultaneously maximizing the discriminant
information and the generalization ability of the discretized data. The
proposed MRmD is compared with the state-of-the-art discretization algorithms
under the naive Bayes classification framework on 45 machine-learning benchmark
datasets. It significantly outperforms all the compared methods on most of the
datasets.
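The trade-off described in the abstract can be illustrated with a minimal sketch for a single attribute. It is not the authors' implementation: the function names, the use of the joint (bin, class) distribution for the divergence term, the subtractive form of the score, and the weight `lam` are all assumptions made for illustration, since the abstract does not state the exact objective.

```python
import numpy as np

def joint_hist(x, y, cuts, n_classes):
    """Joint distribution P(bin, class) after discretizing x at the cut points."""
    bins = np.digitize(np.asarray(x), cuts)        # bin index in 0..len(cuts)
    counts = np.zeros((len(cuts) + 1, n_classes))
    np.add.at(counts, (bins, np.asarray(y)), 1.0)  # accumulate co-occurrence counts
    return counts / counts.sum()

def mutual_information(p_joint):
    """Relevance term: I(A; C) in bits, computed from a joint probability table."""
    pa = p_joint.sum(axis=1, keepdims=True)        # marginal over bins
    pc = p_joint.sum(axis=0, keepdims=True)        # marginal over classes
    mask = p_joint > 0
    return float(np.sum(p_joint[mask] * np.log2(p_joint[mask] / (pa @ pc)[mask])))

def js_divergence(p, q):
    """Divergence term: Jensen-Shannon divergence (base 2) between p and q."""
    p, q = np.ravel(p), np.ravel(q)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0                               # b > 0 wherever a > 0, since b = m
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return float(0.5 * kl(p, m) + 0.5 * kl(q, m))

def score_cuts(x_tr, y_tr, x_va, y_va, cuts, n_classes, lam=1.0):
    """Hypothetical combined score: relevance on training data minus a weighted
    train/validation divergence (the weighting scheme is an assumption)."""
    p_tr = joint_hist(x_tr, y_tr, cuts, n_classes)
    p_va = joint_hist(x_va, y_va, cuts, n_classes)
    return mutual_information(p_tr) - lam * js_divergence(p_tr, p_va)
```

Under these assumptions, a candidate set of cut points that splits the attribute into many small bins tends to raise the relevance term on the training data but also the train/validation divergence, so the combined score penalizes over-splitting.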
Related papers
- Synergistic eigenanalysis of covariance and Hessian matrices for enhanced binary classification [72.77513633290056]
We present a novel approach that combines the eigenanalysis of a covariance matrix evaluated on a training set with a Hessian matrix evaluated on a deep learning model.
Our method captures intricate patterns and relationships, enhancing classification performance.
arXiv Detail & Related papers (2024-02-14T16:10:42Z) - Anomaly Detection Under Uncertainty Using Distributionally Robust
Optimization Approach [0.9217021281095907]
Anomaly detection is defined as the problem of finding data points that do not follow the patterns of the majority.
The one-class Support Vector Machines (SVM) method aims to find a decision boundary to distinguish between normal data points and anomalies.
A distributionally robust chance-constrained model is proposed in which the probability of misclassification is low.
arXiv Detail & Related papers (2023-12-03T06:13:22Z) - Characterizing the Optimal 0-1 Loss for Multi-class Classification with
a Test-time Attacker [57.49330031751386]
We find achievable information-theoretic lower bounds on loss in the presence of a test-time attacker for multi-class classifiers on any discrete dataset.
We provide a general framework for finding the optimal 0-1 loss that revolves around the construction of a conflict hypergraph from the data and adversarial constraints.
arXiv Detail & Related papers (2023-02-21T15:17:13Z) - Dimension Independent Data Sets Approximation and Applications to
Classification [0.0]
We revisit the classical kernel method of approximation/interpolation theory in a very specific context.
Special functions, called data signals, are defined for any given data set and are used to successfully solve supervised classification problems.
arXiv Detail & Related papers (2022-08-29T17:32:55Z) - A Semi-Supervised Adaptive Discriminative Discretization Method
Improving Discrimination Power of Regularized Naive Bayes [0.48342038441006785]
We propose a semi-supervised adaptive discriminative discretization framework for naive Bayes.
It could better estimate the data distribution by utilizing both labeled data and unlabeled data through pseudo-labeling techniques.
The proposed method also significantly reduces the information loss during discretization by utilizing an adaptive discriminative discretization scheme.
arXiv Detail & Related papers (2021-11-22T04:36:40Z) - Riemannian classification of EEG signals with missing values [67.90148548467762]
This paper proposes two strategies to handle missing data for the classification of electroencephalograms.
The first approach estimates the covariance from imputed data with the $k$-nearest neighbors algorithm; the second relies on the observed data by leveraging the observed-data likelihood within an expectation-maximization algorithm.
As the results show, the proposed strategies perform better than classification based on observed data alone and maintain high accuracy even as the missing-data ratio increases.
arXiv Detail & Related papers (2021-10-19T14:24:50Z) - The Power of Log-Sum-Exp: Sequential Density Ratio Matrix Estimation for
Speed-Accuracy Optimization [0.0]
We propose a model for multiclass classification of time series that makes predictions as early and as accurately as possible.
Our overall architecture for early classification, MSPRT-TANDEM, statistically significantly outperforms baseline models on four datasets.
arXiv Detail & Related papers (2021-05-28T07:21:58Z) - A Unified Joint Maximum Mean Discrepancy for Domain Adaptation [73.44809425486767]
This paper theoretically derives a unified form of JMMD that is easy to optimize.
From the revealed unified JMMD, we illustrate that JMMD degrades the feature-label dependence that benefits classification.
We propose a novel MMD matrix to promote the dependence, and devise a novel label kernel that is robust to label distribution shift.
arXiv Detail & Related papers (2021-01-25T09:46:14Z) - Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM) where we parameterize the joint distribution in terms of the derivatives of univariable log-conditionals (scores).
For AR-CSM models, this divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
arXiv Detail & Related papers (2020-10-24T07:01:24Z) - A Compressive Classification Framework for High-Dimensional Data [12.284934135116515]
We propose a compressive classification framework for settings where the data dimensionality is significantly higher than the sample size.
The proposed method, referred to as compressive regularized discriminant analysis (CRDA), is based on linear discriminant analysis.
It has the ability to select significant features by using joint-sparsity promoting hard thresholding in the discriminant rule.
arXiv Detail & Related papers (2020-05-09T06:55:00Z) - Towards Discriminability and Diversity: Batch Nuclear-norm Maximization
under Label Insufficient Situations [154.51144248210338]
Batch Nuclear-norm Maximization (BNM) is proposed to boost learning in label-insufficient scenarios.
BNM outperforms competitors and works well with existing well-known methods.
arXiv Detail & Related papers (2020-03-27T05:04:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.