Determination of class-specific variables in nonparametric
multiple-class classification
- URL: http://arxiv.org/abs/2205.03623v1
- Date: Sat, 7 May 2022 10:08:58 GMT
- Title: Determination of class-specific variables in nonparametric
multiple-class classification
- Authors: Wan-Ping Nicole Chen, Yuan-chin Ivan Chang
- Abstract summary: We propose a probability-based nonparametric multiple-class classification method and integrate it with the ability to identify high-impact variables for each individual class.
We report the properties of the proposed method, and use both synthesized and real data sets to illustrate its properties under different classification situations.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As technology advances, collecting data via automatic collection
devices has become popular; thus we commonly face data sets with a large
number of variables, especially when these data sets are collected without
specific research goals beforehand. It has been pointed out in the literature
that the difficulty of high-dimensional classification problems is
intrinsically caused by too many noise variables that are useless for reducing
classification error; such variables offer little benefit for decision-making
and increase complexity and confusion in model interpretation. A good variable
selection strategy is therefore a must for using such data well, especially
when we expect to use the results in succeeding applications/studies, where
model-interpretation ability is essential. Thus, conventional classification
measures, such as accuracy, sensitivity, and precision, cannot be the only
performance criteria. In this paper, we propose a probability-based
nonparametric multiple-class classification method and integrate it with the
ability to identify high-impact variables for each individual class, so that
we have more information about its classification rule and the character of
each class as well. The proposed method can achieve prediction power
approximately equal to that of the Bayes rule, while retaining the ability of
"model interpretation." We report the asymptotic properties of the proposed
method, and use both synthesized and real data sets to illustrate its behavior
under different classification situations. We also separately discuss variable
identification and training sample size determination, and summarize those
procedures as algorithms such that users can easily implement them in
different computing languages.
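The paper does not spell out its estimator here, so as a rough illustration of the two ingredients the abstract describes — a nonparametric class-probability estimate and class-specific variable identification — the sketch below combines a k-nearest-neighbor probability estimate with a permutation-based importance score. The function names, the choice of k-NN, and the permutation scheme are all my assumptions, not the authors' method.

```python
import numpy as np

def knn_class_probs(X_train, y_train, X, k=15):
    """Nonparametric class-probability estimate: P(class c | x) is
    approximated by the fraction of x's k nearest training neighbors
    that belong to class c."""
    classes = np.unique(y_train)
    probs = np.empty((len(X), len(classes)))
    for i, x in enumerate(X):
        d = np.linalg.norm(X_train - x, axis=1)
        nn = y_train[np.argsort(d)[:k]]
        probs[i] = [(nn == c).mean() for c in classes]
    return classes, probs

def class_specific_importance(X, y, target_class, k=15, n_perm=5, rng=None):
    """Illustrative class-specific variable score (hypothetical): for one
    class, measure how much the estimated probability of that class (on
    its own members) drops when a variable is randomly shuffled."""
    rng = np.random.default_rng(rng)
    classes, probs = knn_class_probs(X, y, X, k)
    col = np.where(classes == target_class)[0][0]
    mask = (y == target_class)
    base = probs[mask, col].mean()          # baseline fit for this class
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_perm):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])   # destroy variable j
            _, p = knn_class_probs(Xp, y, Xp, k)
            drops[j] += base - p[mask, col].mean()
    return drops / n_perm
```

Variables with large drops are "high-impact" for that particular class, which is the kind of per-class interpretability the abstract argues accuracy alone cannot provide.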
Related papers
- Probabilistic Safety Regions Via Finite Families of Scalable Classifiers [2.431537995108158]
Supervised classification recognizes patterns in the data to separate classes of behaviours.
Canonical solutions contain misclassification errors that are intrinsic to the numerical approximating nature of machine learning.
We introduce the concept of probabilistic safety region to describe a subset of the input space in which the number of misclassified instances is probabilistically controlled.
arXiv Detail & Related papers (2023-09-08T22:40:19Z)
- Selecting the suitable resampling strategy for imbalanced data
classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class.
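As a concrete instance of the balancing strategies this entry mentions, the sketch below implements plain random oversampling: minority-class examples are duplicated with replacement until every class matches the majority count. This is a generic illustration, not the resampling-selection procedure of the cited paper.

```python
import numpy as np

def random_oversample(X, y, rng=None):
    """Random oversampling: resample minority-class rows (with
    replacement) until every class has as many examples as the
    majority class."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = []
    for c, n_c in zip(classes, counts):
        members = np.where(y == c)[0]
        idx.extend(members)                      # keep all originals
        if n_c < n_max:                          # top up minority class
            idx.extend(rng.choice(members, size=n_max - n_c, replace=True))
    idx = np.array(idx)
    return X[idx], y[idx]
```

Undersampling is the mirror image: drop majority-class rows instead of duplicating minority ones, trading information loss for a smaller training set.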
arXiv Detail & Related papers (2021-12-15T18:56:39Z)
- Learning Debiased and Disentangled Representations for Semantic
Segmentation [52.35766945827972]
We propose a model-agnostic training scheme for semantic segmentation.
By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes.
Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
arXiv Detail & Related papers (2021-10-31T16:15:09Z)
- When in Doubt: Improving Classification Performance with Alternating
Normalization [57.39356691967766]
We introduce Classification with Alternating Normalization (CAN), a non-parametric post-processing step for classification.
CAN improves classification accuracy for challenging examples by re-adjusting their predicted class probability distribution.
We empirically demonstrate its effectiveness across a diverse set of classification tasks.
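The core mechanism behind CAN is a Sinkhorn-style alternation: rescale the columns of a matrix of predicted distributions toward the class priors, then renormalize each row back into a probability distribution. The sketch below shows that alternation in a simplified form; the function name and the exact stopping rule are my assumptions, and the full CAN procedure additionally conditions on a reference set of confident predictions.

```python
import numpy as np

def alternating_normalization(P, priors, n_iter=200):
    """Simplified alternating normalization: P is (n_examples, n_classes)
    with each row a predicted class distribution, `priors` the target
    class proportions. Alternate a column step (match priors) with a
    row step (restore valid distributions)."""
    P = np.asarray(P, dtype=float).copy()
    priors = np.asarray(priors, dtype=float)
    n = len(P)
    for _ in range(n_iter):
        P *= (n * priors) / P.sum(axis=0)     # columns: pull toward priors
        P /= P.sum(axis=1, keepdims=True)     # rows: renormalize to sum 1
    return P
```

For strictly positive inputs the iteration converges, leaving rows that are valid distributions whose aggregate class frequencies match the priors — which is how the re-adjustment can flip the label of a borderline example.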
arXiv Detail & Related papers (2021-09-28T02:55:42Z)
- Theoretical Insights Into Multiclass Classification: A High-dimensional
Asymptotic View [82.80085730891126]
We provide the first modern, precise analysis of linear multiclass classification.
Our analysis reveals that the classification accuracy is highly distribution-dependent.
The insights gained may pave the way for a precise understanding of other classification algorithms.
arXiv Detail & Related papers (2020-11-16T05:17:29Z)
- Predicting Classification Accuracy When Adding New Unobserved Classes [8.325327265120283]
We study how a classifier's performance can be used to extrapolate its expected accuracy on a larger, unobserved set of classes.
We formulate a robust neural-network-based algorithm, "CleaneX", which learns to estimate the accuracy of such classifiers on arbitrarily large sets of classes.
arXiv Detail & Related papers (2020-10-28T14:37:25Z)
- Signal classification using weighted orthogonal regression method [0.0]
This paper proposes a new classification method that exploits the intrinsic structure of each class through the corresponding Eigen components.
The proposed method uses the eigenvectors obtained by SVD of the data from each class to select the bases for each subspace.
It considers an efficient weighting for the decision-making criterion to discriminate two classes.
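The basic subspace idea this entry builds on can be sketched directly: fit an orthonormal basis per class from the SVD of that class's data, then assign a new point to the class whose subspace reconstructs it with the smallest residual. The sketch below omits the paper's weighting of the decision criterion, and the function names are my own.

```python
import numpy as np

def fit_subspaces(X_by_class, r=2):
    """For each class, take the top-r right singular vectors of its
    (n_samples, n_features) data matrix as an orthonormal basis for
    that class's subspace."""
    bases = {}
    for c, Xc in X_by_class.items():
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        bases[c] = Vt[:r]                 # (r, n_features), orthonormal rows
    return bases

def classify(x, bases):
    """Assign x to the class whose subspace reconstructs it with the
    smallest residual norm."""
    best, best_err = None, np.inf
    for c, V in bases.items():
        proj = V.T @ (V @ x)              # orthogonal projection onto span(V)
        err = np.linalg.norm(x - proj)
        if err < best_err:
            best, best_err = c, err
    return best
```

The cited paper's contribution is to weight this residual-based criterion rather than compare raw reconstruction errors directly.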
arXiv Detail & Related papers (2020-10-12T19:12:14Z)
- Evaluating Nonlinear Decision Trees for Binary Classification Tasks with
Other Existing Methods [8.870380386952993]
Classification of datasets into two or more distinct classes is an important machine learning task.
Many methods are able to classify binary classification tasks with a very high accuracy on test data, but cannot provide any easily interpretable explanation.
We highlight and evaluate a recently proposed nonlinear decision tree approach with a number of commonly used classification methods on a number of datasets.
arXiv Detail & Related papers (2020-08-25T00:00:23Z)
- High-Dimensional Quadratic Discriminant Analysis under Spiked Covariance
Model [101.74172837046382]
We propose a novel quadratic classification technique, the parameters of which are chosen such that the Fisher discriminant ratio is maximized.
Numerical simulations show that the proposed classifier not only outperforms the classical R-QDA for both synthetic and real data but also requires lower computational complexity.
arXiv Detail & Related papers (2020-06-25T12:00:26Z)
- Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
arXiv Detail & Related papers (2020-02-07T21:28:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.