Population structure-learned classifier for high-dimension
low-sample-size class-imbalanced problem
- URL: http://arxiv.org/abs/2009.04722v1
- Date: Thu, 10 Sep 2020 08:33:39 GMT
- Title: Population structure-learned classifier for high-dimension
low-sample-size class-imbalanced problem
- Authors: Liran Shen, Meng Joo Er, Qingbo Yin
- Abstract summary: Population Structure-learned Classifier (PSC) is proposed.
PSC can obtain better generalization performance on IHDLSS.
PSC is superior to state-of-the-art methods on IHDLSS.
- Score: 3.411873646414169
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classification of high-dimension low-sample-size (HDLSS) data is a
challenging problem, and class-imbalanced data are common in most application
fields. We term this setting Imbalanced HDLSS (IHDLSS). Recent theoretical
results reveal that the classification criterion and tolerance similarity are
crucial for HDLSS, emphasizing the maximization of within-class variance on the
premise of class separability. Based on this idea,
a novel linear binary classifier, termed Population Structure-learned
Classifier (PSC), is proposed. PSC obtains better generalization performance on
IHDLSS by maximizing the sum of the inter-class and intra-class scatter
matrices on the premise of class separability and by assigning different
intercept values to the majority and minority
classes. The salient features of the proposed approach are: (1) it works well
on IHDLSS; (2) the inverse of a high-dimensional matrix can be computed in a
low-dimensional space; (3) it is self-adaptive in determining the intercept
term for each class; (4) it has the same computational complexity as SVM. A
series of evaluations is conducted on one simulated data set and eight
real-world gene-analysis benchmark data sets for IHDLSS. Experimental results
demonstrate that PSC is superior to state-of-the-art methods on IHDLSS.
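The abstract names three computational ingredients: a scatter-based linear direction, inversion of a high-dimensional matrix carried out in a low-dimensional space, and class-specific intercepts. The sketch below illustrates how such a classifier could look. It is a minimal illustration under stated assumptions (a ridge-regularized LDA-style direction, the Woodbury identity for the inversion, and a count-weighted intercept rule), not the authors' PSC implementation; all function and variable names are hypothetical.

```python
# Minimal sketch (NOT the authors' PSC code): an LDA-style linear classifier
# for imbalanced HDLSS data, illustrating the three ingredients named in the
# abstract. The objective, regularizer, and intercept rule are assumptions.
import numpy as np

def fit_psc_like(X, y, reg=1e-3):
    """X: (n, d) data with d >> n; y: binary labels in {0, 1}."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Xc = np.vstack([X0 - mu0, X1 - mu1])   # within-class centred rows, (n, d)
    delta = mu1 - mu0                      # between-class direction, (d,)

    # Ingredient 1: an LDA-style direction w = (reg*I_d + Xc^T Xc)^{-1} delta.
    # Ingredient 2: avoid the d x d inverse by working in the n-dimensional
    # sample space via the Woodbury identity, so only an (n, n) system is
    # solved: (reg*I + Xc^T Xc)^{-1} v = (v - Xc^T (reg*I + Xc Xc^T)^{-1} Xc v) / reg.
    G = Xc @ Xc.T                          # (n, n) Gram matrix
    inner = np.linalg.solve(reg * np.eye(len(G)) + G, Xc @ delta)
    w = (delta - Xc.T @ inner) / reg       # discriminant direction, (d,)

    # Ingredient 3: an imbalance-aware intercept. Weighting each projected
    # class mean by its own sample count pulls the threshold toward the
    # majority class, leaving the minority class a wider margin (a stand-in
    # for PSC's self-adaptive intercept rule, which the abstract does not
    # spell out).
    s0, s1 = X0 @ w, X1 @ w
    n0, n1 = len(s0), len(s1)
    t = (n0 * s0.mean() + n1 * s1.mean()) / (n0 + n1)
    return w, t

def predict(X, w, t):
    return (X @ w > t).astype(int)         # class 1 scores above the threshold
```

In this sketch the cost is dominated by forming and solving the (n, n) system rather than anything d x d, which is consistent in spirit with feature (2) and the abstract's SVM-like complexity claim in feature (4).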
Related papers
- Synergistic eigenanalysis of covariance and Hessian matrices for enhanced binary classification [72.77513633290056]
We present a novel approach that combines the eigenanalysis of a covariance matrix evaluated on a training set with a Hessian matrix evaluated on a deep learning model.
Our method captures intricate patterns and relationships, enhancing classification performance.
arXiv Detail & Related papers (2024-02-14T16:10:42Z) - Overlapping oriented imbalanced ensemble learning method based on
projective clustering and stagewise hybrid sampling [22.32930261633615]
This paper proposes an ensemble learning algorithm based on dual clustering and stage-wise hybrid sampling (DCSHS).
The major advantage of our algorithm is that it can exploit the intersectionality of the CCS to realize the soft elimination of overlapping majority samples.
arXiv Detail & Related papers (2022-11-30T01:49:06Z) - Handling Imbalanced Classification Problems With Support Vector Machines
via Evolutionary Bilevel Optimization [73.17488635491262]
Support vector machines (SVMs) are popular learning algorithms to deal with binary classification problems.
This article introduces EBCS-SVM: evolutionary bilevel cost-sensitive SVMs.
arXiv Detail & Related papers (2022-04-21T16:08:44Z) - Self-Supervised Class Incremental Learning [51.62542103481908]
Existing Class Incremental Learning (CIL) methods are based on a supervised classification framework sensitive to data labels.
When updating them based on the new class data, they suffer from catastrophic forgetting: the model cannot discern old class data clearly from the new.
In this paper, we explore the performance of Self-Supervised representation learning in Class Incremental Learning (SSCIL) for the first time.
arXiv Detail & Related papers (2021-11-18T06:58:19Z) - Divide-and-Conquer Hard-thresholding Rules in High-dimensional
Imbalanced Classification [1.0312968200748118]
We study the impact of imbalance class sizes on the linear discriminant analysis (LDA) in high dimensions.
We show that due to data scarcity in one class, referred to as the minority class, the LDA ignores the minority class, yielding a maximum misclassification rate.
We propose a new construction of a hard-thresholding rule based on a divide-and-conquer technique that reduces the large difference between the misclassification rates.
arXiv Detail & Related papers (2021-11-05T07:44:28Z) - Statistical Theory for Imbalanced Binary Classification [8.93993657323783]
We show that optimal classification performance depends on certain properties of class imbalance that have not previously been formalized.
Specifically, we propose a novel sub-type of class imbalance, which we call Uniform Class Imbalance.
These results provide some of the first meaningful finite-sample statistical theory for imbalanced binary classification.
arXiv Detail & Related papers (2021-07-05T03:55:43Z) - No Fear of Heterogeneity: Classifier Calibration for Federated Learning
with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks, including CIFAR-10, CIFAR-100, and CINIC-10.
arXiv Detail & Related papers (2021-06-09T12:02:29Z) - Self-Weighted Robust LDA for Multiclass Classification with Edge Classes [111.5515086563592]
A novel self-weighted robust LDA with an l2,1-norm based between-class distance criterion, called SWRLDA, is proposed for multi-class classification.
The proposed SWRLDA is easy to implement, and converges fast in practice.
arXiv Detail & Related papers (2020-09-24T12:32:55Z) - High-Dimensional Quadratic Discriminant Analysis under Spiked Covariance
Model [101.74172837046382]
We propose a novel quadratic classification technique, the parameters of which are chosen such that the Fisher discriminant ratio is maximized.
Numerical simulations show that the proposed classifier not only outperforms the classical R-QDA for both synthetic and real data but also requires lower computational complexity.
arXiv Detail & Related papers (2020-06-25T12:00:26Z) - The classification for High-dimension low-sample size data [3.411873646414169]
We propose a novel classification criterion on HDLSS, tolerance, which emphasizes similarity of within-class variance on the premise of class separability.
According to this criterion, a novel linear binary classifier is designed, denoted by No-separated Data Dispersion Maximum (NPDMD).
NPDMD has several characteristics compared to the state-of-the-art classification methods.
arXiv Detail & Related papers (2020-06-21T07:04:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.