The classification for High-dimension low-sample size data
- URL: http://arxiv.org/abs/2006.13018v3
- Date: Wed, 20 Jan 2021 22:37:14 GMT
- Title: The classification for High-dimension low-sample size data
- Authors: Liran Shen, Meng Joo Er, Qingbo Yin
- Abstract summary: We propose a novel classification criterion for HDLSS data, tolerance similarity, which emphasizes maximizing within-class variance on the premise of class separability.
According to this criterion, a novel linear binary classifier is designed, denoted the No-separated Data Maximum Dispersion classifier (NPDMD).
NPDMD has several distinctive characteristics compared to state-of-the-art classification methods.
- Score: 3.411873646414169
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A huge number of applications in various fields, such as gene expression
analysis and computer vision, involve high-dimensional low-sample-size (HDLSS)
data sets, which pose great challenges for standard statistical and modern
machine learning methods. In this paper, we propose a novel classification
criterion for HDLSS data, tolerance similarity, which emphasizes the
maximization of within-class variance on the premise of class separability.
According to this criterion, a novel linear binary classifier is designed,
denoted the No-separated Data Maximum Dispersion classifier (NPDMD). The
objective of NPDMD is to find a projecting direction w in which all training
samples scatter over as large an interval as possible. NPDMD has several
characteristics compared to state-of-the-art classification methods. First,
it works well on HDLSS data. Second, it combines sample statistical information
and local structural information (supporting vectors) in the objective
function to find the projecting direction in the whole feature space. Third,
it computes the inverse of a high-dimensional matrix in a low-dimensional
space. Fourth, it is relatively simple to implement, being based on Quadratic
Programming. Fifth, it is robust to model specification across various real
applications. The theoretical properties of NPDMD are deduced. We conduct a
series of evaluations on one simulated and six real-world benchmark data sets,
including face classification and mRNA classification. NPDMD outperforms
widely used approaches in most cases, or at least obtains comparable results.
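The fourth point above says NPDMD can be implemented via Quadratic Programming. The paper's exact objective is not reproduced here, so the following is only a minimal sketch, under assumed details, of a dispersion-encouraging linear classifier in the same spirit, written as a convex QP with cvxpy; the unit-margin constraint and the trade-off constant C are illustrative assumptions, not NPDMD's actual formulation.

```python
# Hypothetical QP sketch (NOT the paper's exact NPDMD objective): keep w
# small while pushing every signed projection y_i * (w . x_i + b) past the
# unit margin, so training samples spread over a wide interval along w.
import numpy as np
import cvxpy as cp

def fit_dispersion_qp(X, y, C=0.1):
    """X: (n, d) samples; y: (n,) labels in {-1, +1}; C: assumed trade-off."""
    n, d = X.shape
    w = cp.Variable(d)
    b = cp.Variable()
    margins = cp.multiply(y, X @ w + b)          # signed projections
    # Separability premise: every sample on the correct side of the margin.
    constraints = [margins >= 1]
    # Quadratic regularizer minus total signed projection: a convex QP.
    objective = cp.Minimize(0.5 * cp.sum_squares(w) - C * cp.sum(margins))
    cp.Problem(objective, constraints).solve()
    return w.value, b.value

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (20, 50)), rng.normal(1, 1, (20, 50))])
y = np.hstack([-np.ones(20), np.ones(20)])       # HDLSS-ish toy data: d > n
w, b = fit_dispersion_qp(X, y)
print(np.mean(np.sign(X @ w + b) == y))          # training accuracy
```

The third point, inverting a high-dimensional matrix in a low-dimensional space, is a standard HDLSS device. One well-known instance (not necessarily the one the paper uses) is the Woodbury identity, which reduces a d-by-d ridge-type inverse to an n-by-n solve when n << d:

```python
# Woodbury identity: apply (lam*I_d + X^T X)^{-1} via an n x n system,
# avoiding any d x d inverse when n << d.
import numpy as np

def ridge_inverse_apply(X, v, lam=1e-2):
    """Return (lam*I_d + X^T X)^{-1} v for X of shape (n, d) with n << d."""
    n, d = X.shape
    G = lam * np.eye(n) + X @ X.T                # small n x n Gram matrix
    # (lam*I_d + X^T X)^{-1} = (1/lam) * (I_d - X^T (lam*I_n + X X^T)^{-1} X)
    return (v - X.T @ np.linalg.solve(G, X @ v)) / lam

X = np.random.default_rng(1).normal(size=(30, 2000))   # n=30, d=2000
v = np.ones(2000)
u = ridge_inverse_apply(X, v, lam=0.5)
# Check against the direct d x d computation (slow, for verification only).
direct = np.linalg.solve(0.5 * np.eye(2000) + X.T @ X, v)
print(np.allclose(u, direct))
```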
Related papers
- Minimally Supervised Learning using Topological Projections in
Self-Organizing Maps [55.31182147885694]
We introduce a semi-supervised learning approach based on topological projections in self-organizing maps (SOMs).
Our proposed method first trains SOMs on unlabeled data and then assigns a minimal number of available labeled data points to key best matching units (BMUs); a sketch of this idea follows below.
Our results indicate that the proposed minimally supervised model significantly outperforms traditional regression techniques.
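As a rough illustration of the idea summarized above (train a SOM on unlabeled data, then attach the few available labels to best matching units), here is a hedged sketch using the third-party minisom package; the grid size, training length, continuous targets, and nearest-labeled-BMU lookup are illustrative choices, not the paper's procedure.

```python
# Sketch: unsupervised SOM training, then store the few labeled targets on
# their BMUs and predict new samples from the nearest labeled BMU on the grid.
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(0)
X_unlabeled = rng.normal(size=(500, 16))
X_labeled = rng.normal(size=(10, 16))        # minimal supervision
y_labeled = rng.normal(size=10)              # continuous regression targets

som = MiniSom(8, 8, 16, sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(X_unlabeled, 2000)          # unsupervised phase

bmu_target = {}                              # grid cell -> stored target
for x, t in zip(X_labeled, y_labeled):
    bmu_target[som.winner(x)] = t            # assign targets to key BMUs

def predict(x):
    """Target stored on the labeled BMU closest (on the grid) to x's BMU."""
    i, j = som.winner(x)
    return min(bmu_target.items(),
               key=lambda kv: (kv[0][0] - i) ** 2 + (kv[0][1] - j) ** 2)[1]

print(predict(rng.normal(size=16)))
```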
arXiv Detail & Related papers (2024-01-12T22:51:48Z)
- Minimally Informed Linear Discriminant Analysis: training an LDA model with unlabelled data [51.673443581397954]
We show that it is possible to compute the exact projection vector from LDA models based on unlabelled data.
We show that the MILDA projection vector can be computed in a closed form with a computational cost comparable to LDA.
arXiv Detail & Related papers (2023-10-17T09:50:31Z)
- Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
- Optimal Discriminant Analysis in High-Dimensional Latent Factor Models [1.4213973379473654]
In high-dimensional classification problems, a commonly used approach is to first project the high-dimensional features into a lower dimensional space.
We formulate a latent-variable model with a hidden low-dimensional structure to justify this two-step procedure.
We propose a computationally efficient classifier that takes certain principal components (PCs) of the observed features as projections.
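The two-step procedure this summary describes, projecting onto principal components and then classifying in the reduced space, can be sketched generically with scikit-learn; which PCs the paper selects and its latent factor analysis are beyond this toy pipeline.

```python
# Generic two-step HDLSS pipeline: project onto leading PCs, then run a
# linear discriminant classifier in the low-dimensional projected space.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-0.5, 1, (25, 300)),
               rng.normal(0.5, 1, (25, 300))])   # n=50 samples, d=300 features
y = np.repeat([0, 1], 25)

clf = make_pipeline(PCA(n_components=5), LinearDiscriminantAnalysis())
clf.fit(X, y)
print(clf.score(X, y))                           # training accuracy
```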
arXiv Detail & Related papers (2022-10-23T21:45:53Z)
- The Deep Generative Decoder: MAP estimation of representations improves modeling of single-cell RNA data [0.0]
We present a simple generative model that computes model parameters and representations directly via maximum a posteriori (MAP) estimation.
The advantages of this approach are its simplicity and its capability to provide representations of much smaller dimensionality than a comparable VAE.
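The MAP idea in this summary, fitting the representations themselves together with the decoder parameters instead of training an encoder, can be sketched in PyTorch; the architecture, prior weight, and optimizer settings below are illustrative assumptions, not the paper's model.

```python
# Sketch: decoder-only generative model. The latent z_i are free parameters
# found by MAP: reconstruction loss plus a standard-normal log-prior on z.
import torch

n, d, k = 200, 50, 2                         # samples, features, latent dim
X = torch.randn(n, d)                        # stand-in for expression data
Z = torch.zeros(n, k, requires_grad=True)    # one representation per sample
decoder = torch.nn.Sequential(
    torch.nn.Linear(k, 32), torch.nn.ReLU(), torch.nn.Linear(32, d))

opt = torch.optim.Adam([Z, *decoder.parameters()], lr=1e-2)
for step in range(500):
    opt.zero_grad()
    recon = ((decoder(Z) - X) ** 2).sum(dim=1).mean()  # Gaussian likelihood
    prior = 0.5 * (Z ** 2).sum(dim=1).mean()           # N(0, I) log-prior on z
    loss = recon + prior
    loss.backward()
    opt.step()
print(float(loss))                           # final MAP objective value
```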
arXiv Detail & Related papers (2021-10-13T12:17:46Z)
- Manifold Topology Divergence: a Framework for Comparing Data Manifolds [109.0784952256104]
We develop a framework for comparing data manifolds, aimed at the evaluation of deep generative models.
Based on the Cross-Barcode, we introduce the Manifold Topology Divergence score (MTop-Divergence).
We demonstrate that the MTop-Divergence accurately detects various degrees of mode-dropping, intra-mode collapse, mode invention, and image disturbance.
arXiv Detail & Related papers (2021-06-08T00:30:43Z)
- Rank-R FNN: A Tensor-Based Learning Model for High-Order Data Classification [69.26747803963907]
Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes Canonical/Polyadic decomposition on its parameters.
First, it handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension.
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
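As a minimal sketch of the core construction, here is a layer whose weights carry a rank-R Canonical/Polyadic structure and that consumes a matrix-shaped input without vectorizing it; the sizes and the output head are illustrative, not the paper's architecture.

```python
# Sketch: a rank-R layer for inputs of shape (batch, I, J). Instead of a
# dense I*J weight per hidden unit, each rank component keeps one factor
# per data dimension (a_r in R^I, b_r in R^J), preserving structure along
# every dimension and shrinking parameters from I*J to R*(I+J).
import torch

class RankRLayer(torch.nn.Module):
    def __init__(self, I, J, R):
        super().__init__()
        self.A = torch.nn.Parameter(torch.randn(I, R) * 0.1)  # mode-1 factors
        self.B = torch.nn.Parameter(torch.randn(J, R) * 0.1)  # mode-2 factors

    def forward(self, X):                    # X: (batch, I, J)
        # Component r computes the bilinear form a_r^T X b_r.
        return torch.einsum('bij,ir,jr->br', X, self.A, self.B)

model = torch.nn.Sequential(RankRLayer(8, 5, 4), torch.nn.ReLU(),
                            torch.nn.Linear(4, 2))
X = torch.randn(32, 8, 5)                    # e.g. spatial x spectral input
print(model(X).shape)                        # torch.Size([32, 2])
```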
arXiv Detail & Related papers (2021-04-11T16:37:32Z)
- Population structure-learned classifier for high-dimension low-sample-size class-imbalanced problem [3.411873646414169]
The Population Structure-learned classifier (PSC) is proposed.
PSC obtains better generalization performance on imbalanced HDLSS (IHDLSS) problems.
PSC is superior to state-of-the-art methods on IHDLSS.
arXiv Detail & Related papers (2020-09-10T08:33:39Z)
- High-Dimensional Quadratic Discriminant Analysis under Spiked Covariance Model [101.74172837046382]
We propose a novel quadratic classification technique, the parameters of which are chosen such that the Fisher discriminant ratio is maximized.
Numerical simulations show that the proposed classifier not only outperforms the classical R-QDA for both synthetic and real data but also requires lower computational complexity.
arXiv Detail & Related papers (2020-06-25T12:00:26Z)
- Unsupervised Discretization by Two-dimensional MDL-based Histogram [0.0]
Unsupervised discretization is a crucial step in many knowledge discovery tasks.
We propose an expressive model class that allows for far more flexible partitions of two-dimensional data.
We introduce an algorithm, named PALM, which Partitions each dimension ALternately and then Merges neighboring regions.
arXiv Detail & Related papers (2020-06-02T19:19:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.