A Compressive Classification Framework for High-Dimensional Data
- URL: http://arxiv.org/abs/2005.04383v2
- Date: Thu, 12 Nov 2020 14:14:02 GMT
- Title: A Compressive Classification Framework for High-Dimensional Data
- Authors: Muhammad Naveed Tabassum and Esa Ollila
- Abstract summary: We propose a compressive classification framework for settings where the data dimensionality is significantly higher than the sample size.
The proposed method, referred to as compressive regularized discriminant analysis (CRDA), is based on linear discriminant analysis.
It has the ability to select significant features by using joint-sparsity promoting hard thresholding in the discriminant rule.
- Score: 12.284934135116515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a compressive classification framework for settings where the data
dimensionality is significantly higher than the sample size. The proposed
method, referred to as compressive regularized discriminant analysis (CRDA), is
based on linear discriminant analysis and has the ability to select significant
features by using joint-sparsity promoting hard thresholding in the
discriminant rule. Since the number of features is larger than the sample size,
the method also uses state-of-the-art regularized sample covariance matrix
estimators. Several analysis examples on real data sets, including image,
speech signal and gene expression data illustrate the promising improvements
offered by the proposed CRDA classifier in practice. Overall, the proposed
method gives fewer misclassification errors than its competitors, while at the
same time achieving accurate feature selection results. The open-source R
package and MATLAB toolbox of the proposed method (named compressiveRDA) are
freely available.
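As a rough, hedged illustration of the idea (this is not the authors' compressiveRDA implementation), the Python sketch below builds an LDA-type classifier with a simple ridge-style shrinkage covariance estimate and row-wise (joint-sparsity) hard thresholding of the coefficient matrix, so that only the K strongest features contribute to all class discriminants. The function name crda_sketch and the parameters K and alpha are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def crda_sketch(X, y, X_test, K=50, alpha=0.5):
    """Toy CRDA-style classifier: regularized LDA whose coefficient matrix is
    hard-thresholded row-wise so that only K features contribute to every
    class discriminant (joint sparsity). Illustrative only; K and alpha are
    assumed hyperparameters, not the paper's."""
    classes = np.unique(y)
    n, p = X.shape
    # Class means, shape (g, p).
    M = np.stack([X[y == c].mean(axis=0) for c in classes])
    # Pooled within-class sample covariance.
    Xc = X - M[np.searchsorted(classes, y)]
    S = Xc.T @ Xc / (n - len(classes))
    # Ridge-type shrinkage toward a scaled identity; a simple stand-in for the
    # regularized covariance estimators mentioned in the abstract.
    Sigma = alpha * S + (1 - alpha) * (np.trace(S) / p) * np.eye(p)
    # LDA coefficient matrix B = Sigma^{-1} M^T, shape (p, g).
    B = np.linalg.solve(Sigma, M.T)
    # Joint-sparsity hard thresholding: keep the K rows (features) with the
    # largest L2 norm and zero out the remaining features across all classes.
    keep = np.argsort(np.linalg.norm(B, axis=1))[-K:]
    B_sparse = np.zeros_like(B)
    B_sparse[keep] = B[keep]
    # Linear discriminant scores x^T b_k - 0.5 * mu_k^T b_k (equal priors).
    scores = X_test @ B_sparse - 0.5 * np.sum(M.T * B_sparse, axis=0)
    return classes[np.argmax(scores, axis=1)]
```

In the p >> n settings the paper targets, K would be chosen much smaller than p (e.g. by cross-validation); the paper's actual covariance estimators and tuning rules are more sophisticated than this toy shrinkage.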
Related papers
- Adaptively Robust and Sparse K-means Clustering [5.535948428518607]
This paper proposes adaptively robust and sparse K-means clustering (ARSK) to address practical limitations of the standard K-means algorithm.
For robustness, we introduce a redundant error component for each observation, and this additional parameter is penalized using a group sparse penalty.
To accommodate the impact of high-dimensional noisy variables, the objective function is modified by incorporating weights and implementing a penalty to control the sparsity of the weight vector.
arXiv Detail & Related papers (2024-07-09T15:20:41Z) - Regularized Linear Discriminant Analysis Using a Nonlinear Covariance Matrix Estimator [11.887333567383239]
Linear discriminant analysis (LDA) is a widely used technique for data classification.
LDA becomes inefficient when the data covariance matrix is ill-conditioned.
Regularized LDA methods have been proposed to cope with such a situation.
arXiv Detail & Related papers (2024-01-31T11:37:14Z) - Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z) - A One-shot Framework for Distributed Clustered Learning in Heterogeneous Environments [54.172993875654015]
The paper proposes a family of communication efficient methods for distributed learning in heterogeneous environments.
A one-shot approach, based on local computations at the users and a clustering-based aggregation step at the server, is shown to provide strong learning guarantees.
For strongly convex problems it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
arXiv Detail & Related papers (2022-09-22T09:04:10Z) - High-Dimensional Quadratic Discriminant Analysis under Spiked Covariance Model [101.74172837046382]
We propose a novel quadratic classification technique, the parameters of which are chosen such that the Fisher discriminant ratio is maximized.
Numerical simulations show that the proposed classifier not only outperforms the classical R-QDA for both synthetic and real data but also requires lower computational complexity.
arXiv Detail & Related papers (2020-06-25T12:00:26Z) - Robust Locality-Aware Regression for Labeled Data Classification [5.432221650286726]
We propose a new discriminant feature extraction framework, namely Robust Locality-Aware Regression (RLAR).
In our model, we introduce a retargeted regression to perform the marginal representation learning adaptively instead of using the general average inter-class margin.
To alleviate the disturbance of outliers and prevent overfitting, we measure the regression term and locality-aware term together with the regularization term by the L2,1 norm.
arXiv Detail & Related papers (2020-06-15T11:36:59Z) - Compressing Large Sample Data for Discriminant Analysis [78.12073412066698]
We consider the computational issues due to large sample size within the discriminant analysis framework.
We propose a new compression approach for reducing the number of training samples for linear and quadratic discriminant analysis.
arXiv Detail & Related papers (2020-05-08T05:09:08Z) - Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z) - Saliency-based Weighted Multi-label Linear Discriminant Analysis [101.12909759844946]
We propose a new variant of Linear Discriminant Analysis (LDA) to solve multi-label classification tasks.
The proposed method is based on a probabilistic model for defining the weights of individual samples.
The Saliency-based weighted Multi-label LDA approach is shown to lead to performance improvements in various multi-label classification problems.
arXiv Detail & Related papers (2020-04-08T19:40:53Z)