Robust classification with flexible discriminant analysis in
heterogeneous data
- URL: http://arxiv.org/abs/2201.02967v1
- Date: Sun, 9 Jan 2022 09:22:56 GMT
- Title: Robust classification with flexible discriminant analysis in
heterogeneous data
- Authors: Pierre Houdouin, Frédéric Pascal, Matthieu Jonckheere, Andrew Wang
- Abstract summary: This paper presents a new robust discriminant analysis where each data point is drawn with its own arbitrary scale parameter.
It is shown that maximum-likelihood parameter estimation and classification are very simple, fast and robust compared to state-of-the-art methods.
- Score: 0.7646713951724009
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Linear and Quadratic Discriminant Analysis are well-known classical methods
but can heavily suffer from non-Gaussian distributions and/or contaminated
datasets, mainly because of the underlying Gaussian assumption that is not
robust. To fill this gap, this paper presents a new robust discriminant
analysis where each data point is drawn from its own arbitrary Elliptically
Symmetrical (ES) distribution with its own arbitrary scale parameter. Such a
model allows for possibly very heterogeneous, independent but non-identically
distributed samples. After deriving a new decision rule, it is shown that
maximum-likelihood parameter estimation and classification are very simple,
fast and robust compared to state-of-the-art methods.
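The per-sample-scale model suggests a decision rule that compares the log-determinant of each class scatter plus a logarithm of the Mahalanobis distance, rather than the Mahalanobis distance itself. The following NumPy sketch illustrates the shape of such a rule; it uses plain sample means and covariances where the paper derives maximum-likelihood estimators, so it is an illustration under simplifying assumptions, not the paper's exact method.

```python
import numpy as np

def fit_class_params(X, y):
    """Per-class mean and covariance (plain MLE here; the paper's
    robust M-estimators are assumed away for brevity)."""
    params = {}
    for k in np.unique(y):
        Xk = X[y == k]
        params[k] = (Xk.mean(axis=0), np.cov(Xk, rowvar=False))
    return params

def scale_aware_discriminant(x, params, p):
    """Assign x to the class minimising
    log|Sigma_k| + p * log((x-mu_k)^T Sigma_k^{-1} (x-mu_k)).
    Taking the log of the Mahalanobis term tempers the influence of a
    per-sample scale factor, mirroring the heterogeneous model."""
    best, best_score = None, np.inf
    for k, (mu, Sigma) in params.items():
        d = x - mu
        maha = d @ np.linalg.solve(Sigma, d)
        score = np.log(np.linalg.det(Sigma)) + p * np.log(maha)
        if score < best_score:
            best, best_score = k, score
    return best

# Toy demonstration on two well-separated Gaussian classes
rng = np.random.default_rng(0)
p = 2
X0 = rng.normal([0, 0], 1.0, size=(200, p))
X1 = rng.normal([6, 6], 1.0, size=(200, p))
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)
params = fit_class_params(X, y)
print(scale_aware_discriminant(np.array([0.2, -0.1]), params, p))  # -> 0
print(scale_aware_discriminant(np.array([5.8, 6.3]), params, p))   # -> 1
```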
Related papers
- Anomaly Detection Under Uncertainty Using Distributionally Robust
Optimization Approach [0.9217021281095907]
Anomaly detection is defined as the problem of finding data points that do not follow the patterns of the majority.
The one-class Support Vector Machines (SVM) method aims to find a decision boundary to distinguish between normal data points and anomalies.
A distributionally robust chance-constrained model is proposed in which the probability of misclassification is low.
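For reference, the standard (non-robust) one-class SVM baseline that the distributionally robust model builds on runs in a few lines with scikit-learn; the data and hyperparameters below are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(0, 1, size=(500, 2))      # "normal" data
X_test = np.array([[0.1, -0.2], [8.0, 8.0]])   # an inlier and a far outlier

# nu upper-bounds the fraction of training points treated as anomalies
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)
print(clf.predict(X_test))  # +1 = normal, -1 = anomaly
```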
arXiv Detail & Related papers (2023-12-03T06:13:22Z) - FEMDA: a unified framework for discriminant analysis [4.6040036610482655]
We present a novel approach to deal with non-Gaussian datasets.
The model considered assigns each cluster its own arbitrary Elliptically Symmetrical (ES) distribution with its own arbitrary scale parameter.
By deriving a new decision rule, we demonstrate that maximum-likelihood parameter estimation and classification are simple, efficient, and robust compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-11-13T17:59:37Z) - FEMDA: Une méthode de classification robuste et flexible (a robust and flexible classification method) [0.8594140167290096]
This paper studies the robustness of a new discriminant analysis technique to scale changes in the data.
The new decision rule derived is simple, fast, and more robust to scale changes in the data than other state-of-the-art methods.
arXiv Detail & Related papers (2023-07-04T23:15:31Z) - Data thinning for convolution-closed distributions [2.299914829977005]
We propose data thinning, an approach for splitting an observation into two or more independent parts that sum to the original observation.
We show that data thinning can be used to validate the results of unsupervised learning approaches.
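For the Poisson distribution, a convolution-closed family, data thinning reduces to binomial splitting of each count: the two parts are themselves independent Poisson variables whose rates sum to the original. The NumPy sketch below (with an illustrative rate and thinning fraction) shows the split means and the near-zero empirical correlation between the parts.

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 10.0
X = rng.poisson(lam, size=100_000)

# Thin each count: X1 | X ~ Binomial(X, eps); X2 = X - X1.
# For a convolution-closed family like the Poisson, X1 and X2 are
# independent Poisson(eps * lam) and Poisson((1 - eps) * lam).
eps = 0.3
X1 = rng.binomial(X, eps)
X2 = X - X1

print(X1.mean(), X2.mean())        # close to 3.0 and 7.0
print(np.corrcoef(X1, X2)[0, 1])   # close to 0 (independent parts)
```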
arXiv Detail & Related papers (2023-01-18T02:47:41Z) - Predicting Out-of-Domain Generalization with Neighborhood Invariance [59.05399533508682]
We propose a measure of a classifier's output invariance in a local transformation neighborhood.
Our measure is simple to calculate, does not depend on the test point's true label, and can be applied even in out-of-domain (OOD) settings.
In experiments on benchmarks in image classification, sentiment analysis, and natural language inference, we demonstrate a strong and robust correlation between our measure and actual OOD generalization.
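A minimal sketch of such an invariance measure follows, assuming a generic black-box classifier and a Gaussian-noise perturbation as the local transformation neighborhood; both are illustrative stand-ins for the paper's choices. Note that no true label is used.

```python
import numpy as np

def neighborhood_invariance(predict, x, perturb, n=50, rng=None):
    """Fraction of perturbed copies of x on which the classifier's
    prediction agrees with its prediction at x. Label-free, so it
    can be computed in out-of-domain settings."""
    if rng is None:
        rng = np.random.default_rng(0)
    base = predict(x)
    agree = [predict(perturb(x, rng)) == base for _ in range(n)]
    return float(np.mean(agree))

# Toy linear classifier and a Gaussian-noise "transformation neighborhood"
predict = lambda x: int(x.sum() > 0)
perturb = lambda x, rng: x + rng.normal(0, 0.05, size=x.shape)

# A point far from the decision boundary is fully invariant
print(neighborhood_invariance(predict, np.array([2.0, 1.0]), perturb))  # 1.0
```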
arXiv Detail & Related papers (2022-07-05T14:55:16Z) - Distributed Sparse Multicategory Discriminant Analysis [1.7223564681760166]
This paper proposes a convex formulation for sparse multicategory linear discriminant analysis and extends it to the distributed setting, where data are stored across multiple sites.
Theoretically, we establish statistical properties ensuring that the distributed sparse multicategory linear discriminant analysis performs as well as the centralized version after a few rounds of communication.
arXiv Detail & Related papers (2022-02-22T14:23:33Z) - A Robust and Flexible EM Algorithm for Mixtures of Elliptical
Distributions with Missing Data [71.9573352891936]
This paper tackles the problem of missing data imputation for noisy and non-Gaussian data.
A new EM algorithm is investigated for mixtures of elliptical distributions with the property of handling potential missing data.
Experimental results on synthetic data demonstrate that the proposed algorithm is robust to outliers and can be used with non-Gaussian data.
arXiv Detail & Related papers (2022-01-28T10:01:37Z) - Scalable Marginal Likelihood Estimation for Model Selection in Deep
Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z) - Identification of Probability weighted ARX models with arbitrary domains [75.91002178647165]
PieceWise Affine models guarantee universal approximation, local linearity, and equivalence to other classes of hybrid systems.
In this work, we focus on the identification of PieceWise Auto Regressive with eXogenous input models with arbitrary regions (NPWARX).
The architecture is conceived following the Mixture of Expert concept, developed within the machine learning field.
arXiv Detail & Related papers (2020-09-29T12:50:33Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Asymptotic Analysis of an Ensemble of Randomly Projected Linear
Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
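The core construction, an ensemble of linear discriminants fitted on random Gaussian projections and combined by majority vote, can be sketched as below; the projection dimension, ensemble size, and toy data are illustrative assumptions, and the paper's misclassification-probability estimator is not reproduced here.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def rp_lda_ensemble(X_train, y_train, X_test, d, n_members=25, seed=0):
    """Majority vote over LDA classifiers, each fitted on a random
    d-dimensional Gaussian projection of the data."""
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(n_members):
        # Random projection matrix, scaled to roughly preserve norms
        R = rng.normal(size=(X_train.shape[1], d)) / np.sqrt(d)
        clf = LinearDiscriminantAnalysis().fit(X_train @ R, y_train)
        votes.append(clf.predict(X_test @ R))
    votes = np.stack(votes)
    # Per-test-point majority vote over ensemble members
    return np.array([np.bincount(v).argmax() for v in votes.T])

# Toy demonstration: two well-separated classes in 20 dimensions
rng = np.random.default_rng(0)
X0 = rng.normal(0.0, 1.0, size=(100, 20))
X1 = rng.normal(1.5, 1.0, size=(100, 20))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)
pred = rp_lda_ensemble(X, y, np.vstack([X0[:5], X1[:5]]), d=5)
print(pred)  # first five from class 0, last five from class 1
```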
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.