Unobserved classes and extra variables in high-dimensional discriminant
analysis
- URL: http://arxiv.org/abs/2102.01982v1
- Date: Wed, 3 Feb 2021 10:01:52 GMT
- Title: Unobserved classes and extra variables in high-dimensional discriminant
analysis
- Authors: Michael Fop, Pierre-Alexandre Mattei, Charles Bouveyron, Thomas
Brendan Murphy
- Abstract summary: In supervised classification problems, the test set may contain data points belonging to classes not observed in the learning phase.
We introduce a model-based discriminant approach, Dimension-Adaptive Mixture Discriminant Analysis (D-AMDA).
It can detect unobserved classes and adapt to the increasing dimensionality.
- Score: 9.467899386491204
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In supervised classification problems, the test set may contain data points
belonging to classes not observed in the learning phase. Moreover, the same
units in the test data may be measured on a set of additional variables
recorded at a later stage than the learning sample. In this situation, the
classifier built in the learning phase needs
to adapt to handle potential unknown classes and the extra dimensions. We
introduce a model-based discriminant approach, Dimension-Adaptive Mixture
Discriminant Analysis (D-AMDA), which can detect unobserved classes and adapt
to the increasing dimensionality. Model estimation is carried out via a full
inductive approach based on an EM algorithm. The method is then embedded in a
more general framework for adaptive variable selection and classification
suitable for data of large dimensions. A simulation study and an artificial
experiment related to classification of adulterated honey samples are used to
validate the ability of the proposed framework to deal with complex situations.
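The inductive idea behind D-AMDA can be illustrated with a toy EM sketch: fit one Gaussian component per observed class on the labelled data, then re-run EM on the test data with one extra component that can absorb an unobserved class. This is a minimal isotropic-Gaussian illustration, not the authors' full algorithm (which also adapts to the extra variables); all names are hypothetical.

```python
import numpy as np

def detect_unobserved_class(X_train, y_train, X_test, n_iter=50):
    """Toy EM: mixture with K known components (initialised from the
    labelled classes) plus one extra component for a possible novel
    class.  Points assigned to component index K are flagged as novel."""
    classes = np.unique(y_train)
    K = len(classes)
    # Known components start at the labelled class means.
    means = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    # Extra component starts at the test point farthest from all known means.
    d = np.min(np.linalg.norm(X_test[:, None] - means[None], axis=2), axis=1)
    means = np.vstack([means, X_test[np.argmax(d)]])
    var = np.full(K + 1, X_train.var())
    pi = np.full(K + 1, 1.0 / (K + 1))
    p = X_test.shape[1]
    for _ in range(n_iter):
        # E-step: responsibilities under isotropic Gaussians.
        sq = np.sum((X_test[:, None] - means[None]) ** 2, axis=2)
        logp = np.log(pi) - 0.5 * sq / var - 0.5 * p * np.log(var)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights, means and per-component variances.
        nk = r.sum(axis=0) + 1e-9
        pi = nk / len(X_test)
        means = (r.T @ X_test) / nk[:, None]
        var = np.array([(r[:, k] * np.sum((X_test - means[k]) ** 2, axis=1)).sum()
                        / (nk[k] * p) for k in range(K + 1)]) + 1e-6
    return r.argmax(axis=1)  # label K marks the unobserved class
```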
Related papers
- Deep Subspace Learning for Surface Anomaly Classification Based on 3D Point Cloud Data [2.5524809198548137]
This paper proposes a novel deep subspace learning-based 3D anomaly classification model.
Specifically, we model each class as a subspace to account for the intra-class variation, while promoting distinct subspaces of different classes to tackle the inter-class similarity.
Our method achieves better anomaly classification results than benchmark methods, and can effectively identify the new types of anomalies.
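The "each class as a subspace" idea can be sketched with plain PCA subspaces and classification by reconstruction error; the paper's model learns these subspaces with a deep network, so this is only a generic stand-in with hypothetical function names.

```python
import numpy as np

def fit_class_subspaces(X, y, rank=2):
    """Fit one low-rank PCA subspace per class (mean + top right
    singular vectors of the centred class data)."""
    models = {}
    for c in np.unique(y):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xc - mu, full_matrices=False)
        models[c] = (mu, Vt[:rank])  # orthonormal basis of the class subspace
    return models

def subspace_classify(models, X):
    """Assign each point to the class whose subspace reconstructs it best."""
    labels = sorted(models)
    errs = []
    for c in labels:
        mu, V = models[c]
        rec = (X - mu) @ V.T @ V + mu  # project onto the subspace and back
        errs.append(np.linalg.norm(X - rec, axis=1))
    return np.array(labels)[np.argmin(errs, axis=0)]
```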
arXiv Detail & Related papers (2025-02-17T10:57:53Z)
- Transfer learning via Regularized Linear Discriminant Analysis [2.321323878201932]
We present novel transfer learning methods via regularized random-effects linear discriminant analysis.
We derive the values of these weights and the associated classification error rates in the high-dimensional setting.
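The regularisation step can be sketched as ridge-style shrinkage of the pooled covariance, which keeps LDA well-posed when the dimension is large relative to the sample size. This is only the generic regularised-LDA building block; the paper's random-effects weighting of source and target models is more involved.

```python
import numpy as np

def regularized_lda_fit(X, y, lam=0.1):
    """LDA with the pooled covariance shrunk toward a scaled identity:
    S_reg = (1 - lam) * S + lam * (tr(S) / p) * I."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    Xc = X - means[np.searchsorted(classes, y)]
    S = Xc.T @ Xc / (len(X) - len(classes))  # pooled within-class covariance
    p = X.shape[1]
    S_reg = (1 - lam) * S + lam * (np.trace(S) / p) * np.eye(p)
    return classes, means, np.linalg.inv(S_reg)

def regularized_lda_predict(model, X):
    classes, means, P = model
    # Linear discriminant scores, assuming equal class priors.
    scores = X @ P @ means.T - 0.5 * np.sum((means @ P) * means, axis=1)
    return classes[scores.argmax(axis=1)]
```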
arXiv Detail & Related papers (2025-01-05T01:25:37Z)
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
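A common spectral graph-theoretic estimator of this kind is the eigengap heuristic: build an affinity graph, take its normalised Laplacian, and read the number of clusters off the largest gap in the smallest eigenvalues. The sketch below is that generic heuristic, not the paper's specific method.

```python
import numpy as np

def estimate_num_classes(X, sigma=1.0, max_k=10):
    """Eigengap heuristic: count clusters via the largest gap among the
    smallest eigenvalues of the symmetric normalised graph Laplacian."""
    sq = np.sum((X[:, None] - X[None]) ** 2, axis=2)
    W = np.exp(-sq / (2 * sigma ** 2))  # Gaussian affinity graph
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)
    Dinv = 1.0 / np.sqrt(d + 1e-12)
    # L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(X)) - Dinv[:, None] * W * Dinv[None, :]
    ev = np.sort(np.linalg.eigvalsh(L))[:max_k]
    gaps = np.diff(ev)
    return int(np.argmax(gaps)) + 1  # index of the largest eigengap
```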
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
- Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z)
- A Learning Based Hypothesis Test for Harmful Covariate Shift [3.1406146587437904]
Machine learning systems in high-risk domains need to identify when predictions should not be made on out-of-distribution test examples.
In this work, we use the discordance between an ensemble of classifiers trained to agree on training data and disagree on test data to determine when a model should be removed from the deployment setting.
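The mechanics of a disagreement-based shift test can be sketched with bootstrapped 1-NN classifiers: members agree on in-distribution points but their votes fracture on points far from the training data. This is a loose proxy only; the paper trains its ensemble explicitly to disagree on the test batch, which this sketch does not do.

```python
import numpy as np

def ensemble_disagreement(X_train, y_train, X_batch, n_members=20, seed=0):
    """Fraction of batch points on which bootstrapped 1-NN classifiers
    disagree.  A markedly higher rate on a test batch than on a held-out
    in-distribution batch suggests harmful covariate shift."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_members):
        idx = rng.integers(0, len(X_train), len(X_train))  # bootstrap sample
        Xb, yb = X_train[idx], y_train[idx]
        d = np.linalg.norm(X_batch[:, None] - Xb[None], axis=2)
        preds.append(yb[d.argmin(axis=1)])  # 1-NN prediction per member
    preds = np.array(preds)  # shape: (members, batch)
    return float(np.mean(preds.min(axis=0) != preds.max(axis=0)))
```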
arXiv Detail & Related papers (2022-12-06T04:15:24Z)
- Parametric Classification for Generalized Category Discovery: A Baseline Study [70.73212959385387]
Generalized Category Discovery (GCD) aims to discover novel categories in unlabelled datasets using knowledge learned from labelled samples.
We investigate the failure of parametric classifiers, verify the effectiveness of previous design choices when high-quality supervision is available, and identify unreliable pseudo-labels as a key problem.
We propose a simple yet effective parametric classification method that benefits from entropy regularisation, achieves state-of-the-art performance on multiple GCD benchmarks and shows strong robustness to unknown class numbers.
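One common form of such an entropy regulariser penalises the negative entropy of the *mean* predicted class distribution, so the classifier is pushed to spread its predictions over all known and novel categories instead of collapsing onto a few. The sketch below shows that term in isolation, under the assumption that this is the variant meant; the paper's full loss has additional components.

```python
import numpy as np

def entropy_regulariser(logits):
    """Negative entropy of the mean softmax prediction over a batch.
    Adding this to the loss (lower = more uniform class usage)
    discourages collapse onto a few categories."""
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)      # per-example softmax
    p_mean = p.mean(axis=0)                # marginal class distribution
    return float(np.sum(p_mean * np.log(p_mean + 1e-12)))
```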
arXiv Detail & Related papers (2022-11-21T18:47:11Z)
- Determination of class-specific variables in nonparametric multiple-class classification [0.0]
We propose a probability-based nonparametric multiple-class classification method, and integrate it with the ability to identify high-impact variables for individual classes.
We report the properties of the proposed method, and use both synthesized and real data sets to illustrate its properties under different classification situations.
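One crude, nonparametric way to score class-specific variables is to compare each feature's class-conditional distribution against the pooled distribution; variables whose conditional distribution departs most from the pooled one matter most for that class. The sketch below uses histogram total-variation distance as the departure measure, which is an assumption of this illustration rather than the paper's actual criterion.

```python
import numpy as np

def class_specific_impact(X, y, bins=10):
    """For each class, rank variables by the total-variation distance
    between the class-conditional and pooled feature histograms."""
    edges = [np.histogram_bin_edges(X[:, j], bins=bins) for j in range(X.shape[1])]
    impact = {}
    for c in np.unique(y):
        scores = []
        for j, e in enumerate(edges):
            pc, _ = np.histogram(X[y == c, j], bins=e)
            pa, _ = np.histogram(X[:, j], bins=e)
            pc = pc / pc.sum(); pa = pa / pa.sum()
            scores.append(0.5 * np.abs(pc - pa).sum())  # total variation
        impact[c] = np.argsort(scores)[::-1]  # most impactful variable first
    return impact
```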
arXiv Detail & Related papers (2022-05-07T10:08:58Z)
- Selecting the suitable resampling strategy for imbalanced data classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class.
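The simplest of these strategies, random oversampling, can be sketched in a few lines: duplicate minority-class examples at random until every class matches the majority count. The survey compares many more sophisticated variants (e.g. SMOTE-style synthesis), which this sketch does not cover.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until each class
    has as many examples as the majority class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    parts_X, parts_y = [], []
    for c, n in zip(classes, counts):
        idx = np.flatnonzero(y == c)
        # Sample with replacement to fill the gap to the majority count.
        extra = rng.choice(idx, size=n_max - n, replace=True) if n < n_max else np.empty(0, int)
        keep = np.concatenate([idx, extra])
        parts_X.append(X[keep]); parts_y.append(y[keep])
    return np.vstack(parts_X), np.concatenate(parts_y)
```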
arXiv Detail & Related papers (2021-12-15T18:56:39Z)
- Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
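The underlying classifier from [1] can be sketched as a majority vote over LDA models, each fitted in an independent random low-dimensional projection of the data. This is only the ensemble itself, with hypothetical names; the paper's contribution, the consistent asymptotic estimator of its misclassification probability, is not reproduced here.

```python
import numpy as np

def rp_lda_ensemble_predict(X_train, y_train, X_test, dim=5, n_members=15, seed=0):
    """Majority vote over LDA classifiers fitted in random Gaussian
    projections of the data."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y_train)
    votes = np.zeros((len(X_test), len(classes)), int)
    for _ in range(n_members):
        R = rng.normal(size=(X_train.shape[1], dim)) / np.sqrt(dim)
        Ztr, Zte = X_train @ R, X_test @ R          # project both sets
        means = np.array([Ztr[y_train == c].mean(axis=0) for c in classes])
        Zc = Ztr - means[np.searchsorted(classes, y_train)]
        S = Zc.T @ Zc / (len(Ztr) - len(classes)) + 1e-6 * np.eye(dim)
        P = np.linalg.inv(S)
        # LDA scores in the projected space (equal priors assumed).
        scores = Zte @ P @ means.T - 0.5 * np.sum((means @ P) * means, axis=1)
        votes[np.arange(len(X_test)), scores.argmax(axis=1)] += 1
    return classes[votes.argmax(axis=1)]
```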
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
- Learning from Aggregate Observations [82.44304647051243]
We study the problem of learning from aggregate observations where supervision signals are given to sets of instances.
We present a general probabilistic framework that accommodates a variety of aggregate observations.
Simple maximum likelihood solutions can be applied to various differentiable models.
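The maximum-likelihood idea can be illustrated on the simplest aggregate-observation model: each instance is positive with unknown probability p, but we only observe whether each *set* contains at least one positive (OR aggregation), giving P(set label = 1) = 1 - (1 - p)^n. The grid-search MLE below is a toy instance of the framework, not the paper's general solution.

```python
import numpy as np

def mle_from_aggregate(set_sizes, set_labels):
    """Grid-search MLE of the per-instance positive probability p from
    OR-aggregated set labels: P(label = 1) = 1 - (1 - p)^n."""
    grid = np.linspace(1e-4, 1 - 1e-4, 9999)
    ll = np.zeros_like(grid)
    for n, s in zip(set_sizes, set_labels):
        p1 = 1 - (1 - grid) ** n                       # P(set label = 1 | p)
        ll += np.log(p1) if s == 1 else np.log(1 - p1)  # accumulate log-lik.
    return float(grid[ll.argmax()])
```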
arXiv Detail & Related papers (2020-04-14T06:18:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.