Robust learning of data anomalies with analytically-solvable entropic outlier sparsification
- URL: http://arxiv.org/abs/2112.11768v1
- Date: Wed, 22 Dec 2021 10:13:29 GMT
- Title: Robust learning of data anomalies with analytically-solvable entropic outlier sparsification
- Authors: Illia Horenko
- Abstract summary: Entropic Outlier Sparsification (EOS) is proposed as a robust computational strategy for the detection of data anomalies.
The performance of EOS is compared to a range of commonly-used tools on synthetic problems and on partially-mislabeled supervised classification problems from biomedicine.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Entropic Outlier Sparsification (EOS) is proposed as a robust computational
strategy for the detection of data anomalies in a broad class of learning
methods, including unsupervised problems (such as the detection of non-Gaussian
outliers in mostly-Gaussian data) and supervised learning with
mislabeled data. EOS relies on the derived analytic closed-form solution of the
(weighted) expected error minimization problem subject to Shannon entropy
regularization. In contrast to common regularization strategies, whose
computational costs scale polynomially with the data dimension, the identified
closed-form solution is proven to impose additional iteration costs that depend
linearly on the statistics size (the number of data points) and are independent
of the data dimension. The obtained analytic results also explain why mixtures
of spherically-symmetric Gaussians, used heuristically in many popular data
analysis algorithms, represent an optimal choice for the non-parametric
probability distributions when working with squared Euclidean distances: they
combine expected error minimality, maximal entropy/unbiasedness, and linear
cost scaling. The performance of EOS is compared to a range of commonly-used
tools on synthetic problems and on partially-mislabeled supervised
classification problems from biomedicine.
Related papers
- Generalization Analysis of Machine Learning Algorithms via the Worst-Case Data-Generating Probability Measure (2023-12-19T15:20:27Z)
The worst-case probability measure over the data is introduced as a tool for characterizing the generalization capabilities of machine learning algorithms.
Fundamental generalization metrics, such as the sensitivity of the expected loss, the sensitivity of the empirical risk, and the generalization gap, are shown to have closed-form expressions.
A novel parallel is established between the worst-case data-generating probability measure and the Gibbs algorithm.
- Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data (2022-12-06T12:42:11Z)
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
- A Robust and Flexible EM Algorithm for Mixtures of Elliptical Distributions with Missing Data (2022-01-28T10:01:37Z)
This paper tackles the problem of missing data imputation for noisy and non-Gaussian data.
A new EM algorithm is investigated for mixtures of elliptical distributions with the property of handling potential missing data.
Experimental results on synthetic data demonstrate that the proposed algorithm is robust to outliers and can be used with non-Gaussian data.
- Low-rank statistical finite elements for scalable model-data synthesis (2021-09-10T09:51:43Z)
statFEM acknowledges a priori model misspecification by embedding forcing within the governing equations.
The method reconstructs the observed data-generating processes with minimal loss of information.
This article overcomes this hurdle by embedding a low-rank approximation of the underlying dense covariance matrix.
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection (2020-12-29T04:08:38Z)
We propose a simple and efficient unsupervised feature selection method by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
- Stochastic Approximation for Online Tensorial Independent Component Analysis (2020-12-28T18:52:37Z)
Independent component analysis (ICA) has been a popular dimension reduction tool in statistical machine learning and signal processing.
In this paper, we present a by-product online tensorial algorithm that estimates for each independent component.
- General stochastic separation theorems with optimal bounds (2020-10-11T13:12:41Z)
The phenomenon of separability was revealed and used in machine learning to correct errors of Artificial Intelligence (AI) systems and to analyze AI instabilities.
Errors or clusters of errors can be separated from the rest of the data.
The ability to correct an AI system also opens up the possibility of an attack on it, and high dimensionality induces vulnerabilities caused by the same separability.
- Information Theory Measures via Multidimensional Gaussianization (2020-10-08T07:22:16Z)
Information theory is an outstanding framework to measure uncertainty, dependence and relevance in data and systems.
It has several desirable properties for real-world applications.
However, obtaining information from multidimensional data is a challenging problem due to the curse of dimensionality.
- Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants (2020-04-17T12:47:04Z)
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.