General stochastic separation theorems with optimal bounds
- URL: http://arxiv.org/abs/2010.05241v2
- Date: Sat, 9 Jan 2021 19:10:17 GMT
- Title: General stochastic separation theorems with optimal bounds
- Authors: Bogdan Grechuk, Alexander N. Gorban, Ivan Y. Tyukin
- Abstract summary: The phenomenon of stochastic separability was revealed and used in machine learning to correct errors of Artificial Intelligence (AI) systems and analyze AI instabilities.
Errors or clusters of errors can be separated from the rest of the data.
The ability to correct an AI system also opens up the possibility of an attack on it, and the high dimensionality induces vulnerabilities caused by the same separability.
- Score: 68.8204255655161
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The phenomenon of stochastic separability was revealed and used in
machine learning to correct errors of Artificial Intelligence (AI) systems and
to analyze AI instabilities. In high-dimensional datasets, under broad
assumptions, each point can be separated from the rest of the set by a simple
and robust Fisher discriminant (i.e., it is Fisher separable). Errors or
clusters of errors can be separated from the rest of the data. The ability to
correct an AI system also opens up the possibility of an attack on it, and the
high dimensionality induces vulnerabilities caused by the same stochastic
separability that holds the keys to understanding the fundamentals of
robustness and adaptivity in high-dimensional data-driven AI. To manage errors
and analyze vulnerabilities, the stochastic separation theorems should evaluate
the probability that the dataset will be Fisher separable in a given
dimensionality and for a given class of distributions. Explicit and optimal
estimates of these separation probabilities are required, and this problem is
solved in the present work. General stochastic separation theorems with optimal
probability estimates are obtained for important classes of distributions:
log-concave distributions, their convex combinations, and product
distributions. The standard i.i.d. assumption is significantly relaxed. These
theorems and estimates can be used both for the correction of high-dimensional
data-driven AI systems and for the analysis of their vulnerabilities. A third
area of application is the emergence of memories in ensembles of neurons, the
phenomena of grandmother cells and sparse coding in the brain, and an
explanation of the unexpected effectiveness of small neural ensembles in the
high-dimensional brain.
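A compact operational reading of Fisher separability, following the authors' earlier definition: after centering and normalization, a point x is Fisher-separable from a point y when ⟨x, y⟩ ≤ α⟨x, x⟩ for a fixed threshold α in (0, 1), and a finite set is Fisher separable when every point is separable from all the others. The sketch below is a minimal Monte-Carlo illustration of the separation probability the abstract refers to, not the paper's construction; the function names, the threshold α = 0.8, and the choice of an i.i.d. uniform sample in the unit ball are assumptions made for the example.
```python
import numpy as np

def is_fisher_separable(X, alpha=0.8):
    """Boolean mask: True where point x_i satisfies <x_i, x_j> <= alpha*<x_i, x_i>
    for every j != i, i.e. x_i is Fisher-separable from the rest of the sample."""
    G = X @ X.T                              # Gram matrix of pairwise inner products
    mask = G <= alpha * np.diag(G)[:, None]  # compare row i against alpha*<x_i, x_i>
    np.fill_diagonal(mask, True)             # a point is not compared with itself
    return mask.all(axis=1)

def separability_probability(n_points, dim, alpha=0.8, trials=200, seed=0):
    """Monte-Carlo estimate of the probability that an i.i.d. sample drawn
    uniformly from the unit ball is Fisher separable as a whole set."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        # uniform sample in the unit ball: random directions, radii ~ U^(1/dim)
        g = rng.standard_normal((n_points, dim))
        g /= np.linalg.norm(g, axis=1, keepdims=True)
        X = g * (rng.random(n_points) ** (1.0 / dim))[:, None]
        hits += bool(is_fisher_separable(X, alpha).all())
    return hits / trials

if __name__ == "__main__":
    for dim in (10, 50, 200):
        p = separability_probability(n_points=1000, dim=dim)
        print(f"dim={dim:4d}  estimated P(Fisher separable) ~ {p:.2f}")
```
Consistent with the stochastic separation theorems, the estimate should approach 1 as the dimension grows for a fixed sample size and threshold.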
Related papers
- Characteristic Circuits [26.223089423713486]
Probabilistic circuits (PCs) compose simple, tractable distributions into a high-dimensional probability distribution.
We introduce characteristic circuits (CCs) providing a unified formalization of distributions over heterogeneous data in the spectral domain.
We show that CCs outperform state-of-the-art density estimators for heterogeneous data domains on common benchmark data sets.
arXiv Detail & Related papers (2023-12-12T23:15:07Z) - Skew Probabilistic Neural Networks for Learning from Imbalanced Data [3.7892198600060945]
This paper introduces an imbalanced data-oriented approach using probabilistic neural networks (PNNs) with a skew normal probability kernel.
We show that SkewPNNs substantially outperform state-of-the-art machine learning methods for both balanced and imbalanced datasets in most experimental settings.
arXiv Detail & Related papers (2023-12-10T13:12:55Z) - Distributed Variational Inference for Online Supervised Learning [15.038649101409804]
This paper develops a scalable distributed probabilistic inference algorithm.
It applies to continuous variables, intractable posteriors and large-scale real-time data in sensor networks.
arXiv Detail & Related papers (2023-09-05T22:33:02Z) - Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important when forecasting nonstationary processes or processes with a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate the resulting tessellation and approximate the multiple-hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z) - Learning Linear Causal Representations from Interventions under General Nonlinear Mixing [52.66151568785088]
We prove strong identifiability results given unknown single-node interventions without access to the intervention targets.
This is the first instance of causal identifiability from non-paired interventions for deep neural network embeddings.
arXiv Detail & Related papers (2023-06-04T02:32:12Z) - High-dimensional separability for one- and few-shot learning [58.8599521537]
This work is driven by a practical question: the correction of Artificial Intelligence (AI) errors.
Special external devices, correctors, are developed. They should provide a quick, non-iterative fix without modifying the legacy AI system (a minimal sketch of such a corrector is given after this list).
New multi-correctors of AI systems are presented and illustrated with examples of predicting errors and learning new classes of objects by a deep convolutional neural network.
arXiv Detail & Related papers (2021-06-28T14:58:14Z) - Improving Uncertainty Calibration via Prior Augmented Data [56.88185136509654]
Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators.
However, they are often overconfident, which leads to inaccurate and miscalibrated probabilistic predictions.
We propose a solution by seeking out regions of feature space where the model is unjustifiably overconfident, and conditionally raising the entropy of those predictions towards that of the prior distribution of the labels.
arXiv Detail & Related papers (2021-02-22T07:02:37Z) - Unifying supervised learning and VAEs -- coverage, systematics and goodness-of-fit in normalizing-flow based neural network models for astro-particle reconstructions [0.0]
Statistical uncertainties, coverage, systematic uncertainties or a goodness-of-fit measure are often not calculated.
We show that a KL-divergence objective on the joint distribution of data and labels allows one to unify supervised learning and variational autoencoders.
We discuss how to calculate coverage probabilities without numerical integration for specific "base-ordered" contours.
arXiv Detail & Related papers (2020-08-13T11:28:57Z) - Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
The proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
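As background for the corrector devices mentioned in the "High-dimensional separability for one- and few-shot learning" entry above, and for the error-correction application of the main paper, the following is a hypothetical minimal sketch of a one-node corrector built from a regularized Fisher linear discriminant. The class name FisherCorrector, its method names, and the regularization constant are illustrative assumptions, not the authors' implementation: the idea is to fit a linear discriminant between internal states on which a legacy system erred and states on which it behaved correctly, and to flag error-like states at run time.
```python
import numpy as np

class FisherCorrector:
    """Minimal one-node corrector: a regularized Fisher linear discriminant
    trained to flag internal states on which a legacy AI system erred."""

    def __init__(self, reg=1e-3):
        self.reg = reg          # ridge added to the scatter matrix
        self.w = None
        self.threshold = None

    def fit(self, Z_err, Z_ok):
        # Z_err: states where the legacy system made an error, shape (n_err, d)
        # Z_ok:  states where it behaved correctly,            shape (n_ok, d)
        mu_err, mu_ok = Z_err.mean(axis=0), Z_ok.mean(axis=0)
        d = Z_err.shape[1]
        # sum of the two class covariances, regularized so the solve is well-posed
        Sw = np.cov(Z_err, rowvar=False) + np.cov(Z_ok, rowvar=False)
        Sw += self.reg * np.eye(d)
        self.w = np.linalg.solve(Sw, mu_err - mu_ok)
        # place the threshold halfway between the projected class means
        self.threshold = 0.5 * (self.w @ mu_err + self.w @ mu_ok)
        return self

    def flags_error(self, Z):
        # True where the corrector predicts the legacy system will err
        return Z @ self.w > self.threshold
```
When flags_error fires, the wrapping system would divert to a fallback (reject the input, defer to a human, or use a reserve model) instead of returning the legacy output, which matches the quick, non-iterative, external-fix role the entry describes.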
This list is automatically generated from the titles and abstracts of the papers on this site.