High-dimensional separability for one- and few-shot learning
- URL: http://arxiv.org/abs/2106.15416v1
- Date: Mon, 28 Jun 2021 14:58:14 GMT
- Title: High-dimensional separability for one- and few-shot learning
- Authors: Alexander N. Gorban, Bogdan Grechuk, Evgeny M. Mirkes, Sergey V.
Stasenko, Ivan Y. Tyukin
- Abstract summary: This work is driven by a practical question: the correction of Artificial Intelligence (AI) errors.
Special external devices, correctors, are developed. They should provide a quick and non-iterative system fix without modification of a legacy AI system.
New multi-correctors of AI systems are presented and illustrated with examples of predicting errors and learning new classes of objects by a deep convolutional neural network.
- Score: 58.8599521537
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work is driven by a practical question: the correction of Artificial
Intelligence (AI) errors. Systematic re-training of a large AI system is hardly
possible. To solve this problem, special external devices, correctors, are
developed. They should provide a quick and non-iterative system fix without
modification of a legacy AI system. A common universal part of the AI corrector
is a classifier that should separate undesired and erroneous behavior from
normal operation. Training of such classifiers is a grand challenge at the
heart of the one- and few-shot learning methods. Effectiveness of one- and
few-shot methods is based on either significant dimensionality reductions or
the blessing of dimensionality effects. Stochastic separability is a blessing
of dimensionality phenomenon that allows one- and few-shot error correction: in
high-dimensional datasets under broad assumptions each point can be separated
from the rest of the set by a simple and robust linear discriminant. The
hierarchical structure of the data universe is introduced where each data cluster
has a granular internal structure, etc. New stochastic separation theorems for
the data distributions with fine-grained structure are formulated and proved.
Separation theorems in infinite-dimensional limits are proven under assumptions
of compact embedding of patterns into data space. New multi-correctors of AI
systems are presented and illustrated with examples of predicting errors and
learning new classes of objects by a deep convolutional neural network.
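
The separability claim in the abstract can be illustrated numerically. The sketch below is not taken from the paper: it assumes points drawn uniformly from the unit ball and uses a common form of the Fisher separability test, in which a point x is separated from a point y when <x, y> < alpha * <x, x>. The threshold alpha = 0.8, the sample size, and the function names are all illustrative choices.

```python
import numpy as np

def sample_unit_ball(n, d, rng):
    """Draw n points uniformly from the d-dimensional unit ball."""
    g = rng.standard_normal((n, d))
    g /= np.linalg.norm(g, axis=1, keepdims=True)  # uniform directions
    r = rng.random(n) ** (1.0 / d)                 # radii for uniform volume
    return g * r[:, None]

def fraction_separable(points, alpha=0.8):
    """Fraction of points x with <x, y> < alpha * <x, x> for every other y.

    alpha is an assumed threshold, not a value prescribed by the source.
    """
    gram = points @ points.T
    sq_norms = np.diag(gram).copy()
    np.fill_diagonal(gram, -np.inf)  # ignore the pairing of x with itself
    return float(np.mean(gram.max(axis=1) < alpha * sq_norms))

rng = np.random.default_rng(0)
for d in (2, 10, 50, 200):
    pts = sample_unit_ball(1000, d, rng)
    print(d, fraction_separable(pts))
```

Under these assumptions, the printed fraction should rise towards 1 as the dimension grows, which is the blessing-of-dimensionality effect that the abstract refers to.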
Related papers
- Identifiable Causal Representation Learning: Unsupervised, Multi-View, and Multi-Environment [10.814585613336778]
Causal representation learning aims to combine the core strengths of machine learning and causality.
This thesis investigates what is possible for CRL without direct supervision, and thus contributes to its theoretical foundations.
arXiv Detail & Related papers (2024-06-19T09:14:40Z)
- MLAD: A Unified Model for Multi-system Log Anomaly Detection [35.68387377240593]
We propose MLAD, a novel anomaly detection model that incorporates semantic relational reasoning across multiple systems.
Specifically, we employ Sentence-BERT to capture the similarities between log sequences and convert them into high-dimensional learnable semantic vectors.
We revamp the formulas of the Attention layer to discern the significance of each keyword in the sequence and model the overall distribution of the multi-system dataset.
arXiv Detail & Related papers (2024-01-15T12:51:13Z)
- The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning [80.1018596899899]
We argue that neural network models share this same preference, formalized using Kolmogorov complexity.
Our experiments show that pre-trained and even randomly initialized language models prefer to generate low-complexity sequences.
These observations justify the trend in deep learning of unifying seemingly disparate problems with an increasingly small set of machine learning models.
arXiv Detail & Related papers (2023-04-11T17:22:22Z)
- Interpretable Linear Dimensionality Reduction based on Bias-Variance Analysis [45.3190496371625]
We propose a principled dimensionality reduction approach that maintains the interpretability of the resulting features.
In this way, all features are considered, the dimensionality is reduced and the interpretability is preserved.
arXiv Detail & Related papers (2023-03-26T14:30:38Z)
- Who Should Predict? Exact Algorithms For Learning to Defer to Humans [40.22768241509553]
We show that prior approaches can fail to find a human-AI system with low misclassification error.
We give a mixed-integer-linear-programming (MILP) formulation that can optimally solve the problem in the linear setting.
We provide a novel surrogate loss function that is realizable-consistent and performs well empirically.
arXiv Detail & Related papers (2023-01-15T21:57:36Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
- General stochastic separation theorems with optimal bounds [68.8204255655161]
The phenomenon of separability was revealed and used in machine learning to correct errors of Artificial Intelligence (AI) systems and analyze AI instabilities.
Errors or clusters of errors can be separated from the rest of the data.
The ability to correct an AI system also opens up the possibility of an attack on it, and the high dimensionality induces vulnerabilities caused by the same separability.
arXiv Detail & Related papers (2020-10-11T13:12:41Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- Eigendecomposition-Free Training of Deep Networks for Linear Least-Square Problems [107.3868459697569]
We introduce an eigendecomposition-free approach to training a deep network.
We show that our approach is much more robust than explicit differentiation of the eigendecomposition.
Our method has better convergence properties and yields state-of-the-art results.
arXiv Detail & Related papers (2020-04-15T04:29:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.