Quantifying Relevance in Learning and Inference
- URL: http://arxiv.org/abs/2202.00339v1
- Date: Tue, 1 Feb 2022 11:16:04 GMT
- Title: Quantifying Relevance in Learning and Inference
- Authors: Matteo Marsili and Yasser Roudi
- Abstract summary: We review recent progress on understanding learning, based on the notion of "relevance", which quantifies the information that a dataset or a learning machine's internal representation contains about the generative model of the data.
This notion defines maximally informative samples and optimal learning machines: ideal limits of samples and of machines that contain the maximal amount of information about the unknown generative process.
Maximally informative samples are characterised by a power-law frequency distribution (statistical criticality) and optimal learning machines by an anomalously large susceptibility.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning is a distinctive feature of intelligent behaviour. High-throughput
experimental data and Big Data promise to open new windows on complex systems
such as cells, the brain or our societies. Yet, the puzzling success of
Artificial Intelligence and Machine Learning shows that we still have a poor
conceptual understanding of learning. These applications push statistical
inference into uncharted territories where data is high-dimensional and scarce,
and prior information on "true" models is scant if not totally absent. Here we
review recent progress on understanding learning, based on the notion of
"relevance". The relevance, as we define it here, quantifies the amount of
information that a dataset or the internal representation of a learning machine
contains on the generative model of the data. This allows us to define
maximally informative samples, on one hand, and optimal learning machines on
the other. These are ideal limits of samples and of machines, that contain the
maximal amount of information about the unknown generative process, at a given
resolution (or level of compression). Both ideal limits exhibit critical
features in the statistical sense: Maximally informative samples are
characterised by a power-law frequency distribution (statistical criticality)
and optimal learning machines by an anomalously large susceptibility. The
trade-off between resolution (i.e. compression) and relevance distinguishes the
regime of noisy representations from that of lossy compression. These are
separated by a special point characterised by Zipf's law statistics. This
identifies samples obeying Zipf's law as the most compressed loss-less
representations that are optimal in the sense of maximal relevance. Criticality
in optimal learning machines manifests in an exponential degeneracy of energy
levels, that leads to unusual thermodynamic properties.
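
To make the resolution/relevance trade-off described in the abstract concrete, here is a minimal sketch, assuming the standard definitions used in this line of work: the resolution of a sample is the empirical entropy H[s] of the observed states, and the relevance is the entropy H[k] of the frequency-of-frequencies distribution. The function name `resolution_and_relevance` and the toy Zipf-like sample are illustrative assumptions, not code from the paper.

```python
# Minimal sketch (not from the paper): empirical resolution H[s] and
# relevance H[k] of a sample.
#   Resolution: H[s] = -sum_s (k_s/N) log(k_s/N),      k_s = count of state s
#   Relevance:  H[k] = -sum_k (k*m_k/N) log(k*m_k/N),  m_k = number of distinct
#               states observed exactly k times
from collections import Counter
import math
import random

def resolution_and_relevance(sample):
    N = len(sample)
    counts = Counter(sample)                 # k_s for each observed state s
    resolution = -sum((k / N) * math.log(k / N) for k in counts.values())
    m = Counter(counts.values())             # m_k: number of states seen k times
    relevance = -sum((k * mk / N) * math.log(k * mk / N) for k, mk in m.items())
    return resolution, relevance

if __name__ == "__main__":
    random.seed(0)
    # Toy sample with an approximately Zipf-like frequency distribution,
    # purely to exercise the two estimators above.
    states = list(range(1, 200))
    weights = [1.0 / s for s in states]
    sample = random.choices(states, weights=weights, k=5000)
    H_s, H_k = resolution_and_relevance(sample)
    print(f"resolution H[s] = {H_s:.3f} nats, relevance H[k] = {H_k:.3f} nats")
```

In this picture, samples whose frequencies follow Zipf's law lie at the boundary between the noisy and lossy-compression regimes, i.e. they attain maximal relevance at their level of resolution; the toy sample above is only meant to illustrate how the two quantities are computed.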
Related papers
- Learning Regularities from Data using Spiking Functions: A Theory [1.3735277588793995]
We propose a new machine learning theory that defines, in mathematical terms, what regularities are.
We say that the discovered non-randomness is encoded into regularities if the function is simple enough.
In this process, we claim that the 'best' regularities, or the optimal spiking functions, are those that capture the largest amount of information.
arXiv Detail & Related papers (2024-05-19T22:04:11Z) - TexShape: Information Theoretic Sentence Embedding for Language Models [5.265661844206274]
This paper addresses the challenge of encoding sentences into optimized representations through the lens of information theory.
We use empirical estimates of mutual information based on the Donsker-Varadhan representation of the Kullback-Leibler divergence (a minimal sketch of this bound is given after the related-papers list below).
Our experiments demonstrate significant advancements in preserving maximal targeted information and minimal sensitive information under adverse compression ratios.
arXiv Detail & Related papers (2024-02-05T22:48:28Z) - On Inductive Biases for Machine Learning in Data Constrained Settings [0.0]
This thesis explores a different answer to the problem of learning expressive models in data constrained settings.
Instead of relying on big datasets to learn neural networks, we replace some modules with known functions that reflect the structure of the data.
Our approach falls under the umbrella of "inductive biases", which can be defined as hypotheses about the data at hand that restrict the space of models to explore.
arXiv Detail & Related papers (2023-02-21T14:22:01Z) - Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data.
The main aim of the identified model is to predict new data from previous observations.
We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z) - Mutual Information Learned Classifiers: an Information-theoretic
Viewpoint of Training Deep Learning Classification Systems [9.660129425150926]
We show that the existing cross entropy loss minimization problem essentially learns the label conditional entropy of the underlying data distribution.
We propose a mutual information learning framework where we train deep neural network classifiers via learning the mutual information between the label and the input.
arXiv Detail & Related papers (2022-09-21T01:06:30Z) - Advancing Reacting Flow Simulations with Data-Driven Models [50.9598607067535]
The key to the effective use of machine learning tools in multi-physics problems is coupling them to physical and computer models.
The present chapter reviews some of the open opportunities for the application of data-driven reduced-order modeling of combustion systems.
arXiv Detail & Related papers (2022-09-05T16:48:34Z) - A Survey of Learning on Small Data: Generalization, Optimization, and
Challenge [101.27154181792567]
Learning on small data that approximates the generalization ability of big data is one of the ultimate purposes of AI.
This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data.
Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z) - An Extension to Basis-Hypervectors for Learning from Circular Data in
Hyperdimensional Computing [62.997667081978825]
Hyperdimensional Computing (HDC) is a computation framework based on properties of high-dimensional random spaces.
We present a study on basis-hypervector sets, which leads to practical contributions to HDC in general.
We introduce a method to learn from circular data, an important type of information never before addressed in machine learning with HDC.
arXiv Detail & Related papers (2022-05-16T18:04:55Z) - Compressed Predictive Information Coding [6.220929746808418]
We develop a novel information-theoretic framework, Compressed Predictive Information Coding (CPIC), to extract useful representations from dynamic data.
We derive variational bounds of the CPIC loss which induces the latent space to capture information that is maximally predictive.
We demonstrate that CPIC is able to recover the latent space of noisy dynamical systems with low signal-to-noise ratios.
arXiv Detail & Related papers (2022-03-03T22:47:58Z) - When is Memorization of Irrelevant Training Data Necessary for
High-Accuracy Learning? [53.523017945443115]
We describe natural prediction problems in which every sufficiently accurate training algorithm must encode, in the prediction model, essentially all the information about a large subset of its training examples.
Our results do not depend on the training algorithm or the class of models used for learning.
arXiv Detail & Related papers (2020-12-11T15:25:14Z) - A Theory of Usable Information Under Computational Constraints [103.5901638681034]
We propose a new framework for reasoning about information in complex systems.
Our foundation is based on a variational extension of Shannon's information theory.
We show that by incorporating computational constraints, $\mathcal{V}$-information can be reliably estimated from data.
arXiv Detail & Related papers (2020-02-25T06:09:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.