Random thoughts about Complexity, Data and Models
- URL: http://arxiv.org/abs/2005.04729v1
- Date: Thu, 16 Apr 2020 14:27:22 GMT
- Title: Random thoughts about Complexity, Data and Models
- Authors: Hykel Hosni and Angelo Vulpiani
- Abstract summary: Data Science and Machine learning have been growing strong for the past decade.
We investigate the subtle relation between "data and models".
A key issue for appraising the relation between algorithmic complexity and algorithmic learning is the clarification of the related but distinct concepts of compressibility, determinism and predictability.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data Science and Machine learning have been growing strong for the past
decade. We argue that to make the most of this exciting field we should resist
the temptation of assuming that forecasting can be reduced to brute-force data
analytics. This owes to the fact that modelling, as we illustrate below,
requires mastering the art of selecting relevant variables. More specifically,
we investigate the subtle relation between "data and models" by focussing on
the role played by algorithmic complexity, which contributed to making
mathematically rigorous the long-standing idea that to understand empirical
phenomena is to describe the rules which generate the data in terms which are
"simpler" than the data itself.
A key issue for the appraisal of the relation between algorithmic complexity
and algorithmic learning is to do with a much needed clarification on the
related but distinct concepts of compressibility, determinism and
predictability. To this end we will illustrate that the evolution law of a
chaotic system is compressible, but a generic initial condition for it is not,
making the time series generated by chaotic systems incompressible in general.
Hence knowledge of the rules which govern an empirical phenomenon is not
sufficient for predicting its outcomes. In turn this implies that there is more
to understanding phenomena than learning -- even from data alone -- such rules.
This can be achieved only in those cases when we are capable of "good
modelling".
Clearly, the very idea of algorithmic complexity rests on Turing's seminal
analysis of computation. This motivates our remarks on this extremely telling
example of analogy-based abstract modelling which is nonetheless heavily
informed by empirical facts.
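The compressibility point in the abstract can be made concrete with a minimal sketch (not taken from the paper; the logistic map, the 0/1 symbolic coding, and the use of zlib as a stand-in for a general-purpose compressor are illustrative assumptions): the evolution law of a chaotic system fits in one short line of code, yet the symbolic time series it generates from a generic initial condition resists compression.

```python
# Minimal illustrative sketch: a compressible rule versus an (effectively)
# incompressible trajectory. The rule is one line; the data it produces from a
# generic initial condition looks random to a general-purpose compressor.
import zlib

def logistic_trajectory(x0, n, r=4.0):
    """Iterate x_{t+1} = r * x_t * (1 - x_t) and return a 0/1 symbolic sequence."""
    bits = []
    x = x0
    for _ in range(n):
        x = r * x * (1 - x)
        bits.append(0 if x < 0.5 else 1)
    return bits

def compression_ratio(bits):
    """Pack the 0/1 sequence into bytes and return compressed size / original size."""
    raw = bytes(
        sum(b << i for i, b in enumerate(bits[k:k + 8]))
        for k in range(0, len(bits), 8)
    )
    return len(zlib.compress(raw, 9)) / len(raw)

if __name__ == "__main__":
    chaotic = logistic_trajectory(x0=0.37, n=80_000)   # generic initial condition
    periodic = [0, 1] * 40_000                         # trivially regular control sequence
    print("chaotic trajectory:", round(compression_ratio(chaotic), 3))   # close to 1.0
    print("periodic sequence :", round(compression_ratio(periodic), 3))  # far below 1.0
```

On a typical run the chaotic trajectory compresses to roughly its original size (ratio close to 1), while the periodic control sequence shrinks by orders of magnitude, mirroring the abstract's distinction between a compressible rule and the incompressible data it generates.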
Related papers
- Identifiable Causal Representation Learning: Unsupervised, Multi-View, and Multi-Environment [10.814585613336778]
Causal representation learning aims to combine the core strengths of machine learning and causality.
This thesis investigates what is possible for CRL without direct supervision, and thus contributes to its theoretical foundations.
arXiv Detail & Related papers (2024-06-19T09:14:40Z)
- A Simple Generative Model of Logical Reasoning and Statistical Learning [0.6853165736531939]
Statistical learning and logical reasoning are two major fields of AI expected to be unified for human-like machine intelligence.
We here propose a simple Bayesian model of logical reasoning and statistical learning.
We simply model how data causes symbolic knowledge in terms of its satisfiability in formal logic.
arXiv Detail & Related papers (2023-05-18T16:34:51Z)
- A simplicity bubble problem and zemblanity in digitally intermediated societies [1.4380443010065829]
We discuss the ubiquity of Big Data and machine learning in society.
We show that there is a ceiling above which formal knowledge cannot further decrease the probability of zemblanitous findings.
arXiv Detail & Related papers (2023-04-21T00:02:15Z)
- The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning [80.1018596899899]
We argue that neural network models share this same preference for low-complexity data, formalized using Kolmogorov complexity.
Our experiments show that pre-trained and even randomly initialized language models prefer to generate low-complexity sequences.
These observations justify the trend in deep learning of unifying seemingly disparate problems with an increasingly small set of machine learning models.
arXiv Detail & Related papers (2023-04-11T17:22:22Z)
- Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z)
- Principled Knowledge Extrapolation with GANs [92.62635018136476]
We study counterfactual synthesis from a new perspective of knowledge extrapolation.
We show that an adversarial game with a closed-form discriminator can be used to address the knowledge extrapolation problem.
Our method enjoys both elegant theoretical guarantees and superior performance in many scenarios.
arXiv Detail & Related papers (2022-05-21T08:39:42Z)
- A Simplicity Bubble Problem in Formal-Theoretic Learning Systems [1.7996150751268578]
We show that current approaches to machine learning can always be deceived, naturally or artificially, by sufficiently large datasets.
We discuss the framework and additional empirical conditions to be met in order to circumvent this deceptive phenomenon.
arXiv Detail & Related papers (2021-12-22T23:44:47Z)
- The Causal Neural Connection: Expressiveness, Learnability, and Inference [125.57815987218756]
An object called structural causal model (SCM) represents a collection of mechanisms and sources of random variation of the system under investigation.
In this paper, we show that the causal hierarchy theorem (Thm. 1, Bareinboim et al., 2020) still holds for neural models.
We introduce a special type of SCM called a neural causal model (NCM), and formalize a new type of inductive bias to encode structural constraints necessary for performing causal inferences.
arXiv Detail & Related papers (2021-07-02T01:55:18Z)
- Parsimonious Inference [0.0]
Parsimonious inference is an information-theoretic formulation of inference over arbitrary architectures.
Our approaches combine efficient encodings with prudent sampling strategies to construct predictive ensembles without cross-validation.
arXiv Detail & Related papers (2021-03-03T04:13:14Z)
- Causal Expectation-Maximisation [70.45873402967297]
We show that causal inference is NP-hard even in models characterised by polytree-shaped graphs.
We introduce the causal EM algorithm to reconstruct the uncertainty about the latent variables from data about categorical manifest variables.
We argue that there appears to be an unnoticed limitation to the trending idea that counterfactual bounds can often be computed without knowledge of the structural equations.
arXiv Detail & Related papers (2020-11-04T10:25:13Z)
- Learning Causal Models Online [103.87959747047158]
Predictive models can rely on spurious correlations in the data for making predictions.
One solution for achieving strong generalization is to incorporate causal structures in the models.
We propose an online algorithm that continually detects and removes spurious features.
arXiv Detail & Related papers (2020-06-12T20:49:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.