A Simplicity Bubble Problem in Formal-Theoretic Learning Systems
- URL: http://arxiv.org/abs/2112.12275v2
- Date: Tue, 25 Apr 2023 15:14:07 GMT
- Title: A Simplicity Bubble Problem in Formal-Theoretic Learning Systems
- Authors: Felipe S. Abrahão, Hector Zenil, Fabio Porto, Michael Winter, Klaus
  Wehmuth, Itala M. L. D'Ottaviano
- Abstract summary: We show that current approaches to machine learning can always be deceived, naturally or artificially, by sufficiently large datasets.
We discuss the framework and additional empirical conditions to be met in order to circumvent this deceptive phenomenon.
- Score: 1.7996150751268578
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When mining large datasets in order to predict new data, limitations of the
principles behind statistical machine learning pose a serious challenge not
only to the Big Data deluge, but also to the traditional assumptions that data
generating processes are biased toward low algorithmic complexity. Even when
one assumes an underlying algorithmic-informational bias toward simplicity in
finite dataset generators, we show that current approaches to machine learning
(including deep learning, or any formal-theoretic hybrid mix of top-down AI and
statistical machine learning approaches), can always be deceived, naturally or
artificially, by sufficiently large datasets. In particular, we demonstrate
that, for every learning algorithm (with or without access to a formal theory),
there is a sufficiently large dataset size above which the algorithmic
probability of an unpredictable deceiver is an upper bound (up to a
multiplicative constant that only depends on the learning algorithm) for the
algorithmic probability of any other larger dataset. In other words, very large
and complex datasets can deceive learning algorithms into a "simplicity
bubble" as likely as any other particular non-deceiving dataset. These
deceiving datasets guarantee that any prediction effected by the learning
algorithm will unpredictably diverge from the high-algorithmic-complexity
globally optimal solution while converging toward the
low-algorithmic-complexity locally optimal solution, although the latter is
deemed a global one by the learning algorithm. We discuss the framework and
additional empirical conditions to be met in order to circumvent this deceptive
phenomenon, moving away from statistical machine learning towards a stronger
type of machine learning based on, and motivated by, the intrinsic power of
algorithmic information theory and computability theory.
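A schematic rendering of the central bound described in the abstract may help; the notation below is illustrative (the paper's own symbols and exact quantifiers may differ), with m(.) the universal algorithmic probability, K(.) the prefix algorithmic complexity, and M a learning algorithm:

```latex
% Schematic form of the bound sketched in the abstract (illustrative
% notation; the paper's own formulation may differ in detail).
\exists\, z_{\mathbf{M}},\; c_{\mathbf{M}} > 0 \;\;
  \forall\, y \text{ with } |y| \ge |z_{\mathbf{M}}| :\qquad
  \mathbf{m}(y) \;\le\; c_{\mathbf{M}}\, \mathbf{m}(z_{\mathbf{M}})
% Via the algorithmic coding theorem, m(x) = \Theta(2^{-K(x)}), this reads as
% K(z_M) <= K(y) + O(1): the deceiving dataset z_M is at least as simple,
% and hence at least as algorithmically probable, as any larger dataset y.
```

A minimal toy sketch of the behavioural consequence (not the paper's construction): a learner that always prefers the lowest-complexity hypothesis consistent with the observed data, here with zlib compression standing in as a crude proxy for algorithmic complexity, keeps choosing the simple continuation even when the true generator is more complex. All names below (`complexity`, `mdl_predict`) are hypothetical illustration code.

```python
# Toy sketch only: zlib compressed length is a crude, computable stand-in
# for algorithmic complexity; this is NOT the paper's construction.
import random
import zlib


def complexity(s: str) -> int:
    """Crude proxy for algorithmic complexity: compressed length in bytes."""
    return len(zlib.compress(s.encode()))


def mdl_predict(prefix: str, candidate_tails: list) -> str:
    """Pick the continuation that yields the 'simplest' total dataset."""
    return min(candidate_tails, key=lambda tail: complexity(prefix + tail))


# A large dataset that looks like the trivial all-zeros process...
prefix = "0" * 10_000

# ...whose true continuation is algorithmically complex (pseudo-random here),
# while a low-complexity continuation keeps emitting zeros.
random.seed(0)
true_tail = "".join(random.choice("01") for _ in range(100))
simple_tail = "0" * 100

chosen = mdl_predict(prefix, [true_tail, simple_tail])
# The simplicity-biased learner stays inside the "simplicity bubble":
print(chosen == simple_tail)  # True
```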
Related papers
- Learning-Based TSP-Solvers Tend to Be Overly Greedy [8.79364699260219]
This study constructs a statistical measure called nearest-neighbor density to verify the properties of the randomly generated instances used by learning-based solvers.
We validate that the performance of learning-based solvers degrades substantially on such augmented data.
In short, learning-based TSP solvers tend to be overly greedy, a limitation that may have profound implications for AI-empowered optimization solvers.
arXiv Detail & Related papers (2025-02-02T12:06:13Z) - Weak-to-Strong Generalization Through the Data-Centric Lens [12.221894353699918]
We propose a simple data-centric mechanism that characterizes weak-to-strong generalization: the overlap density.
We present a theoretical result showing that the generalization benefit is a function of the overlap density, together with a regret bound for our data selection algorithm.
arXiv Detail & Related papers (2024-12-05T05:29:19Z) - Representation Learning with Multi-Step Inverse Kinematics: An Efficient
and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
arXiv Detail & Related papers (2023-04-12T14:51:47Z) - Learning to Bound Counterfactual Inference in Structural Causal Models
from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z) - Information bottleneck theory of high-dimensional regression: relevancy,
efficiency and optimality [6.700873164609009]
Overfitting is a central challenge in machine learning, yet many large neural networks readily achieve zero training loss.
We quantify overfitting via residual information, defined as the bits in fitted models that encode noise in training data.
arXiv Detail & Related papers (2022-08-08T00:09:12Z) - A Survey of Learning on Small Data: Generalization, Optimization, and
Challenge [101.27154181792567]
Learning from small data while approximating the generalization ability of learning from big data is one of the ultimate goals of AI.
This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data.
Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z) - Parsimonious Inference [0.0]
Parsimonious inference is an information-theoretic formulation of inference over arbitrary architectures.
Our approaches combine efficient encodings with prudent sampling strategies to construct predictive ensembles without cross-validation.
arXiv Detail & Related papers (2021-03-03T04:13:14Z) - Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem, which estimates integer signals from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm that employs a deep neural network to estimate the optimal heuristic for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature
Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - Comparative Analysis of Extreme Verification Latency Learning Algorithms [3.3439097577935213]
This paper is a comprehensive survey and comparative analysis of some of the EVL algorithms to point out the weaknesses and strengths of different approaches.
This work is a first effort to provide the research community with a review of some of the existing algorithms in this field.
arXiv Detail & Related papers (2020-11-26T16:34:56Z) - FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity
to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
arXiv Detail & Related papers (2020-05-22T23:07:42Z)