A Simplicity Bubble Problem in Formal-Theoretic Learning Systems
- URL: http://arxiv.org/abs/2112.12275v2
- Date: Tue, 25 Apr 2023 15:14:07 GMT
- Title: A Simplicity Bubble Problem in Formal-Theoretic Learning Systems
- Authors: Felipe S. Abrahão, Hector Zenil, Fabio Porto, Michael Winter, Klaus Wehmuth, Itala M. L. D'Ottaviano
- Abstract summary: We show that current approaches to machine learning can always be deceived, naturally or artificially, by sufficiently large datasets.
We discuss the framework and additional empirical conditions to be met in order to circumvent this deceptive phenomenon.
- Score: 1.7996150751268578
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When mining large datasets in order to predict new data, limitations of the
principles behind statistical machine learning pose a serious challenge not
only to the Big Data deluge, but also to the traditional assumptions that data
generating processes are biased toward low algorithmic complexity. Even when
one assumes an underlying algorithmic-informational bias toward simplicity in
finite dataset generators, we show that current approaches to machine learning
(including deep learning, or any formal-theoretic hybrid mix of top-down AI and
statistical machine learning approaches) can always be deceived, naturally or
artificially, by sufficiently large datasets. In particular, we demonstrate
that, for every learning algorithm (with or without access to a formal theory),
there is a sufficiently large dataset size above which the algorithmic
probability of an unpredictable deceiver is an upper bound (up to a
multiplicative constant that only depends on the learning algorithm) for the
algorithmic probability of any other larger dataset. In other words, very large
and complex datasets can deceive learning algorithms into a "simplicity
bubble" that is as likely, in terms of algorithmic probability, as any other
particular non-deceiving dataset. These deceiving datasets guarantee that any
prediction made by the learning
algorithm will unpredictably diverge from the high-algorithmic-complexity
globally optimal solution while converging toward the
low-algorithmic-complexity locally optimal solution, although the latter is
deemed a global one by the learning algorithm. We discuss the framework and
additional empirical conditions to be met in order to circumvent this deceptive
phenomenon, moving away from statistical machine learning towards a stronger
type of machine learning based on, and motivated by, the intrinsic power of
algorithmic information theory and computability theory.
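A hedged formalisation of the central bound, reconstructed from this abstract rather than taken from the paper's own theorem statement; the notation below (universal prefix machine U, algorithmic probability m, deceiving dataset z_A, constant C_A, threshold N_0) is this reading's, not the paper's:

```latex
% Hedged reconstruction of the bound described in the abstract; the notation
% (U, m, z_A, C_A, N_0) is assumed here, not taken from the paper.
% m(x): algorithmic probability of x on a universal prefix machine U.
\[
  \mathbf{m}(x) \;=\; \sum_{p \,:\, \mathbf{U}(p)=x} 2^{-|p|}
\]
% For every learning algorithm A there are a deceiving dataset z_A, a constant
% C_A > 0 depending only on A, and a size threshold N_0 such that
\[
  \forall\, y \ \text{with}\ |y| \ge N_0 :\qquad
  \mathbf{m}(y) \;\le\; C_{\mathbf{A}} \,\mathbf{m}(z_{\mathbf{A}}).
\]
```

On this reading, past the size threshold no dataset, deceiving or not, is more than a constant factor more algorithmically probable than the deceiver, which is the sense in which the simplicity bubble is "as likely as any other particular non-deceiving dataset."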
Related papers
- Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
arXiv Detail & Related papers (2023-04-12T14:51:47Z)
- Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z)
- Information bottleneck theory of high-dimensional regression: relevancy, efficiency and optimality [6.700873164609009]
Overfitting is a central challenge in machine learning, yet many large neural networks readily achieve zero training loss.
We quantify overfitting via residual information, defined as the bits in fitted models that encode noise in training data.
arXiv Detail & Related papers (2022-08-08T00:09:12Z)
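For the information-bottleneck entry above, one generic way to make "the bits in fitted models that encode noise" precise is as a conditional mutual information between the fitted parameters and the noise realisation; this is only an illustrative reading, not necessarily the paper's exact definition of residual information:

```latex
% Illustrative reading only; the paper's definition of residual information may differ.
% Assumed data model: y = f(x) + \varepsilon, with \hat{\theta} the parameters fitted on (x, y).
\[
  I_{\mathrm{res}} \;\approx\; I\big(\hat{\theta}\,;\,\varepsilon \,\big|\, x\big)
\]
```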
- A Survey of Learning on Small Data: Generalization, Optimization, and Challenge [101.27154181792567]
Learning on small data that approximates the generalization ability of big data is one of the ultimate purposes of AI.
This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data.
Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z)
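As a reference point for the small-data survey above, the textbook PAC bound for a finite hypothesis class shows directly why a small sample size n inflates the gap between empirical and true risk; this is the standard bound the PAC framework builds on, not a result taken from the survey itself:

```latex
% Standard finite-class PAC bound (Hoeffding + union bound); a textbook reference
% point, not the survey's own result.
% With probability at least 1 - \delta over n i.i.d. samples, for all h in a finite class H:
\[
  R(h) \;\le\; \widehat{R}(h) \;+\; \sqrt{\frac{\ln\lvert\mathcal{H}\rvert + \ln(1/\delta)}{2n}}
\]
```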
- Parsimonious Inference [0.0]
Parsimonious inference is an information-theoretic formulation of inference over arbitrary architectures.
Our approaches combine efficient encodings with prudent sampling strategies to construct predictive ensembles without cross-validation.
arXiv Detail & Related papers (2021-03-03T04:13:14Z)
- Towards Optimally Efficient Tree Search with Deep Learning [76.64632985696237]
This paper investigates the classical integer least-squares problem, which estimates integer signals from linear models.
The problem is NP-hard and often arises in diverse applications such as signal processing, bioinformatics, communications and machine learning.
We propose a general hyper-accelerated tree search (HATS) algorithm that employs a deep neural network to estimate the optimal heuristic for the underlying simplified memory-bounded A* algorithm.
arXiv Detail & Related papers (2021-01-07T08:00:02Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
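One common way to combine reconstruction error with $l_{2,p}$-norm regularization for unsupervised feature selection, as the sparse PCA entry above describes, is an objective of the following form; this is a generic sketch of the technique, and the paper's exact formulation (constraints, weighting) may differ:

```latex
% Generic reconstruction-error + l_{2,p} objective for unsupervised feature
% selection; an illustrative sketch, not necessarily the paper's exact model.
\[
  \min_{W \in \mathbb{R}^{d \times k}}\;
  \lVert X - X W W^{\top}\rVert_F^2 \;+\; \lambda \lVert W\rVert_{2,p}^{\,p},
  \qquad
  \lVert W\rVert_{2,p} = \Big(\sum_{i=1}^{d} \lVert w^{(i)}\rVert_2^{\,p}\Big)^{1/p}
\]
```

Here $w^{(i)}$ is the $i$-th row of $W$; rows pushed toward zero by the regularizer mark features that can be discarded.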
- Comparative Analysis of Extreme Verification Latency Learning Algorithms [3.3439097577935213]
This paper is a comprehensive survey and comparative analysis of some of the extreme verification latency (EVL) algorithms, pointing out the weaknesses and strengths of the different approaches.
This work is a first effort to provide the research community with a review of some of the existing algorithms in this field.
arXiv Detail & Related papers (2020-11-26T16:34:56Z)
- Optimization for Supervised Machine Learning: Randomized Algorithms for Data and Parameters [10.279748604797911]
Key problems in machine learning and data science are routinely modeled as optimization problems and solved via optimization algorithms.
With the increase of the volume of data and the size and complexity of the statistical models used to formulate these often ill-conditioned optimization tasks, there is a need for new efficient algorithms able to cope with these challenges.
In this thesis, we deal with each of these sources of difficulty in a different way. To efficiently address the big data issue, we develop new methods which in each iteration examine a small random subset of the training data only.
To handle the big model issue, we develop methods which in each iteration update [...]
arXiv Detail & Related papers (2020-08-26T21:15:18Z)
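A minimal Python sketch of the "small random subset per iteration" idea mentioned in the optimization-thesis entry above: plain minibatch SGD on least squares, offered as a generic illustration rather than the thesis's actual methods (the function name and defaults are made up for this example):

```python
import numpy as np

def minibatch_sgd(X, y, batch_size=32, lr=0.01, epochs=10, seed=0):
    """Generic illustration: each iteration examines only a small random subset
    of the training data (minibatch SGD on least squares).
    Not the thesis's methods; names and defaults are illustrative."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for _ in range(max(1, n // batch_size)):
            idx = rng.choice(n, size=min(batch_size, n), replace=False)  # random subset
            Xb, yb = X[idx], y[idx]
            grad = (2.0 / len(idx)) * Xb.T @ (Xb @ w - yb)  # gradient on the subset only
            w -= lr * grad
    return w
```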
- FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
arXiv Detail & Related papers (2020-05-22T23:07:42Z)
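A minimal Python sketch of the FedAvg-style "computation then aggregation" pattern referenced in the FedPD entry above: each client runs a few local gradient steps (computation) and the server averages the resulting models (aggregation). This illustrates the CTA pattern only, not the FedPD algorithm itself, and the unweighted average is a simplification (FedAvg typically weights clients by data size):

```python
import numpy as np

def local_sgd(w_global, X, y, lr=0.01, local_steps=5):
    """Computation: a client refines the global model on its own data only."""
    w = w_global.copy()
    for _ in range(local_steps):
        grad = (2.0 / len(y)) * X.T @ (X @ w - y)  # least-squares gradient on local data
        w -= lr * grad
    return w

def cta_round(w_global, client_data, lr=0.01, local_steps=5):
    """Aggregation: the server averages the locally updated models.
    Unweighted average for simplicity; FedAvg usually weights by client data size."""
    updates = [local_sgd(w_global, X, y, lr, local_steps) for X, y in client_data]
    return np.mean(updates, axis=0)
```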
This list is automatically generated from the titles and abstracts of the papers on this site.