Transformers As Approximations of Solomonoff Induction
- URL: http://arxiv.org/abs/2408.12065v1
- Date: Thu, 22 Aug 2024 02:05:44 GMT
- Title: Transformers As Approximations of Solomonoff Induction
- Authors: Nathan Young, Michael Witbrock
- Abstract summary: Solomonoff Induction is an optimal-in-the-limit algorithm for sequence prediction.
As an optimal form of computational sequence prediction, it is a natural reference model against which other methods of sequence prediction can be compared.
We put forth and explore the hypothesis that Transformer models approximate Solomonoff Induction better than any other extant sequence prediction method.
- Score: 7.890110890837779
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Solomonoff Induction is an optimal-in-the-limit unbounded algorithm for sequence prediction, representing a Bayesian mixture of every computable probability distribution and performing close to optimally in predicting any computable sequence. As an optimal form of computational sequence prediction, it is a natural reference model against which other methods of sequence prediction can be compared. We put forth and explore the hypothesis that Transformer models, the basis of Large Language Models, approximate Solomonoff Induction better than any other extant sequence prediction method. We explore evidence for and against this hypothesis, give alternate hypotheses that take this evidence into account, and outline next steps for modelling Transformers and other kinds of AI in this way.
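As a concrete illustration, the following toy Python sketch implements a finite, hand-rolled approximation of such a Bayesian mixture. It is illustrative only: true Solomonoff Induction mixes over all computable distributions weighted by 2^-(program length) and is incomputable, and the "programs" and description lengths below are invented for the example.

```python
# Toy finite approximation of a Solomonoff-style Bayesian mixture.
# The hypothesis class and assumed description lengths are hand-picked.

def constant(p1):
    # A Bernoulli source that ignores history.
    return lambda history: p1

def alternator(history):
    # Strongly expects the opposite of the last bit.
    if not history:
        return 0.5
    return 0.9 if history[-1] == 0 else 0.1

def laplace(history):
    # Laplace's rule of succession: P(1) = (#ones + 1) / (n + 2).
    return (sum(history) + 1.0) / (len(history) + 2.0)

# (program, assumed description length in bits)
PROGRAMS = [(constant(0.1), 3), (constant(0.9), 3), (alternator, 5), (laplace, 7)]

def mixture_predict(history):
    """P(next bit = 1) under the 2^-length weighted Bayesian mixture."""
    weights, preds = [], []
    for prog, length in PROGRAMS:
        w = 2.0 ** -length          # prior weight ~ 2^-(program length)
        for t, bit in enumerate(history):
            p1 = prog(history[:t])  # each program's sequential prediction
            w *= p1 if bit == 1 else 1.0 - p1
        weights.append(w)
        preds.append(prog(history))
    return sum(w * p for w, p in zip(weights, preds)) / sum(weights)

print(mixture_predict([0, 1, 0, 1, 0]))  # ~0.83: the alternator dominates
```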
Related papers
- Conformal Generative Modeling with Improved Sample Efficiency through Sequential Greedy Filtering [55.15192437680943]
Generative models lack rigorous statistical guarantees for their outputs.
We propose a sequential conformal prediction method producing prediction sets that satisfy a rigorous statistical guarantee.
This guarantee states that with high probability, the prediction sets contain at least one admissible (or valid) example.
arXiv Detail & Related papers (2024-10-02T15:26:52Z)
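For reference, here is a minimal split conformal prediction sketch showing how a calibrated score threshold yields prediction sets with a coverage guarantee. This is the generic recipe, not the paper's sequential greedy filtering procedure; the toy model and data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def model_scores(x):
    # Stand-in "model": class probabilities for a 3-class toy problem.
    logits = np.stack([x, 1 - x, 0.5 * np.ones_like(x)], axis=-1)
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Held-out calibration data: features and true labels.
x_cal = rng.uniform(size=500)
y_cal = rng.integers(0, 3, size=500)

alpha = 0.1
probs = model_scores(x_cal)
# Nonconformity score: 1 - probability assigned to the true label.
scores = 1.0 - probs[np.arange(len(y_cal)), y_cal]
# Finite-sample-corrected quantile of the calibration scores.
q = np.quantile(scores, np.ceil((len(scores) + 1) * (1 - alpha)) / len(scores))

def prediction_set(x):
    p = model_scores(np.atleast_1d(x))[0]
    return [k for k in range(3) if 1.0 - p[k] <= q]

print(prediction_set(0.8))  # labels whose score passes the calibrated threshold
```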
- Prompting a Pretrained Transformer Can Be a Universal Approximator [105.59562522323274]
We show that much smaller pretrained models than previously thought can be universal approximators when prefixed.
We also offer Jackson-type bounds on the length of the prefix needed to approximate a function to a desired precision.
arXiv Detail & Related papers (2024-02-22T18:12:48Z)
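A sketch of the prefix mechanism in a single attention head, assuming the standard formulation in which trainable key/value vectors are prepended to a frozen model's keys and values; the paper's construction and its Jackson-type bounds are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8            # model width
T = 5            # input tokens
L = 3            # prefix length

# Frozen head weights (random stand-ins).
W_q, W_k, W_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
X = rng.normal(size=(T, d))            # token representations
P_k = rng.normal(size=(L, d))          # trainable prefix keys (here: random)
P_v = rng.normal(size=(L, d))          # trainable prefix values

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

Q = X @ W_q
K = np.concatenate([P_k, X @ W_k])     # prefix participates as extra keys
V = np.concatenate([P_v, X @ W_v])     # ...and extra values
out = softmax(Q @ K.T / np.sqrt(d)) @ V
print(out.shape)  # (5, 8): same interface, but the output is steered by the prefix
```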
- Transformers can optimally learn regression mixture models [22.85684729248361]
We show that transformers can learn an optimal predictor for mixtures of regressions.
We prove constructively that the decision-theoretic optimal procedure is indeed implementable by a transformer.
Experiments also demonstrate that transformers can learn mixtures of regressions in a sample-efficient fashion.
arXiv Detail & Related papers (2023-11-14T18:09:15Z)
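A sketch of the decision-theoretic optimal predictor this line of work targets, assuming known linear components and Gaussian noise: each component's prediction is weighted by its posterior probability given the observed prompt examples. The paper shows a transformer can implement this target in-context; the sketch just computes it directly.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
betas = rng.normal(size=(4, d))        # 4 known mixture components
sigma = 0.1                            # known noise scale

# A "prompt": examples drawn from component 2.
X = rng.normal(size=(10, d))
y = X @ betas[2] + sigma * rng.normal(size=10)

# Log-posterior over components (uniform prior, Gaussian likelihood).
log_post = np.array([
    -0.5 * np.sum((y - X @ b) ** 2) / sigma**2 for b in betas
])
post = np.exp(log_post - log_post.max())
post /= post.sum()

x_query = rng.normal(size=d)
y_hat = post @ (betas @ x_query)       # posterior-weighted prediction
print(post.round(3), y_hat)            # posterior concentrates on component 2
```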
- Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction [51.82628081279621]
Multi-modal regression is important when forecasting nonstationary processes or data drawn from a complex mixture of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate the underlying tessellation and approximate the multiple-hypotheses target distribution.
arXiv Detail & Related papers (2023-09-02T01:27:53Z)
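For background, a minimal generic RBF interpolation sketch; the paper's structured RBF network and its multiple-hypotheses ensemble are substantially more involved, so this only illustrates the basic building block (Gaussian kernels centred at the data points interpolating a 1-D target exactly at those points).

```python
import numpy as np

rng = np.random.default_rng(0)
xs = np.linspace(0, 1, 8)                      # centres = training inputs
ys = np.sin(2 * np.pi * xs)                    # target values
gamma = 40.0                                   # kernel width (assumed)

def phi(a, b):
    # Gaussian RBF kernel matrix between point sets a and b.
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

w = np.linalg.solve(phi(xs, xs), ys)           # interpolation weights

x_test = np.array([0.25, 0.6])
print(phi(x_test, xs) @ w)                     # RBF interpolant
print(np.sin(2 * np.pi * x_test))              # ground truth for comparison
```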
- Conformal Nucleus Sampling [67.5232384936661]
We assess whether a top-$p$ set is indeed aligned with its probabilistic meaning in various linguistic contexts.
We find that OPT models are overconfident, and that calibration shows a moderate inverse scaling with model size.
arXiv Detail & Related papers (2023-05-04T08:11:57Z)
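A minimal sketch of top-$p$ (nucleus) sampling itself, the mechanism whose probabilistic meaning the paper audits: keep the smallest set of tokens whose cumulative probability reaches $p$, renormalize, and sample.

```python
import numpy as np

rng = np.random.default_rng(0)

def nucleus_sample(probs, p=0.8):
    order = np.argsort(probs)[::-1]            # tokens by descending probability
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1       # smallest prefix reaching p
    keep = order[:cutoff]
    renorm = probs[keep] / probs[keep].sum()   # renormalize over the top-p set
    return rng.choice(keep, p=renorm), keep

probs = np.array([0.5, 0.25, 0.15, 0.07, 0.03])
token, top_p_set = nucleus_sample(probs, p=0.8)
print(token, top_p_set)  # top-p set is {0, 1, 2}: 0.5 + 0.25 + 0.15 >= 0.8
```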
- Bayesian Sparse Regression for Mixed Multi-Responses with Application to Runtime Metrics Prediction in Fog Manufacturing [6.288767115532775]
Fog manufacturing can greatly enhance traditional manufacturing systems through distributed Fog computation units.
Predictive offloading methods depend heavily on accurate prediction and uncertainty quantification of runtime performance metrics.
We propose a Bayesian sparse regression for multivariate mixed responses to enhance the prediction of runtime performance metrics.
arXiv Detail & Related papers (2022-10-10T16:14:08Z)
- Predictive Inference with Feature Conformal Prediction [80.77443423828315]
We propose feature conformal prediction, which extends the scope of conformal prediction to semantic feature spaces.
From a theoretical perspective, we demonstrate that feature conformal prediction provably outperforms regular conformal prediction under mild assumptions.
Our approach can be combined not only with vanilla conformal prediction but also with other adaptive conformal prediction methods.
arXiv Detail & Related papers (2022-10-01T02:57:37Z)
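For contrast with the feature-space idea above, a vanilla split conformal regression sketch that scores nonconformity on raw output residuals; the paper's method instead computes the score in a semantic feature space, which this sketch deliberately does not attempt. The stand-in regressor and data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):                                 # stand-in regressor
    return 2.0 * x + 1.0

x_cal = rng.uniform(size=300)
y_cal = 2.0 * x_cal + 1.0 + 0.2 * rng.normal(size=300)

alpha = 0.1
res = np.abs(y_cal - f(x_cal))            # output-space nonconformity scores
q = np.quantile(res, np.ceil((len(res) + 1) * (1 - alpha)) / len(res))

x_new = 0.4
print((f(x_new) - q, f(x_new) + q))       # interval with ~90% coverage
```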
- Correcting Model Bias with Sparse Implicit Processes [0.9187159782788579]
We show that Sparse Implicit Processes (SIP) is capable of correcting model bias when the data generating mechanism differs strongly from the one implied by the model.
We use synthetic datasets to show that SIP provides predictive distributions that reflect the data better than the exact predictions of the initial, wrongly assumed model.
arXiv Detail & Related papers (2022-07-21T18:00:01Z)
- Calibration of Natural Language Understanding Models with Venn--ABERS Predictors [0.0]
Transformers are prone to generating uncalibrated predictions or extreme probabilities.
We build several inductive Venn--ABERS predictors (IVAP) based on a selection of pre-trained transformers.
arXiv Detail & Related papers (2022-05-21T13:09:01Z)
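A minimal inductive Venn--ABERS sketch, assuming the usual two-isotonic-regressions construction: for a test score $s$, refit isotonic calibration with $(s, 0)$ and with $(s, 1)$ appended, and report the pair of fitted values at $s$ as the interval $[p_0, p_1]$ for $P(y = 1)$. The calibration data here are synthetic.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
s_cal = rng.uniform(size=200)                           # calibration scores
y_cal = (rng.uniform(size=200) < s_cal).astype(float)   # roughly calibrated labels

def ivap(s_cal, y_cal, s):
    ps = []
    for y_hyp in (0.0, 1.0):
        # Refit isotonic regression with the test point labelled y_hyp.
        iso = IsotonicRegression(out_of_bounds="clip")
        iso.fit(np.append(s_cal, s), np.append(y_cal, y_hyp))
        ps.append(float(iso.predict([s])[0]))
    return ps[0], ps[1]                                  # p0 <= P(y=1) <= p1

p0, p1 = ivap(s_cal, y_cal, 0.7)
print(p0, p1)  # a narrow interval around ~0.7 given enough calibration data
```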
- Rationales for Sequential Predictions [117.93025782838123]
Sequence models are a critical component of modern NLP systems, but their predictions are difficult to explain.
We consider model explanations through rationales, subsets of the context that can explain individual model predictions.
We formalize finding the best rationale as a combinatorial objective and propose an efficient greedy algorithm to approximate it.
arXiv Detail & Related papers (2021-09-14T01:25:15Z)
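A hedged sketch of one plausible greedy scheme: grow the rationale by repeatedly adding the context token that most increases the model's probability of the original prediction. The `predict_prob` scoring hook, the toy model, and the stopping rule are assumptions, not the paper's exact objective.

```python
def greedy_rationale(tokens, predict_prob, tol=0.05):
    """predict_prob(subset) -> model probability of the original prediction
    when conditioning only on `subset` (a hypothetical scoring hook)."""
    full = predict_prob(tokens)
    rationale = []
    remaining = list(tokens)
    # Stop once the rationale recovers the full-context probability (up to tol).
    while remaining and predict_prob(rationale) < full - tol:
        best = max(remaining, key=lambda t: predict_prob(rationale + [t]))
        rationale.append(best)
        remaining.remove(best)
    return rationale

# Toy model: prediction probability driven by a couple of keyword tokens.
weights = {"not": 0.4, "good": 0.3, "movie": 0.05}
prob = lambda subset: min(1.0, 0.2 + sum(weights.get(t, 0.0) for t in subset))

print(greedy_rationale(["the", "movie", "was", "not", "good"], prob))
# -> ['not', 'good']: the informative tokens are selected first
```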
- Mean-Field Approximation to Gaussian-Softmax Integral with Application to Uncertainty Estimation [23.38076756988258]
We propose a new single-model based approach to quantify uncertainty in deep neural networks.
We use a mean-field approximation formula to compute an analytically intractable integral.
Empirically, the proposed approach performs competitively when compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-06-13T07:32:38Z)
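A sketch of a mean-field style approximation to the Gaussian-softmax integral, compared against a Monte Carlo estimate. Assumption: it uses the classic probit-inspired scaling softmax(mu / sqrt(1 + pi * var / 8)), which is in the spirit of, but not necessarily identical to, the paper's formula.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Logits z ~ N(mu, diag(var)); we want E[softmax(z)].
mu = np.array([2.0, 1.0, 0.0])
var = np.array([1.0, 0.5, 2.0])

# Closed-form mean-field style approximation (probit-inspired scaling).
mean_field = softmax(mu / np.sqrt(1.0 + np.pi * var / 8.0))
# Monte Carlo reference over 100k samples of the logits.
mc = softmax(mu + np.sqrt(var) * rng.normal(size=(100_000, 3))).mean(axis=0)
print(mean_field.round(3), mc.round(3))  # the two estimates roughly agree
```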
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.