Markov Chain Estimation with In-Context Learning
- URL: http://arxiv.org/abs/2508.03934v1
- Date: Tue, 05 Aug 2025 21:55:17 GMT
- Title: Markov Chain Estimation with In-Context Learning
- Authors: Simon Lepage, Jeremie Mary, David Picard
- Abstract summary: We set up Markov chains with random transition matrices and we train transformers to predict the next token.
We show that there is a threshold in transformer size and in training set size above which the model is able to learn to estimate the transition probabilities from its context.
- Score: 10.757287948514604
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate the capacity of transformers to learn algorithms involving their context while being trained solely on next-token prediction. We set up Markov chains with random transition matrices and train transformers to predict the next token. The matrices used during training and at test time are different, and we show that there is a threshold in transformer size and in training set size above which the model learns to estimate the transition probabilities from its context instead of memorizing the training patterns. Additionally, we show that a more involved encoding of the states enables more robust prediction for Markov chains with structures different from those seen during training.
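The following sketch (not the authors' code) illustrates the setup described in the abstract: draw a random row-stochastic transition matrix, sample a token sequence from it, and compute the in-context empirical (bigram-count) estimate of the transition probabilities, which is the quantity the trained transformer is reported to approximate above the size and data thresholds. The number of states, the Dirichlet prior on the rows, and the context length are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 8    # number of states (illustrative choice, not the paper's exact setting)
T = 256  # context length (also an illustrative choice)

def random_transition_matrix(k, rng):
    """Draw a random row-stochastic matrix (each row sampled from Dirichlet(1))."""
    return rng.dirichlet(np.ones(k), size=k)

def sample_chain(P, length, rng):
    """Sample a token sequence from the Markov chain defined by P."""
    k = P.shape[0]
    x = np.empty(length, dtype=int)
    x[0] = rng.integers(k)
    for t in range(1, length):
        x[t] = rng.choice(k, p=P[x[t - 1]])
    return x

def in_context_estimate(x, k):
    """Empirical (bigram-count) estimate of the transition matrix from one context."""
    counts = np.zeros((k, k))
    for a, b in zip(x[:-1], x[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Unvisited states fall back to the uniform distribution.
    return np.divide(counts, row_sums, out=np.full_like(counts, 1.0 / k), where=row_sums > 0)

# A fresh matrix, unseen during "training", mirroring the paper's train/test split.
P_test = random_transition_matrix(K, rng)
x = sample_chain(P_test, T, rng)
P_hat = in_context_estimate(x, K)
print("mean absolute estimation error:", np.abs(P_hat - P_test).mean())
```

In this framing, a model whose next-token distribution for the current state approaches the corresponding row of P_test (i.e. behaves like P_hat[x[-1]]) is estimating from its context, whereas a model whose output is independent of P_test has memorized training-set statistics.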
Related papers
- Algorithmic Capabilities of Random Transformers [49.73113518329544]
We investigate what functions can be learned by randomly initialized transformers in which only the embedding layers are optimized.
We find that these random transformers can perform a wide range of meaningful algorithmic tasks.
Our results indicate that some algorithmic capabilities are present in transformers even before these models are trained.
arXiv Detail & Related papers (2024-10-06T06:04:23Z) - Trained Transformer Classifiers Generalize and Exhibit Benign Overfitting In-Context [25.360386832940875]
We show that when linear transformers are pre-trained on random instances of linear regression tasks, they make predictions using an algorithm similar to that of ordinary least squares.
In some settings, these trained transformers can exhibit "benign overfitting in-context".
arXiv Detail & Related papers (2024-10-02T17:30:21Z) - The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains [28.41876902994335]
We introduce a simple Markov Chain sequence modeling task to study how this in-context learning (ICL) capability emerges.
Transformers trained on this task form statistical induction heads which compute accurate next-token probabilities.
We show how successful learning results from the interaction between the transformer's layers, and uncover evidence that the presence of the simpler unigram solution may delay formation of the final bigram solution.
arXiv Detail & Related papers (2024-02-16T18:28:36Z) - In-Context Convergence of Transformers [63.04956160537308]
We study the learning dynamics of a one-layer transformer with softmax attention trained via gradient descent.
For data with imbalanced features, we show that the learning dynamics take a stage-wise convergence process.
arXiv Detail & Related papers (2023-10-08T17:55:33Z) - End-to-End Training of a Neural HMM with Label and Transition Probabilities [36.32865468394113]
We investigate a novel modeling approach for end-to-end neural network training using hidden Markov models (HMM).
In our approach there are explicit, learnable probabilities for transitions between segments as opposed to a blank label that implicitly encodes duration statistics.
We find that while the transition model training does not improve recognition performance, it has a positive impact on the alignment quality.
arXiv Detail & Related papers (2023-10-04T10:56:00Z) - Latent State Models of Training Dynamics [51.88132043461152]
We train models with different random seeds and compute a variety of metrics throughout training.
We then fit a hidden Markov model (HMM) over the resulting sequences of metrics.
We use the HMM representation to study phase transitions and identify latent "detour" states that slow down convergence (an illustrative HMM fit of this kind is sketched after this list).
arXiv Detail & Related papers (2023-08-18T13:20:08Z) - Supervised Pretraining Can Learn In-Context Reinforcement Learning [96.62869749926415]
In this paper, we study the in-context learning capabilities of transformers in decision-making problems.
We introduce and study Decision-Pretrained Transformer (DPT), a supervised pretraining method where the transformer predicts an optimal action.
We find that the pretrained transformer can be used to solve a range of RL problems in-context, exhibiting both exploration online and conservatism offline.
arXiv Detail & Related papers (2023-06-26T17:58:50Z) - Trained Transformers Learn Linear Models In-Context [39.56636898650966]
Attention-based neural networks such as transformers have demonstrated a remarkable ability to exhibit in-context learning (ICL).
We show that when transformers are trained over random instances of linear regression problems, these models' predictions mimic those of ordinary least squares.
arXiv Detail & Related papers (2023-06-16T15:50:03Z) - What learning algorithm is in-context learning? Investigations with linear models [87.91612418166464]
We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly.
We show that trained in-context learners closely match the predictors computed by gradient descent, ridge regression, and exact least-squares regression.
We present preliminary evidence that in-context learners share algorithmic features with these predictors.
arXiv Detail & Related papers (2022-11-28T18:59:51Z) - Topographic VAEs learn Equivariant Capsules [84.33745072274942]
We introduce the Topographic VAE: a novel method for efficiently training deep generative models with topographically organized latent variables.
We show that such a model indeed learns to organize its activations according to salient characteristics such as digit class, width, and style on MNIST.
We demonstrate approximate equivariance to complex transformations, expanding upon the capabilities of existing group equivariant neural networks.
arXiv Detail & Related papers (2021-09-03T09:25:57Z) - Autoencoding Variational Autoencoder [56.05008520271406]
We study the implications of this behaviour on the learned representations and also the consequences of fixing it by introducing a notion of self consistency.
We show that encoders trained with our self-consistency approach lead to representations that are robust (insensitive) to perturbations in the input introduced by adversarial attacks.
arXiv Detail & Related papers (2020-12-07T14:16:14Z)
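Returning to the "Latent State Models of Training Dynamics" entry above: the sketch below shows one way to fit an HMM over per-step training metrics, using the hmmlearn library on synthetic placeholder data. The metric choice, the number of hidden states, and the Gaussian emission model are assumptions for illustration, not necessarily the paper's configuration.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM  # third-party: pip install hmmlearn

rng = np.random.default_rng(0)

# Stand-in for metrics logged during several training runs (different random seeds):
# each run is a (num_steps, num_metrics) array, e.g. [loss, grad_norm, weight_norm].
runs = [rng.normal(size=(500, 3)).cumsum(axis=0) for _ in range(4)]  # synthetic placeholder data

X = np.concatenate(runs)          # stack all runs into one observation matrix
lengths = [len(r) for r in runs]  # tell hmmlearn where each run starts and ends

# Fit a Gaussian-emission HMM; the hidden states play the role of training "phases".
hmm = GaussianHMM(n_components=4, covariance_type="diag", n_iter=100, random_state=0)
hmm.fit(X, lengths)

# Decode each run into a sequence of latent states; long stretches spent in a
# rarely visited state would be candidates for the "detour" states described above.
states = [hmm.predict(r) for r in runs]
print(states[0][:20])
```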