The Monte Carlo Transformer: a stochastic self-attention model for
sequence prediction
- URL: http://arxiv.org/abs/2007.08620v2
- Date: Tue, 15 Dec 2020 14:27:22 GMT
- Title: The Monte Carlo Transformer: a stochastic self-attention model for
sequence prediction
- Authors: Alice Martin (CMAP, IP Paris, CITI, TIPIC-SAMOVAR), Charles Ollion
(CMAP), Florian Strub, Sylvain Le Corff (IP Paris, CITI, TIPIC-SAMOVAR),
Olivier Pietquin
- Abstract summary: The keys, queries, values and attention vectors of the network are treated as the unobserved states of its hidden structure.
We use Sequential Monte Carlo methods to approximate the posterior distributions of the states given observations, and to estimate the gradient of the log-likelihood.
- Score: 19.815744837363546
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces the Sequential Monte Carlo Transformer, an
original approach that naturally captures the distribution of the observations
in a transformer architecture. The keys, queries, values and attention vectors
of the network are treated as the unobserved stochastic states of its hidden
structure.
This generative model is such that at each time step the received observation
is a random function of its past states in a given attention window. In this
general state-space setting, we use Sequential Monte Carlo methods to
approximate the posterior distributions of the states given the observations,
and to estimate the gradient of the log-likelihood. The resulting generative
model hence yields a full predictive distribution rather than a single-point
estimate.
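For intuition, below is a heavily simplified sketch of this idea, not the authors' implementation: a bootstrap particle filter in which each particle perturbs the attention output, particles are reweighted by the observation likelihood and resampled, and the normalizing constants accumulate into an estimate of the log-likelihood. The Gaussian noise and observation models, the dimensions, and the collapse of the particle history to its mean are all illustrative assumptions (the paper propagates per-particle state trajectories).

```python
# A heavily simplified sketch (NOT the authors' code) of SMC over a
# stochastic attention state; all models and dimensions are illustrative.
import numpy as np

rng = np.random.default_rng(0)
N, d, T = 100, 8, 50                      # particles, state dim, sequence length
W_k, W_q, W_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
W_out = rng.normal(size=(d, 1)) / np.sqrt(d)
sigma_state, sigma_obs = 0.1, 0.5

def attention_step(history, x_t):
    """Deterministic self-attention of x_t over a window of past states."""
    k, q, v = history @ W_k, x_t @ W_q, history @ W_v
    scores = q @ k.T / np.sqrt(d)
    w = np.exp(scores - scores.max())
    return (w / w.sum()) @ v

observations = rng.normal(size=(T, 1))    # stand-in data
history = [np.zeros((1, d))]
log_like = 0.0

for t in range(T):
    x_t = observations[t][None, :] @ W_out.T                # crude input embedding
    mean = attention_step(np.vstack(history[-10:]), x_t)    # attention window
    # stochastic attention: each particle perturbs the attention vector
    particles = mean + sigma_state * rng.normal(size=(N, d))
    y_pred = particles @ W_out                              # N predictive means
    logw = -0.5 * ((observations[t] - y_pred) ** 2).sum(1) / sigma_obs**2
    m = logw.max()
    log_like += m + np.log(np.exp(logw - m).mean())         # log-likelihood estimate
    w = np.exp(logw - m); w /= w.sum()
    particles = particles[rng.choice(N, size=N, p=w)]       # multinomial resampling
    history.append(particles.mean(0, keepdims=True))        # mean-collapsed history

print("SMC estimate of the log-likelihood:", log_like)
```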
Related papers
- A Monte Carlo Framework for Calibrated Uncertainty Estimation in Sequence Prediction [19.710390261102113]
We propose a Monte Carlo framework to estimate probabilities and confidence intervals associated with the distribution of a discrete sequence.
Our framework uses a Monte Carlo simulator, implemented as an autoregressively trained neural network, to sample sequences conditioned on an image input.
Experiments on synthetic and real data show that the framework produces accurate discriminative predictions, but can suffer from miscalibration.
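A minimal sketch of this estimation scheme, with a toy first-order sampler standing in for the trained autoregressive network (and no image conditioning): sample many sequences, count the event of interest, and attach a normal-approximation confidence interval.

```python
# Hedged sketch: Monte Carlo estimate of a sequence-event probability with a
# 95% confidence interval; the sampler is a toy stand-in, not the paper's model.
import numpy as np

rng = np.random.default_rng(1)
T_logits = np.array([[0.5, 0.2, -0.1],     # toy autoregressive "model":
                     [0.0, 0.6,  0.1],     # next-symbol logits given the
                     [-0.2, 0.1, 0.7]])    # previous symbol

def sample_sequence(length=10):
    seq, prev = [], 0
    for _ in range(length):
        p = np.exp(T_logits[prev]); p /= p.sum()
        prev = int(rng.choice(3, p=p))
        seq.append(prev)
    return seq

M = 5000
hits = np.array([sample_sequence().count(2) >= 3 for _ in range(M)], dtype=float)
p_hat = hits.mean()
se = np.sqrt(p_hat * (1.0 - p_hat) / M)
print(f"P(at least three 2s) ~ {p_hat:.3f} +/- {1.96 * se:.3f} (95% CI)")
```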
arXiv Detail & Related papers (2024-10-30T17:53:37Z)
- von Mises Quasi-Processes for Bayesian Circular Regression [57.88921637944379]
We explore a family of expressive and interpretable distributions over circle-valued random functions.
The resulting probability model has connections with continuous spin models in statistical physics.
For posterior inference, we introduce a new Stratonovich-like augmentation that lends itself to fast Markov Chain Monte Carlo sampling.
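The Stratonovich-like augmentation is specific to the paper; as a hedged stand-in, here is plain Metropolis sampling of a posterior over a single angle under a von Mises likelihood with a uniform circular prior (concentration assumed known).

```python
# Not the paper's sampler: plain random-walk Metropolis on the circle for the
# posterior mean direction of von Mises data, as a minimal illustration.
import numpy as np

rng = np.random.default_rng(2)
data = rng.vonmises(mu=1.0, kappa=4.0, size=30)    # circular observations
kappa = 4.0                                        # assumed known concentration

def log_post(mu):
    return kappa * np.cos(data - mu).sum()         # uniform prior on the circle

samples, mu = [], 0.0
for _ in range(5000):
    # symmetric wrapped-normal proposal, folded back into (-pi, pi]
    prop = (mu + 0.3 * rng.normal() + np.pi) % (2 * np.pi) - np.pi
    if np.log(rng.random()) < log_post(prop) - log_post(mu):
        mu = prop
    samples.append(mu)

post = np.array(samples[1000:])                    # drop burn-in
print("posterior circular mean:",
      np.arctan2(np.sin(post).mean(), np.cos(post).mean()))
```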
arXiv Detail & Related papers (2024-06-19T01:57:21Z)
- Fusion of Gaussian Processes Predictions with Monte Carlo Sampling [61.31380086717422]
In science and engineering, we often work with models designed for accurate prediction of variables of interest.
Since these models are only approximations of reality, it is desirable to apply multiple models to the same data and integrate their outcomes.
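A minimal sketch of Monte Carlo fusion under illustrative assumptions: each model's prediction at a test point is reduced to a Gaussian, and posterior samples are pooled according to model weights (which the paper would infer rather than fix by hand).

```python
# Hedged sketch: fuse two Gaussian predictive distributions by weighted
# Monte Carlo pooling; means, stds and weights are illustrative stand-ins
# for GP predictions at a test point.
import numpy as np

rng = np.random.default_rng(3)
models = [(1.2, 0.3, 0.7),        # (predictive mean, std, model weight)
          (0.8, 0.5, 0.3)]

M = 10000
choices = rng.choice(len(models), size=M, p=[m[2] for m in models])
fused = np.array([rng.normal(models[c][0], models[c][1]) for c in choices])
print("fused mean:", fused.mean(), " fused std:", fused.std())
```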
arXiv Detail & Related papers (2024-03-03T04:21:21Z)
- Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
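The reverse process here is a continuous-time Markov chain; the following sketch simulates a generic CTMC with the Gillespie algorithm, using an illustrative rate matrix rather than anything learned by the paper's method.

```python
# Hedged sketch: Gillespie simulation of a continuous-time Markov chain.
# The generator matrix Q is illustrative, not a learned denoising process.
import numpy as np

rng = np.random.default_rng(4)
Q = np.array([[-1.0, 0.7, 0.3],
              [ 0.4, -0.9, 0.5],
              [ 0.2, 0.6, -0.8]])      # generator: rows sum to zero

state, t, T = 0, 0.0, 5.0
path = [(t, state)]
while t < T:
    rate = -Q[state, state]
    t += rng.exponential(1.0 / rate)   # exponential holding time
    p = Q[state].clip(min=0)           # off-diagonal jump rates
    state = int(rng.choice(3, p=p / p.sum()))
    path.append((t, state))            # last jump may overshoot T; fine here

print(path[:5])
```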
arXiv Detail & Related papers (2022-11-30T05:33:29Z)
- Approximate sampling and estimation of partition functions using neural networks [0.0]
We show how variational autoencoders (VAEs) can be applied to this task.
We invert the usual logic and train the VAE to fit a simple and tractable distribution, while the latent distribution is taken to be the complex and intractable target, specified up to normalization.
This procedure constructs approximations without the use of training data or Markov chain Monte Carlo sampling.
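The VAE construction is the paper's contribution; as a hedged illustration of the quantity being approximated, here is plain importance sampling of a partition function Z = ∫ exp(-E(x)) dx under a fixed Gaussian proposal (the learned model effectively supplies a far better proposal).

```python
# NOT the paper's method: importance-sampling estimate of a partition function
# under a fixed Gaussian proposal q, with an illustrative energy function.
import numpy as np

rng = np.random.default_rng(5)

def energy(x):                                   # target density ~ exp(-E(x))
    return 0.5 * (x ** 2).sum(1) + 0.1 * (x ** 4).sum(1)

d, M, q_std = 2, 200000, 1.5
x = q_std * rng.normal(size=(M, d))              # proposal samples
log_q = (-0.5 * (x / q_std) ** 2 - np.log(q_std * np.sqrt(2 * np.pi))).sum(1)
log_w = -energy(x) - log_q                       # log importance weights
m = log_w.max()
Z_hat = np.exp(m) * np.exp(log_w - m).mean()     # stabilized average
print("partition function estimate:", Z_hat)
```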
arXiv Detail & Related papers (2022-09-21T15:16:45Z)
- Markov Observation Models [0.0]
The hidden Markov model is extended to allow for Markov chain observations.
The observations are assumed to form a Markov chain whose one-step transition probabilities depend on the hidden Markov chain.
An Expectation-Maximization algorithm is developed to estimate the transition probabilities for both the hidden state and for the observations.
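A minimal sketch of the likelihood computation such an EM algorithm builds on: the forward recursion for a hidden Markov chain whose observations are themselves a Markov chain with state-dependent transitions. All matrices are illustrative.

```python
# Hedged sketch: scaled forward algorithm for an HMM with Markov observations.
import numpy as np

A = np.array([[0.9, 0.1],
              [0.2, 0.8]])                  # hidden-state transition matrix
b0 = np.array([[0.6, 0.4],
               [0.3, 0.7]])                 # P(y_0 | x_0), illustrative
B = np.array([[[0.8, 0.2], [0.3, 0.7]],    # P(y_t | y_{t-1}, x_t = 0)
              [[0.4, 0.6], [0.5, 0.5]]])   # P(y_t | y_{t-1}, x_t = 1)
pi = np.array([0.5, 0.5])
obs = [0, 0, 1, 1, 0, 1]

alpha = pi * b0[:, obs[0]]
c = alpha.sum(); alpha /= c
loglik = np.log(c)
for t in range(1, len(obs)):
    emit = B[:, obs[t - 1], obs[t]]        # observation transition per hidden state
    alpha = (alpha @ A) * emit
    c = alpha.sum(); alpha /= c            # rescale to avoid underflow
    loglik += np.log(c)

print("log-likelihood:", loglik)
```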
arXiv Detail & Related papers (2022-08-12T16:53:07Z)
- Bézier Curve Gaussian Processes [8.11969931278838]
This paper proposes a new probabilistic sequence model building on probabilistic Bézier curves.
Combined with a Mixture Density network, Bayesian conditional inference can be performed without the need for mean field variational approximation.
The model is used for pedestrian trajectory prediction, where a generated prediction also serves as a GP prior.
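A minimal sketch of a probabilistic Bézier curve under the simplifying assumption of independent Gaussian control points with isotropic variances: Bernstein weights turn the control-point uncertainties into a Gaussian over the curve value at each t.

```python
# Hedged sketch: a Bezier curve with independent Gaussian control points.
# Control means/variances are illustrative, not the paper's learned values.
import numpy as np
from math import comb

mu = np.array([[0., 0.], [1., 2.], [2., 2.], [3., 0.]])   # control-point means
var = np.array([0.01, 0.05, 0.05, 0.01])                  # isotropic variances
n = len(mu) - 1

def bernstein(t):
    return np.array([comb(n, i) * t**i * (1 - t)**(n - i) for i in range(n + 1)])

for t in (0.0, 0.5, 1.0):
    w = bernstein(t)
    mean = w @ mu                        # E[curve(t)]
    variance = (w**2 * var).sum()        # Var per coordinate (independence)
    print(t, mean, variance)
```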
arXiv Detail & Related papers (2022-05-03T19:49:57Z)
- Distributional Gradient Boosting Machines [77.34726150561087]
The framework is based on XGBoost and LightGBM and is shown to achieve state-of-the-art forecast accuracy.
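Not the paper's framework, but the distributional idea can be illustrated with scikit-learn's gradient boosting and quantile loss as a stand-in for XGBoost/LightGBM: fitting several quantiles yields a predictive interval rather than a point forecast.

```python
# Hedged stand-in for distributional gradient boosting: quantile regression
# with scikit-learn's GradientBoostingRegressor on toy heteroscedastic data.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(6)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.3 * np.abs(X[:, 0]) * rng.normal(size=500)

quantiles = {}
for q in (0.1, 0.5, 0.9):
    m = GradientBoostingRegressor(loss="quantile", alpha=q, n_estimators=200)
    quantiles[q] = m.fit(X, y).predict([[1.5]])[0]

print(quantiles)   # a predictive interval at x=1.5, not just a point estimate
```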
arXiv Detail & Related papers (2022-04-02T06:32:19Z)
- Modeling Sequences as Distributions with Uncertainty for Sequential Recommendation [63.77513071533095]
Most existing sequential methods assume user behavior is deterministic.
In practice, item-item transitions can fluctuate significantly across item aspects and reflect the inherent randomness of user interests.
We propose a Distribution-based Transformer Sequential Recommendation (DT4SR) which injects uncertainties into sequential modeling.
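A hedged sketch of one way to inject uncertainty into sequential modeling (names are illustrative, not the DT4SR code): items as diagonal Gaussian embeddings, scored by the 2-Wasserstein distance between the predicted next-item distribution and each candidate item.

```python
# Illustrative sketch only: distributional item embeddings ranked by the
# closed-form 2-Wasserstein distance between diagonal Gaussians.
import numpy as np

rng = np.random.default_rng(7)
d, n_items = 16, 100
item_mu = rng.normal(size=(n_items, d))
item_sigma = np.abs(rng.normal(size=(n_items, d))) * 0.1 + 0.05

def w2_diag(mu1, s1, mu2, s2):
    """2-Wasserstein distance between diagonal Gaussians."""
    return np.sqrt(((mu1 - mu2) ** 2).sum(-1) + ((s1 - s2) ** 2).sum(-1))

# predicted next-item distribution (would come from the transformer)
pred_mu, pred_sigma = rng.normal(size=d), np.full(d, 0.1)
scores = -w2_diag(pred_mu, pred_sigma, item_mu, item_sigma)
print("top-5 recommended items:", np.argsort(scores)[::-1][:5])
```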
arXiv Detail & Related papers (2021-06-11T04:35:21Z)
- Targeted stochastic gradient Markov chain Monte Carlo for hidden Markov models with rare latent states [48.705095800341944]
Markov chain Monte Carlo (MCMC) algorithms for hidden Markov models often rely on the forward-backward sampler.
This makes them computationally slow as the length of the time series increases, motivating the development of sub-sampling-based approaches.
We propose a targeted sub-sampling approach that over-samples observations corresponding to rare latent states when calculating the gradient of parameters associated with them.
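A minimal sketch of the targeting idea, with toy stand-ins for the latent states and per-time-point gradient terms: observations in the rare state get boosted sampling probabilities, and importance weights keep the stochastic gradient estimate unbiased.

```python
# Hedged sketch: importance-weighted sub-sampling that over-samples time points
# currently assigned to a rare latent state; states/gradients are toy values.
import numpy as np

rng = np.random.default_rng(8)
T = 10000
states = (rng.random(T) < 0.02).astype(int)    # state 1 is rare (~2%)
grads = np.where(states == 1, 2.0, 0.1)        # per-time-point gradient terms

q = np.where(states == 1, 50.0, 1.0)           # boost rare-state points...
q = q / q.sum()                                # ...then renormalize

B = 200
idx = rng.choice(T, size=B, p=q)
est = (grads[idx] / q[idx]).mean()             # unbiased estimate of the sum
print("full-data gradient:", grads.sum(), " targeted estimate:", est)
```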
arXiv Detail & Related papers (2018-10-31T17:44:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.