Probabilistic Learning and Generation in Deep Sequence Models
- URL: http://arxiv.org/abs/2603.00888v1
- Date: Sun, 01 Mar 2026 03:22:52 GMT
- Title: Probabilistic Learning and Generation in Deep Sequence Models
- Authors: Wenlong Chen
- Abstract summary: Probabilistic models quantify the uncertainty associated with unobserved variables using the rules of probability. Two major bottlenecks of Bayesian methods, especially when applied to deep neural networks, are prior specification and approximation quality. This thesis leverages inductive biases in DSMs to design probabilistic inference schemes and structures, bridging the gap between DSMs and probabilistic models.
- Score: 6.057946859550886
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Despite the exceptional predictive performance of deep sequence models (DSMs), the main concern around their deployment is their lack of uncertainty awareness. In contrast, probabilistic models quantify the uncertainty associated with unobserved variables using the rules of probability. Notably, Bayesian methods leverage Bayes' rule to express our beliefs about unobserved variables in a principled way. Since exact Bayesian inference is computationally infeasible at scale, approximate inference is required in practice. Two major bottlenecks of Bayesian methods, especially when applied to deep neural networks, are prior specification and approximation quality. In Chapters 3 and 4, we investigate how the architectures of DSMs themselves can inform the design of priors or approximations in probabilistic models. We first develop an approximate Bayesian inference method tailored to the Transformer, based on the similarity between attention and sparse Gaussian processes. Next, we exploit the long-range memory-preservation capability of HiPPOs (High-order Polynomial Projection Operators) to construct interdomain inducing points for Gaussian processes, which successfully memorize history in online learning. Beyond the progress of DSMs on predictive tasks, sequential generative models consisting of a sequence of latent variables have become popular in deep generative modeling. Inspired by the explicit self-supervised signals for these latent variables in diffusion models, in Chapter 5 we explore the possibility of improving other generative models with self-supervision for their sequential latent states, and we investigate the desired probabilistic structures over them. Overall, this thesis leverages inductive biases in DSMs to design probabilistic inference or structure, bridging the gap between DSMs and probabilistic models and leading to mutually reinforcing improvements.
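As a loose illustration of the attention / sparse-GP similarity that motivates the Transformer-tailored inference method above, the sketch below contrasts scaled dot-product attention with the posterior mean of a sparse Gaussian process, whose inducing inputs play the role of keys and inducing outputs the role of values. The RBF kernel and all names here are illustrative assumptions, not the thesis's actual construction.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard scaled dot-product attention: each query row produces a
    normalised weighted average of the value rows V."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def rbf(A, B, lengthscale=1.0):
    """Squared-exponential kernel between row sets A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)

def sparse_gp_mean(X, Z, U, kernel=rbf, jitter=1e-6):
    """Posterior mean of a sparse GP with inducing inputs Z and inducing
    outputs U: m(X) = K_xz K_zz^{-1} U. Structurally this is again a
    data-dependent weighted combination of the 'values' U, with weights
    derived from query-'key' similarities, mirroring attention."""
    K_xz = kernel(X, Z)
    K_zz = kernel(Z, Z) + jitter * np.eye(len(Z))
    return K_xz @ np.linalg.solve(K_zz, U)

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(4, 8)), rng.normal(size=(6, 8)), rng.normal(size=(6, 8))
print(softmax_attention(Q, K, V).shape)  # (4, 8)
print(sparse_gp_mean(Q, K, V).shape)     # (4, 8) -- same query-key-value shape
```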
Related papers
- A Statistical Assessment of Amortized Inference Under Signal-to-Noise Variation and Distribution Shift [0.9590253747787195]
The recent success of deep neural networks and foundation models has given rise to a new paradigm in statistical modeling. In amortized inference, substantial computation is invested upfront to train a neural network that can produce approximate posteriors or predictions. Despite the growing popularity of amortized inference, its statistical interpretation and its role within Bayesian inference remain poorly understood.
arXiv Detail & Related papers (2026-01-12T19:21:51Z)
- Elucidated Rolling Diffusion Models for Probabilistic Weather Forecasting [52.6508222408558]
We introduce Elucidated Rolling Diffusion Models (ERDM). ERDM is the first framework to unify a rolling forecast structure with the principled, performant design of Elucidated Diffusion Models (EDM). On 2D Navier-Stokes simulations and ERA5 global weather forecasting at 1.5° resolution, ERDM consistently outperforms key diffusion-based baselines.
arXiv Detail & Related papers (2025-06-24T21:44:31Z)
- Preconditioned Inexact Stochastic ADMM for Deep Model [35.37705488695026]
This paper develops an algorithm, PISA, which enables scalable parallel computing and supports various preconditioners. It converges under the sole assumption of Lipschitz continuity of the gradient on a bounded region, removing the need for other conditions commonly imposed by stochastic methods. It demonstrates superior numerical performance compared to various state-of-the-art optimizers.
arXiv Detail & Related papers (2025-02-15T12:28:51Z)
- Exchangeable Sequence Models Quantify Uncertainty Over Latent Concepts [6.256239986541708]
We show that pre-trained sequence models are naturally capable of probabilistic reasoning over exchangeable data points. A sequence model learns the relationship between observations, which differs from typical Bayesian models. We show that the sequence prediction loss controls the quality of uncertainty quantification.
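As a toy illustration of probabilistic reasoning over exchangeable data (a Beta-Bernoulli / Polya-urn example, not the paper's pre-trained models): for an exchangeable binary sequence, the autoregressive one-step predictive coincides with the Bayesian posterior predictive, and imagined continuations of the sequence recover posterior uncertainty over the latent frequency.

```python
import numpy as np

def predictive(seq, a=1.0, b=1.0):
    """P(next = 1 | history) under a Beta(a, b) prior; this autoregressive
    rule is exactly the Bayesian posterior predictive (de Finetti's view)."""
    return (a + seq.sum()) / (a + b + len(seq))

rng = np.random.default_rng(0)
history = rng.binomial(1, 0.8, size=20)        # data from latent theta = 0.8
freqs = []
for _ in range(200):                           # imagined continuations
    seq = list(history)
    for _ in range(500):                       # autoregressive roll-out
        seq.append(rng.binomial(1, predictive(np.array(seq))))
    freqs.append(np.mean(seq[len(history):]))  # limiting frequency ~ posterior draw
print(np.mean(freqs), np.std(freqs))           # uncertainty over the latent concept
```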
arXiv Detail & Related papers (2024-08-06T17:16:10Z)
- Inflationary Flows: Calibrated Bayesian Inference with Diffusion-Based Models [0.0]
We show how diffusion-based models can be repurposed for performing principled, identifiable Bayesian inference. We show how such maps can be learned via standard DBM training using a novel noise schedule. The result is a class of highly expressive generative models, uniquely defined on a low-dimensional latent space.
arXiv Detail & Related papers (2024-07-11T19:58:19Z)
- On the Efficient Marginalization of Probabilistic Sequence Models [3.5897534810405403]
This dissertation focuses on using autoregressive models to answer complex probabilistic queries.
We develop a class of novel and efficient approximation techniques for marginalization in sequential models that are model-agnostic.
arXiv Detail & Related papers (2024-03-06T19:29:08Z)
- Human Trajectory Forecasting with Explainable Behavioral Uncertainty [63.62824628085961]
Human trajectory forecasting helps to understand and predict human behaviors, enabling applications from social robots to self-driving cars.
Model-free methods offer superior prediction accuracy but lack explainability, while model-based methods provide explainability but predict less accurately.
We show that BNSP-SFM achieves up to a 50% improvement in prediction accuracy, compared with 11 state-of-the-art methods.
arXiv Detail & Related papers (2023-07-04T16:45:21Z)
- When in Doubt: Neural Non-Parametric Uncertainty Quantification for Epidemic Forecasting [70.54920804222031]
Most existing forecasting models disregard uncertainty quantification, resulting in mis-calibrated predictions.
Recent works in deep neural models for uncertainty-aware time-series forecasting also have several limitations.
We model the forecasting task as a probabilistic generative process and propose a functional neural process model called EPIFNP.
arXiv Detail & Related papers (2021-06-07T18:31:47Z)
- Improving Uncertainty Calibration via Prior Augmented Data [56.88185136509654]
Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators.
They are often overconfident in their predictions, which leads to inaccurate and miscalibrated probabilistic predictions.
We propose a solution by seeking out regions of feature space where the model is unjustifiably overconfident, and conditionally raising the entropy of those predictions towards that of the prior distribution of the labels.
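A minimal sketch of this entropy-raising operation (the mixing rule, the overconfidence flag, and all names are illustrative assumptions, not the paper's method):

```python
import numpy as np

def raise_entropy(probs, label_prior, overconfident, alpha=0.5):
    """Where a prediction is flagged as unjustifiably overconfident, mix
    its class distribution towards the label prior, raising its entropy
    while leaving other predictions untouched (illustrative rule)."""
    mixed = (1 - alpha) * probs + alpha * label_prior   # rows still sum to 1
    return np.where(overconfident[:, None], mixed, probs)

probs = np.array([[0.97, 0.02, 0.01],   # overconfident prediction
                  [0.40, 0.35, 0.25]])  # already uncertain
prior = np.full(3, 1 / 3)               # assumed uniform prior over labels
flag = np.array([True, False])          # e.g. from an out-of-distribution test
print(raise_entropy(probs, prior, flag))
```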
arXiv Detail & Related papers (2021-02-22T07:02:37Z)
- Learning Interpretable Deep State Space Model for Probabilistic Time Series Forecasting [98.57851612518758]
Probabilistic time series forecasting involves estimating the distribution of a series' future values based on its history.
We propose a deep state space model for probabilistic time series forecasting in which the non-linear emission model and transition model are parameterized by neural networks.
We show in experiments that our model produces accurate and sharp probabilistic forecasts.
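The non-linear state-space structure described here can be sketched as follows (a generic template with assumed dimensions, noise levels, and parameterization, not the paper's architecture): a neural transition model evolves the latent state, and a neural emission model maps it to a per-step predictive distribution.

```python
import numpy as np

def mlp(params, x):
    """Tiny one-hidden-layer MLP with a tanh nonlinearity."""
    W1, b1, W2, b2 = params
    return np.tanh(x @ W1 + b1) @ W2 + b2

def init(rng, d_in, d_hid, d_out):
    return (0.3 * rng.normal(size=(d_in, d_hid)), np.zeros(d_hid),
            0.3 * rng.normal(size=(d_hid, d_out)), np.zeros(d_out))

rng = np.random.default_rng(0)
d_z = 4
transition = init(rng, d_z, 16, d_z)  # z_t = f(z_{t-1}) + process noise
emission = init(rng, d_z, 16, 2)      # maps z_t to (mean, log-scale) of x_t

z = np.zeros(d_z)
forecast = []
for t in range(10):
    z = mlp(transition, z) + 0.1 * rng.normal(size=d_z)
    mean, log_scale = mlp(emission, z)
    forecast.append((mean, np.exp(log_scale)))  # per-step predictive distribution
print(forecast[0])
```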
arXiv Detail & Related papers (2021-01-31T06:49:33Z)
- Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
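A rough sketch of how a posterior-regularisation term over unlabelled data might be attached to a variational objective (the entropy penalty, the `kl_divergence` interface, and all names are assumptions for illustration, not the paper's scheme):

```python
import torch
import torch.nn.functional as F

def regularised_objective(model, x_lab, y_lab, x_unlab,
                          kl_weight=1.0, reg_weight=0.1):
    """Standard variational objective on labelled source data, plus a
    penalty encouraging higher predictive entropy on unlabelled,
    covariate-shifted inputs (illustrative regulariser)."""
    nll = F.cross_entropy(model(x_lab), y_lab)  # expected negative log-likelihood
    kl = model.kl_divergence()                  # assumed: KL(q(w) || p(w))
    probs = F.softmax(model(x_unlab), dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1).mean()
    return nll + kl_weight * kl - reg_weight * entropy  # minimise this

class DummyModel(torch.nn.Module):
    """Stand-in for a (variational) Bayesian classifier."""
    def __init__(self):
        super().__init__()
        self.lin = torch.nn.Linear(5, 3)
    def forward(self, x):
        return self.lin(x)
    def kl_divergence(self):
        # placeholder for the KL term of a real variational posterior
        return 0.5 * sum((p ** 2).sum() for p in self.parameters())

model = DummyModel()
x_lab, y_lab = torch.randn(8, 5), torch.randint(0, 3, (8,))
x_unlab = torch.randn(16, 5)
print(regularised_objective(model, x_lab, y_lab, x_unlab))
```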
This list is automatically generated from the titles and abstracts of the papers on this site. This site does not guarantee the quality of the information and is not responsible for any consequences.