Diffusion-Inspired Reconfiguration of Transformers for Uncertainty Calibration
- URL: http://arxiv.org/abs/2602.08920v1
- Date: Mon, 09 Feb 2026 17:24:47 GMT
- Title: Diffusion-Inspired Reconfiguration of Transformers for Uncertainty Calibration
- Authors: Manh Cuong Dao, Quang Hung Pham, Phi Le Nguyen, Thao Nguyen Truong, Bryan Kian Hsiang Low, Trong Nghia Hoang,
- Abstract summary: Uncertainty calibration in pre-trained transformers is critical for their reliable deployment in risk-sensitive applications. We propose a diffusion-inspired reconfiguration of transformers in which each feature transformation block is modeled as a probabilistic mapping. Our method achieves superior calibration and predictive accuracy compared to existing uncertainty-aware transformers.
- Score: 52.017716672255524
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Uncertainty calibration in pre-trained transformers is critical for their reliable deployment in risk-sensitive applications. Yet, most existing pre-trained transformers do not have a principled mechanism for uncertainty propagation through their feature transformation stack. In this work, we propose a diffusion-inspired reconfiguration of transformers in which each feature transformation block is modeled as a probabilistic mapping. Composing these probabilistic mappings reveals a probability path that mimics the structure of a diffusion process, transporting data mass from the input distribution to the pre-trained feature distribution. This probability path can then be recompiled on a diffusion process with a unified transition model to enable principled propagation of representation uncertainty throughout the pre-trained model's architecture while maintaining its original predictive performance. Empirical results across a variety of vision and language benchmarks demonstrate that our method achieves superior calibration and predictive accuracy compared to existing uncertainty-aware transformers.
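The idea of composing per-block probabilistic mappings into a diffusion-like probability path can be made concrete with a toy sketch. This is not the authors' implementation; it is a minimal illustration under the assumption that each block acts as a linear Gaussian transition, so the representation's mean and (diagonal) variance propagate in closed form through the stack:

```python
import numpy as np

rng = np.random.default_rng(0)

def probabilistic_block(mean, var, weight, noise_scale):
    """Treat one feature-transformation block as a Gaussian transition
    x_{t+1} ~ N(W x_t, noise_scale^2 I). Under a linear map, the mean and
    diagonal variance of the representation have closed-form updates.
    (Illustrative assumption, not the paper's transition model.)"""
    new_mean = weight @ mean
    new_var = (weight ** 2) @ var + noise_scale ** 2  # diagonal approximation
    return new_mean, new_var

# Composing several blocks traces a "probability path" from the input
# distribution toward the pre-trained feature distribution, so uncertainty
# is carried through the whole stack rather than estimated post hoc.
d = 4
mean, var = rng.standard_normal(d), np.full(d, 0.1)
for _ in range(3):
    W = rng.standard_normal((d, d)) / np.sqrt(d)
    mean, var = probabilistic_block(mean, var, W, noise_scale=0.05)

print(mean.shape, var.shape)  # (4,) (4,)
```

The point of the sketch is only the composition: each block updates a distribution rather than a point estimate, which is what enables principled uncertainty propagation.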
Related papers
- Spatially-informed transformers: Injecting geostatistical covariance biases into self-attention for spatio-temporal forecasting [0.0]
We propose a hybrid architecture that injects a geostatistical inductive bias directly into the self-attention mechanism via a learnable covariance kernel. We demonstrate the phenomenon of "Deep Variography", where the network successfully recovers the true spatial parameters of the underlying process end-to-end via backpropagation.
arXiv Detail & Related papers (2025-12-19T15:32:24Z)
- Unsupervised Representation Learning from Sparse Transformation Analysis [79.94858534887801]
We propose to learn representations from sequence data by factorizing the transformations of the latent variables into sparse components.
Input data are first encoded as distributions of latent activations and subsequently transformed using a probability flow model.
arXiv Detail & Related papers (2024-10-07T23:53:25Z)
- Convergence Analysis of Flow Matching in Latent Space with Transformers [7.069772598731282]
We present theoretical convergence guarantees for ODE-based generative models, specifically flow matching.
We use a pre-trained autoencoder network to map high-dimensional original inputs to a low-dimensional latent space, where a transformer network is trained to predict the velocity field of the transformation from a standard normal distribution to the target latent distribution.
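The transport described in this blurb, moving samples from a standard normal to a target distribution along a learned velocity field, can be sketched with a toy Euler integrator. The velocity field below is a hand-written stand-in (a constant shift, which is the exact marginal velocity for the straight-line path from N(0, I) to N(mu, I)), not a trained transformer:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([3.0, -2.0])  # illustrative target mean

def velocity(x, t):
    # For the straight-line path from N(0, I) to N(mu, I), the marginal
    # velocity field is simply the constant shift mu.
    return np.broadcast_to(mu, x.shape)

def euler_flow(x0, v, n_steps=100):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with Euler steps,
    transporting base-distribution samples to the target."""
    x, dt = x0.copy(), 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * v(x, k * dt)
    return x

x0 = rng.standard_normal((1000, 2))  # samples from the base N(0, I)
x1 = euler_flow(x0, velocity)
print(x1.mean(axis=0))  # approximately [3.0, -2.0]
```

In the paper's setting, `velocity` would instead be a transformer trained in the autoencoder's latent space; the ODE integration step is the same.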
arXiv Detail & Related papers (2024-04-03T07:50:53Z)
- Arbitrary Distributions Mapping via SyMOT-Flow: A Flow-based Approach Integrating Maximum Mean Discrepancy and Optimal Transport [2.7309692684728617]
We introduce a novel model called SyMOT-Flow that trains an invertible transformation by minimizing the symmetric maximum mean discrepancy between samples from two unknown distributions.
The resulting transformation leads to more stable and accurate sample generation.
arXiv Detail & Related papers (2023-08-26T08:39:16Z)
- Quantification of Predictive Uncertainty via Inference-Time Sampling [57.749601811982096]
We propose a post-hoc sampling strategy for estimating predictive uncertainty accounting for data ambiguity.
The method can generate different plausible outputs for a given input and does not assume parametric forms of predictive distributions.
arXiv Detail & Related papers (2023-08-03T12:43:21Z)
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings [68.61185138897312]
We show that a frozen transformer language model encodes strong positional information through the shrinkage of self-attention variance.
Our findings serve to justify the decision to discard positional embeddings and thus facilitate more efficient pretraining of transformer language models.
arXiv Detail & Related papers (2023-05-23T01:03:40Z)
- Calibration of Natural Language Understanding Models with Venn--ABERS Predictors [0.0]
Transformers are prone to generating uncalibrated predictions or extreme probabilities.
We build several inductive Venn--ABERS predictors (IVAP) based on a selection of pre-trained transformers.
arXiv Detail & Related papers (2022-05-21T13:09:01Z)
- XAI for Transformers: Better Explanations through Conservative Propagation [60.67748036747221]
We show that the gradient in a Transformer reflects the function only locally, and thus fails to reliably identify the contribution of input features to the prediction.
Our proposal can be seen as a proper extension of the well-established LRP method to Transformers.
arXiv Detail & Related papers (2022-02-15T10:47:11Z)
- Certifying Model Accuracy under Distribution Shifts [151.67113334248464]
We present provable robustness guarantees on the accuracy of a model under bounded Wasserstein shifts of the data distribution.
We show that a simple procedure that randomizes the input of the model within a transformation space is provably robust to distributional shifts under the transformation.
arXiv Detail & Related papers (2022-01-28T22:03:50Z)
- Which Invariance Should We Transfer? A Causal Minimax Learning Approach [18.71316951734806]
We present a comprehensive minimax analysis from a causal perspective.
We propose an efficient algorithm to search for the subset with minimal worst-case risk.
The effectiveness and efficiency of our methods are demonstrated on synthetic data and the diagnosis of Alzheimer's disease.
arXiv Detail & Related papers (2021-07-05T09:07:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.