Support Tokens, Stability Margins, and a New Foundation for Robust LLMs
- URL: http://arxiv.org/abs/2602.22271v2
- Date: Sun, 01 Mar 2026 22:13:09 GMT
- Title: Support Tokens, Stability Margins, and a New Foundation for Robust LLMs
- Authors: Deepak Agarwal, Dhyey Dharmendrakumar Mavani, Suyash Gupta, Karthik Sethuraman, Tejas Dharamsi,
- Abstract summary: We re-interpret causal self-attention transformers, the backbone of modern foundation models. A barrier constraint emerges on the self-attention parameters. This reveals a boundary where attention becomes ill-conditioned.
- Score: 1.429795922604976
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-attention is usually described as a flexible, content-adaptive way to mix a token with information from its past. We re-interpret causal self-attention transformers, the backbone of modern foundation models, within a probabilistic framework, much like how classical PCA is extended to probabilistic PCA. However, this re-formulation reveals a surprising and deeper structural insight: due to a change-of-variables phenomenon, a barrier constraint emerges on the self-attention parameters. This induces a highly structured geometry on the token space, providing theoretical insights into the dynamics of LLM decoding. This reveals a boundary where attention becomes ill-conditioned, leading to a margin interpretation similar to classical support vector machines. Analogous to support vectors, this naturally gives rise to the concept of support tokens. Furthermore, we show that LLMs define a consistent stochastic process over (infinite) token sequences, providing a rigorous probabilistic framework for sequence modeling. We propose a Bayesian framework and derive a MAP estimation objective that requires only a minimal modification to standard LLM training: the addition of a smooth log-barrier penalty to the usual cross-entropy loss. We demonstrate that this provides more robust models without sacrificing out-of-sample accuracy and that it is straightforward to incorporate in practice.
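The training change described in the abstract is small enough to sketch. The abstract does not give the exact form of the barrier, so the `margin` below is a hypothetical stand-in (a spectral-norm gap on the attention projection matrices), and `lam` is an assumed penalty weight; this is a minimal sketch, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def barrier_regularized_loss(logits, targets, attn_weights, lam=1e-3, eps=1e-6):
    """Cross-entropy plus a smooth log-barrier penalty, sketching the paper's
    MAP objective. The barrier's argument is an assumption: the gap between 1
    and the spectral norm of each attention projection matrix, clamped to stay
    strictly positive so the log remains finite."""
    ce = F.cross_entropy(logits, targets)
    barrier = logits.new_zeros(())
    for W in attn_weights:  # e.g. query/key projection matrices
        margin = 1.0 - torch.linalg.matrix_norm(W, ord=2)  # hypothetical constraint
        barrier = barrier - torch.log(torch.clamp(margin, min=eps))
    return ce + lam * barrier
```

In training this simply replaces the plain cross-entropy term; the rest of the pipeline is unchanged, which matches the abstract's claim that the modification is minimal.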
Related papers
- Sculpting Latent Spaces With MMD: Disentanglement With Programmable Priors [30.182736043604304]
We introduce the Programmable Prior Framework, a method built on the Maximum Mean Discrepancy (MMD). Our work provides a foundational tool for representation engineering, opening new avenues for model identifiability and causal reasoning.
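As a concrete reference point, a minimal MMD penalty between encoder latents and samples from a chosen prior looks like the following; the Gaussian kernel and bandwidth `sigma` are standard choices, not details from the paper.

```python
import torch

def gaussian_kernel(x, y, sigma=1.0):
    # Pairwise RBF kernel values between rows of x and rows of y.
    d2 = torch.cdist(x, y) ** 2
    return torch.exp(-d2 / (2 * sigma ** 2))

def mmd(latents, prior_samples, sigma=1.0):
    """Biased MMD^2 estimate between latent codes and prior draws."""
    kxx = gaussian_kernel(latents, latents, sigma).mean()
    kyy = gaussian_kernel(prior_samples, prior_samples, sigma).mean()
    kxy = gaussian_kernel(latents, prior_samples, sigma).mean()
    return kxx + kyy - 2 * kxy

# Penalize mismatch between encoder outputs and a chosen ("programmable") prior:
z = torch.randn(256, 8)       # stand-in encoder latents
prior = torch.randn(256, 8)   # samples from the target prior
penalty = mmd(z, prior)
```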
arXiv Detail & Related papers (2025-10-13T21:26:01Z) - Probabilistic Token Alignment for Large Language Model Fusion [100.30692772017238]
Training large language models (LLMs) from scratch can yield models with unique functionalities and strengths, but it is costly and often leads to redundant capabilities. A key challenge in existing model fusion methods is their dependence on manually predefined vocabulary alignment. We propose the probabilistic token alignment method as a general and soft mapping for alignment, named PTA-LLM.
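A minimal sketch of a soft vocabulary mapping, assuming token embeddings for both models live in a comparable space; the paper's actual probabilistic alignment may be derived differently, and the temperature softmax here is an illustrative stand-in.

```python
import torch

def soft_vocab_alignment(emb_src, emb_tgt, tau=0.1):
    """Soft mapping from a source vocabulary to a target vocabulary (sketch).

    Each source token gets a probability distribution over target tokens,
    derived here from cosine similarity of (assumed comparable) embeddings.
    """
    src = torch.nn.functional.normalize(emb_src, dim=-1)
    tgt = torch.nn.functional.normalize(emb_tgt, dim=-1)
    sim = src @ tgt.T                        # [V_src, V_tgt] similarities
    return torch.softmax(sim / tau, dim=-1)  # rows sum to 1: a soft alignment

# Carry a source-model token distribution over to the target vocabulary:
A = soft_vocab_alignment(torch.randn(100, 64), torch.randn(120, 64))
p_src = torch.softmax(torch.randn(100), dim=-1)
p_tgt = p_src @ A                            # distribution over target tokens
```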
arXiv Detail & Related papers (2025-09-21T23:18:24Z) - Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation [85.82112629564942]
We propose TokenBridge, which maintains the strong representation capacity of continuous tokens while preserving the modeling simplicity of discrete tokens. We introduce a dimension-wise quantization strategy that independently discretizes each feature dimension, paired with a lightweight autoregressive prediction mechanism. Our approach achieves reconstruction and generation quality on par with continuous methods while using standard categorical prediction.
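Dimension-wise quantization is easy to sketch: each feature dimension is discretized independently into uniform bins. The bin range and count below are assumptions for illustration, not the paper's settings.

```python
import numpy as np

def dimensionwise_quantize(z, num_bins=16, lo=-1.0, hi=1.0):
    """Independently discretize each feature dimension into uniform bins.

    Continuous latents z: [N, D] -> integer codes: [N, D], plus the
    dequantized (bin-center) reconstruction.
    """
    edges = np.linspace(lo, hi, num_bins + 1)
    centers = (edges[:-1] + edges[1:]) / 2
    codes = np.clip(np.digitize(z, edges[1:-1]), 0, num_bins - 1)
    return codes, centers[codes]

z = np.random.uniform(-1, 1, size=(4, 8))
codes, z_hat = dimensionwise_quantize(z)  # codes feed a categorical predictor
```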
arXiv Detail & Related papers (2025-03-20T17:59:59Z) - I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data? [76.15163242945813]
Large language models (LLMs) have led many to conclude that they exhibit a form of intelligence. We introduce a novel generative model that generates tokens on the basis of human-interpretable concepts represented as latent discrete variables.
arXiv Detail & Related papers (2025-03-12T01:21:17Z) - Analyzing Finetuning Representation Shift for Multimodal LLMs Steering [56.710375516257876]
We propose to map hidden states to interpretable visual and textual concepts. This enables us to compare semantic dynamics more efficiently, such as the shift from an original to a fine-tuned model. We also demonstrate the use of shift vectors to capture these concept changes.
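A minimal sketch of the shift-vector idea, assuming concept directions have already been extracted: project hidden states of both models onto each concept and measure the mean displacement. Shapes and names are illustrative, not the paper's.

```python
import numpy as np

def concept_shift_vectors(hidden_orig, hidden_ft, concept_dirs):
    """Measure per-concept shift between an original and a fine-tuned model.

    hidden_*: [N, d] hidden states for the same inputs; concept_dirs: [C, d]
    unit vectors, one per (assumed pre-computed) interpretable concept.
    """
    proj_orig = hidden_orig @ concept_dirs.T   # [N, C] concept activations
    proj_ft = hidden_ft @ concept_dirs.T
    return (proj_ft - proj_orig).mean(axis=0)  # mean shift along each concept

shift = concept_shift_vectors(np.random.randn(32, 768),
                              np.random.randn(32, 768),
                              np.random.randn(5, 768))
```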
arXiv Detail & Related papers (2025-01-06T13:37:13Z) - Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
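A bare-bones stand-in for this idea: sample several independent explanation chains, extract the answer from each, and score agreement. The paper's actual estimator over explanation distributions is richer than this frequency-and-entropy sketch.

```python
from collections import Counter
import math

def explanation_confidence(sampled_answers):
    """Agreement-based confidence over answers extracted from independently
    sampled explanation chains (sketch). Returns the modal answer, its
    frequency as a confidence score, and the answer-distribution entropy."""
    counts = Counter(sampled_answers)
    total = len(sampled_answers)
    probs = [c / total for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    answer, top = counts.most_common(1)[0]
    return answer, top / total, entropy

ans, conf, H = explanation_confidence(["B", "B", "A", "B", "B"])
```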
arXiv Detail & Related papers (2024-06-05T16:35:30Z) - From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers [41.82477691012942]
We study learning a 1-layer self-attention model from a set of prompts and associated output data.
We first establish a precise mapping between the self-attention mechanism and Markov models.
We characterize an intriguing winner-takes-all phenomenon where the generative process implemented by self-attention collapses into sampling a limited subset of tokens.
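The mapping is easy to illustrate on a toy vocabulary: conditioning on the current token, the attention scores define one row of a Markov transition matrix, and sharpening the softmax (large `beta`) exhibits the winner-takes-all collapse. Everything below is a toy construction, not the paper's exact parameterization.

```python
import numpy as np

def attention_transition_matrix(E, W, beta=1.0):
    """Map a 1-layer self-attention generator to a Markov transition matrix.

    Conditioning on current token i, the next-token distribution is a softmax
    over scores e_i^T W e_j (row i of the result). As beta grows, the rows
    concentrate on a few tokens: the winner-takes-all collapse.
    """
    scores = beta * (E @ W @ E.T)                 # [V, V] pairwise scores
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    P = np.exp(scores)
    return P / P.sum(axis=1, keepdims=True)

E = np.random.randn(10, 4)   # token embeddings (toy vocabulary)
W = np.random.randn(4, 4)    # combined query-key weights
P_soft = attention_transition_matrix(E, W, beta=0.5)
P_hard = attention_transition_matrix(E, W, beta=50.0)  # near winner-takes-all
```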
arXiv Detail & Related papers (2024-02-21T03:51:34Z) - Coverage-Validity-Aware Algorithmic Recourse [21.642948522310782]
We propose a novel framework to generate a model-agnostic recourse that exhibits robustness to model shifts. Our framework first builds a coverage-validity-aware linear surrogate of the nonlinear (black-box) model. We show that our surrogate pushes the approximate hyperplane intuitively, facilitating not only robust but also interpretable recourses.
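A stripped-down version of the surrogate step, with the paper's coverage-validity weighting omitted: label samples around the query point with the black-box model, fit a logistic surrogate, and step just across its hyperplane. The sampling scale and overshoot factor are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def linear_surrogate_recourse(blackbox, x, n_samples=2000, scale=1.0):
    """Model-agnostic recourse via a local linear surrogate (sketch)."""
    X = x + scale * np.random.randn(n_samples, x.shape[0])
    y = blackbox(X)                        # black-box labels for the samples
    surr = LogisticRegression().fit(X, y)  # linear surrogate of the model
    w, b = surr.coef_[0], surr.intercept_[0]
    dist = (w @ x + b) / (w @ w)           # signed offset along the normal
    return x - 1.05 * dist * w             # step slightly past the boundary

blackbox = lambda X: (X.sum(axis=1) > 0).astype(int)   # stand-in model
x_cf = linear_surrogate_recourse(blackbox, np.array([-1.0, -0.5]))
```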
arXiv Detail & Related papers (2023-11-19T15:21:49Z) - Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
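A minimal sketch of latent quantization with a straight-through gradient: each latent dimension snaps to the nearest entry of its own small scalar codebook. The codebook initialization and sizes here are assumptions.

```python
import torch

class LatentQuantizer(torch.nn.Module):
    """Quantize each latent dimension onto its own small set of scalar values,
    sketching the organized-latent-space inductive bias."""

    def __init__(self, dims=8, values_per_dim=10):
        super().__init__()
        self.codebook = torch.nn.Parameter(
            torch.linspace(-1, 1, values_per_dim).repeat(dims, 1))  # [D, V]

    def forward(self, z):                           # z: [N, D]
        d = (z.unsqueeze(-1) - self.codebook) ** 2  # [N, D, V] distances
        idx = d.argmin(-1)                          # nearest value per dim
        z_q = torch.gather(self.codebook.expand(z.shape[0], -1, -1), 2,
                           idx.unsqueeze(-1)).squeeze(-1)
        return z + (z_q - z).detach()               # straight-through estimator

zq = LatentQuantizer()(torch.randn(4, 8))
```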
arXiv Detail & Related papers (2023-05-28T06:30:29Z) - Self-Reflective Variational Autoencoder [21.054722609128525]
The Variational Autoencoder (VAE) is a powerful framework for learning latent variable generative models.
We introduce a solution, which we call self-reflective inference.
We empirically demonstrate the clear advantages of matching the variational posterior to the exact posterior.
arXiv Detail & Related papers (2020-07-10T05:05:26Z)