Autoregressivity in the Latent Space of a GP-VAE Language Model: An Empirical Ablation Study
- URL: http://arxiv.org/abs/2512.24102v1
- Date: Tue, 30 Dec 2025 09:23:09 GMT
- Title: Autoregressivity in the Latent Space of a GP-VAE Language Model: An Empirical Ablation Study
- Authors: Yves Ruffenach
- Abstract summary: Language models typically rely on an autoregressive factorization over tokens. We conduct a systematic ablation study of the role played by latent autoregression.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper provides an ablation-based analysis of latent autoregression in GP-VAE models, building upon our previous work introducing the architecture. Language models typically rely on an autoregressive factorization over tokens. In contrast, our prior work proposed shifting sequential structure to the latent space through a causal Gaussian process, while using a non-autoregressive decoder. Here, we conduct a systematic ablation study of the role played by latent autoregression. We compare (i) a full GP-VAE model with autoregressive latent dynamics, (ii) a non-autoregressive ablation in which latent variables are independent, and (iii) a standard token-level autoregressive Transformer. Our results show that, within the considered regime (medium-scale corpora and short training contexts), latent autoregression induces latent trajectories that are significantly more compatible with the Gaussian-process prior and exhibit greater long-horizon stability. In contrast, removing autoregression leads to degraded latent structure and unstable long-range behavior. These findings highlight the role of latent autoregression as an effective mechanism for organizing long-range structure, while remaining complementary to token-level autoregressive modeling. They should be interpreted as an empirical analysis of representational structure rather than as a proposal for a new architecture.
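To make the ablated contrast concrete, below is a minimal sketch of the two latent priors being compared: a causal Gaussian-process prior, realized through the lower-triangular Cholesky factor of a kernel matrix, versus independent latents. The RBF kernel, lengthscale, and dimensions are illustrative assumptions rather than the paper's actual parameterization, and the non-autoregressive decoder is omitted.

```python
import numpy as np

def rbf_kernel(T, lengthscale=5.0, var=1.0):
    """Squared-exponential kernel over time steps 0..T-1 (assumed kernel)."""
    t = np.arange(T, dtype=float)[:, None]
    return var * np.exp(-0.5 * (t - t.T) ** 2 / lengthscale ** 2)

def sample_latents(T, d, autoregressive=True, seed=0):
    """Sample a latent trajectory z of shape (T, d).

    autoregressive=True : causal GP prior. The lower-triangular Cholesky
    factor L maps white noise to temporally correlated latents, so z_t
    depends only on noise up to time t.
    autoregressive=False: the ablation, independent N(0, I) latents.
    """
    eps = np.random.default_rng(seed).standard_normal((T, d))
    if not autoregressive:
        return eps
    K = rbf_kernel(T) + 1e-6 * np.eye(T)   # jitter for numerical stability
    L = np.linalg.cholesky(K)              # causal (lower-triangular) map
    return L @ eps

z_full = sample_latents(64, 16, autoregressive=True)   # model (i)
z_abln = sample_latents(64, 16, autoregressive=False)  # ablation (ii)
```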
Related papers
- StepVAR: Structure-Texture Guided Pruning for Visual Autoregressive Models [98.72926158261937]
We propose a training-free token pruning framework for Visual AutoRegressive models. We employ a lightweight high-pass filter to capture local texture details, while leveraging Principal Component Analysis (PCA) to preserve global structural information. To maintain valid next-scale prediction under sparse tokens, we introduce a nearest neighbor feature propagation strategy.
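A minimal sketch of this scoring-and-propagation idea follows. It is a hypothetical re-implementation, not the authors' code: mean-subtraction as the high-pass operator and feature-space nearest neighbors are simplifying assumptions.

```python
import numpy as np

def prune_tokens(feats, keep_ratio=0.5, n_components=8):
    """Score tokens by texture (high-pass energy) plus structure (PCA).

    feats: (N, D) token features; n_components must be <= min(N, D).
    """
    # Crude high-pass: deviation from the mean token as a texture proxy.
    hp = np.linalg.norm(feats - feats.mean(0, keepdims=True), axis=1)
    # Structure: projection energy onto the top principal components.
    _, _, vt = np.linalg.svd(feats - feats.mean(0), full_matrices=False)
    pca_score = np.linalg.norm(feats @ vt[:n_components].T, axis=1)
    score = hp / (hp.max() + 1e-8) + pca_score / (pca_score.max() + 1e-8)
    k = max(1, int(keep_ratio * len(feats)))
    return np.sort(np.argsort(-score)[:k])   # indices of kept tokens

def propagate(feats, keep):
    """Nearest-neighbor propagation: pruned tokens copy the feature of
    the closest kept token (L2 in feature space, for simplicity)."""
    d = ((feats[:, None, :] - feats[None, keep, :]) ** 2).sum(-1)
    return feats[keep][d.argmin(1)]
```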
arXiv Detail & Related papers (2026-03-02T11:35:05Z)
- Why Self-Rewarding Works: Theoretical Guarantees for Iterative Alignment of Language Models [50.248686344277246]
Self-Rewarding Language Models (SRLMs) achieve notable success in iteratively improving alignment without external feedback. This paper provides the first rigorous theoretical guarantees for SRLMs.
arXiv Detail & Related papers (2026-01-30T03:45:43Z)
- Learning Causality for Longitudinal Data [1.2691047660244335]
This thesis develops methods for causal inference and causal representation learning (CRL) in high-dimensional, time-varying data. The first contribution introduces the Causal Dynamic Variational Autoencoder (CDVAE), a model for estimating Individual Treatment Effects (ITEs). The second contribution proposes an efficient framework for long-term counterfactual regression based on RNNs enhanced with Contrastive Predictive Coding (CPC) and InfoMax. The third contribution advances CRL by addressing how latent causes manifest in observed variables.
arXiv Detail & Related papers (2025-12-04T16:51:49Z)
- Fast Autoregressive Models for Continuous Latent Generation [49.079819389916764]
Autoregressive models have demonstrated remarkable success in sequential data generation, particularly in NLP. Recent work, the masked autoregressive model (MAR), bypasses quantization by modeling per-token distributions in continuous spaces using a diffusion head. We propose the Fast AutoRegressive model (FAR), a novel framework that replaces MAR's diffusion head with a lightweight shortcut head.
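The summary names the mechanism but not its form; below is a generic sketch of the shortcut idea (a network conditioned on the step size d, so one large jump can replace many small ones). The stub dynamics and names are placeholders I am assuming; FAR's actual head operates on per-token decoder states and may differ.

```python
import numpy as np

def shortcut_net(x, t, d):
    """Stub for a learned shortcut head s(x, t, d): predicts the update
    direction for a jump of size d. A real head is a small MLP also
    conditioned on the token's decoder state."""
    return -x  # placeholder dynamics pulling samples toward the origin

def sample(x0, n_steps):
    """Flow-style sampling: many small jumps (diffusion-head regime)
    vs. n_steps=1 (shortcut-head regime, a single large jump)."""
    x, t, d = x0.copy(), 0.0, 1.0 / n_steps
    for _ in range(n_steps):
        x = x + d * shortcut_net(x, t, d)
        t += d
    return x

x0 = np.random.default_rng(0).standard_normal(16)
slow = sample(x0, n_steps=64)   # iterative sampling
fast = sample(x0, n_steps=1)    # one-step shortcut sampling
```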
arXiv Detail & Related papers (2025-04-24T13:57:08Z)
- Model Hemorrhage and the Robustness Limits of Large Language Models [119.46442117681147]
Large language models (LLMs) demonstrate strong performance across natural language processing tasks, yet undergo significant performance degradation when modified for deployment. We define this phenomenon as model hemorrhage: performance decline caused by parameter alterations and architectural changes.
arXiv Detail & Related papers (2025-03-31T10:16:03Z)
- State-space models can learn in-context by gradient descent [1.3087858009942543]
We show that state-space models can perform gradient-based learning and use it for in-context learning in much the same way as transformers. Specifically, we prove that a single structured state-space model layer, augmented with multiplicative input and output gating, can reproduce the outputs of an implicit linear model. We also provide novel insights into the relationship between state-space models and linear self-attention, and their ability to learn in-context.
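The constructive claim can be illustrated with a toy instantiation (my own, not the paper's proof): a linear recurrence with multiplicative input and output gates computes exactly the query prediction of a linear model trained for one gradient step from zero initialization.

```python
import numpy as np

def gd_step_pred(X, y, x_q, eta=0.1):
    """Prediction of the implicit linear model after one gradient step
    from zero init: W = eta * sum_t y_t x_t, so the query prediction
    is eta * sum_t y_t <x_t, x_q>."""
    return eta * np.sum(y * (X @ x_q))

def gated_ssm_pred(X, y, x_q, eta=0.1):
    """State-space realization of the same computation: the state
    accumulates multiplicatively gated inputs y_t * x_t (transition
    A = I), and a multiplicative output gate projects onto the query."""
    s = np.zeros(X.shape[1])
    for x_t, y_t in zip(X, y):
        s = s + y_t * x_t            # input gating
    return eta * (s @ x_q)           # output gating

rng = np.random.default_rng(0)
X, y, x_q = rng.standard_normal((8, 4)), rng.standard_normal(8), rng.standard_normal(4)
assert np.isclose(gd_step_pred(X, y, x_q), gated_ssm_pred(X, y, x_q))
```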
arXiv Detail & Related papers (2024-10-15T15:22:38Z)
- Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation and Regression [53.15502562048627]
Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator.
This work delves into a statistical analysis of augmentation-based pretraining.
arXiv Detail & Related papers (2023-06-01T15:18:55Z)
- Probabilistic Traffic Forecasting with Dynamic Regression [15.31488551912888]
This paper proposes a dynamic regression (DR) framework that enhances existing deep spatiotemporal models by learning the error process in traffic forecasting. The framework relaxes the time-independence assumption by modeling the error series of the base model with a matrix-structured autoregressive (AR) model. The newly designed loss function is based on the likelihood of a non-isotropic error term, enabling the model to generate probabilistic forecasts while preserving the original outputs of the base model.
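A minimal sketch of the correction step, assuming a first-order AR error model; the paper's matrix-structured AR and non-isotropic likelihood loss are reduced here to the mean propagation only.

```python
import numpy as np

def dr_forecast(base_preds, last_error, A):
    """Dynamic-regression correction (sketch): the base model's error
    series e_t = y_t - yhat_t is modeled as e_t ~ A @ e_{t-1} + noise;
    forecasts add the propagated error mean to the frozen base outputs."""
    corrected, e = [], last_error
    for yhat in base_preds:
        e = A @ e                    # AR(1) error propagation
        corrected.append(yhat + e)   # base output preserved, plus error
    return np.array(corrected)

# Toy usage: 3 series, 4-step horizon, mild error persistence.
rng = np.random.default_rng(0)
A = 0.5 * np.eye(3)
preds = rng.standard_normal((4, 3))
out = dr_forecast(preds, last_error=rng.standard_normal(3), A=A)
```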
arXiv Detail & Related papers (2023-01-17T01:12:44Z)
- Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization [60.73540999409032]
We study expressive autoregressive dynamics models that generate each dimension of the next state and reward sequentially, conditioned on the dimensions generated so far.
We also show that autoregressive dynamics models are useful for offline policy optimization by serving as a way to enrich the replay buffer.
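A sketch of the dimension-wise sampling scheme described above; the conditional heads are stubs standing in for learned density networks, and all names are illustrative.

```python
import numpy as np

def sample_next(state, heads, rng):
    """Dimension-wise autoregressive dynamics sketch: dimension i of the
    next state is sampled conditioned on the current state and the i
    dimensions already generated. heads[i] stands in for any learned
    conditional density returning (mean, std)."""
    generated = []
    for head in heads:
        mu, sigma = head(np.concatenate([state, np.array(generated)]))
        generated.append(rng.normal(mu, sigma))
    return np.array(generated)

# Stub heads: linear mean, fixed std (a real model would learn these).
def make_head(i):
    return lambda ctx: (0.1 * ctx.sum(), 0.05)

rng = np.random.default_rng(0)
next_state = sample_next(np.ones(4), [make_head(i) for i in range(4)], rng)
```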
arXiv Detail & Related papers (2021-04-28T16:48:44Z)
- Stochastically forced ensemble dynamic mode decomposition for forecasting and analysis of near-periodic systems [65.44033635330604]
We introduce a novel load forecasting method in which observed dynamics are modeled as a forced linear system.
We show that its use of intrinsic linear dynamics offers a number of desirable properties in terms of interpretability and parsimony.
Results are presented for a test case using load data from an electrical grid.
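A toy version of the forced-linear-system view, with the forcing estimated from fit residuals; the paper's ensemble procedure and grid-load data are not reproduced, and the synthetic signal below is purely illustrative.

```python
import numpy as np

def fit_forced_linear(X):
    """Fit x_{t+1} ~ A x_t by least squares (a DMD-style linear model);
    the residuals are treated as samples of the stochastic forcing."""
    X0, X1 = X[:-1].T, X[1:].T            # (d, T-1) snapshot pairs
    A = X1 @ np.linalg.pinv(X0)           # least-squares operator
    resid = X1 - A @ X0                   # empirical forcing samples
    return A, resid.T

def forecast(A, resid, x0, horizon, rng):
    """Roll the linear dynamics forward, injecting forcing drawn from
    the empirical residuals to produce stochastic forecasts."""
    x, out = x0, []
    for _ in range(horizon):
        x = A @ x + resid[rng.integers(len(resid))]
        out.append(x)
    return np.array(out)

rng = np.random.default_rng(0)
t = np.arange(200)
X = np.stack([np.sin(0.2 * t), np.cos(0.2 * t)], axis=1)  # near-periodic toy
A, resid = fit_forced_linear(X + 0.01 * rng.standard_normal(X.shape))
pred = forecast(A, resid, X[-1], horizon=24, rng=rng)
```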
arXiv Detail & Related papers (2020-10-08T20:25:52Z)
- Self-Reflective Variational Autoencoder [21.054722609128525]
The Variational Autoencoder (VAE) is a powerful framework for learning latent-variable generative models.
We introduce a solution, which we call self-reflective inference.
We empirically demonstrate the clear advantages of matching the variational posterior to the exact posterior.
arXiv Detail & Related papers (2020-07-10T05:05:26Z)
- Variational Auto-Regressive Gaussian Processes for Continual Learning [17.43751039943161]
We develop a principled posterior updating mechanism to solve sequential tasks in continual learning.
By relying on sparse inducing point approximations for scalable posteriors, we propose a novel auto-regressive variational distribution.
Mean predictive entropy estimates show VAR-GPs prevent catastrophic forgetting.
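As a loose analogue of the auto-regressive variational distribution (a conjugate Bayesian linear model rather than a sparse GP with inducing points, so purely illustrative), the posterior after each task can serve as the prior for the next:

```python
import numpy as np

def update_posterior(mu0, S0, X, y, noise_var=0.1):
    """Conjugate Bayesian linear-regression update: the posterior of the
    previous task becomes the prior of the next, mimicking sequential
    conditioning across tasks (not VAR-GP itself)."""
    S0_inv = np.linalg.inv(S0)
    S = np.linalg.inv(S0_inv + X.T @ X / noise_var)
    mu = S @ (S0_inv @ mu0 + X.T @ y / noise_var)
    return mu, S

rng = np.random.default_rng(0)
mu, S = np.zeros(3), np.eye(3)             # prior before any task
for task in range(4):                      # sequential tasks
    X = rng.standard_normal((20, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(20)
    mu, S = update_posterior(mu, S, X, y)  # posterior becomes next prior
```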
arXiv Detail & Related papers (2020-06-09T19:23:57Z)