Token Maturation: Autoregressive Language Generation via Continuous Token Dynamics
- URL: http://arxiv.org/abs/2601.04854v1
- Date: Thu, 08 Jan 2026 11:44:34 GMT
- Title: Token Maturation: Autoregressive Language Generation via Continuous Token Dynamics
- Authors: Oshri Naparstek
- Abstract summary: We introduce a continuous autoregressive formulation of language generation in which tokens are represented as continuous vectors that \emph{mature} over multiple update steps before being discretized. We show that this maturation process alone is sufficient to produce coherent and diverse text using deterministic decoding (argmax). Additional perturbations, such as stochastic dynamics or history smoothing, can be incorporated naturally but are not required for the model to function.
- Score: 0.7252027234425333
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autoregressive language models are conventionally defined over discrete token sequences, committing to a specific token at every generation step. This early discretization forces uncertainty to be resolved through token-level sampling, often leading to instability, repetition, and sensitivity to decoding heuristics. In this work, we introduce a continuous autoregressive formulation of language generation in which tokens are represented as continuous vectors that \emph{mature} over multiple update steps before being discretized. Rather than sampling tokens, the model evolves continuous token representations through a deterministic dynamical process, committing to a discrete token only when the representation has sufficiently converged. Discrete text is recovered via hard decoding, while uncertainty is maintained and resolved in the continuous space. We show that this maturation process alone is sufficient to produce coherent and diverse text using deterministic decoding (argmax), without reliance on token-level sampling, diffusion-style denoising, or auxiliary stabilization mechanisms. Additional perturbations, such as stochastic dynamics or history smoothing, can be incorporated naturally but are not required for the model to function. To our knowledge, this is the first autoregressive language model that generates text by evolving continuous token representations to convergence prior to discretization, enabling stable generation without token-level sampling.
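To make the maturation process described in the abstract more concrete, the following is a minimal sketch of how such a generation loop could be organized. The decoder callable `model`, the damped update rule, the convergence tolerance, and the use of an embedding matrix `E` for hard decoding are all illustrative assumptions introduced here; the paper's actual dynamics and decoding details may differ.

```python
import numpy as np

def mature_next_token(model, history, E, max_steps=16, tol=1e-3):
    """Evolve one continuous token vector to convergence, then discretize it.

    Assumptions (illustrative, not from the paper):
      model(history, x) -> proposed continuous vector for the next token
                           (x may be None for the initial proposal)
      history           -> list of already-committed continuous token vectors
      E                 -> (vocab_size, dim) embedding matrix used for hard decoding
    """
    x = model(history, None)                 # initial guess for the new token's vector
    for _ in range(max_steps):
        target = model(history, x)           # deterministic update proposed by the model
        x_new = 0.5 * x + 0.5 * target       # damped relaxation step toward the proposal
        if np.linalg.norm(x_new - x) < tol:  # vector has "matured" (converged)
            x = x_new
            break
        x = x_new
    token_id = int(np.argmax(E @ x))         # hard decoding: highest-scoring vocabulary entry
    return token_id, x

def generate(model, E, prompt_vectors, num_tokens=50):
    """Autoregressive loop: mature each position in continuous space, then commit."""
    history = list(prompt_vectors)
    token_ids = []
    for _ in range(num_tokens):
        tok, _ = mature_next_token(model, history, E)
        token_ids.append(tok)
        history.append(E[tok])               # commit the chosen token's embedding to the history
    return token_ids
```

In this sketch the commitment to a discrete token happens only after the inner loop converges, which is the property the abstract emphasizes; the specific relaxation coefficient and tolerance are placeholders.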
Related papers
- Just on Time: Token-Level Early Stopping for Diffusion Language Models [0.0]
Diffusion language models generate text through iterative refinement, a process that is often computationally inefficient. We introduce a training-free, token-level early stopping approach that identifies convergence independently at each position. This yields adaptive per-token freezing without task-specific fine-tuning, substantially reducing the total number of diffusion steps required (a hedged sketch of this per-token freezing idea appears after the list below).
arXiv Detail & Related papers (2026-02-11T18:44:04Z) - Kelix Technical Report [86.64551727600104]
We present Kelix, a fully discrete autoregressive unified model that closes the understanding gap between discrete and continuous visual representations. Recent work has explored discrete visual tokenization to enable fully autoregressive multimodal modeling.
arXiv Detail & Related papers (2026-02-10T14:48:26Z) - Zonkey: A Hierarchical Diffusion Language Model with Differentiable Tokenization and Probabilistic Attention [0.0]
Zonkey is a hierarchical diffusion model that addresses limitations through a fully trainable pipeline from raw characters to document-level representations. At its core is a differentiable tokenizer that learns probabilistic beginning-of-sequence (BOS) decisions. Zonkey generates coherent, variable-length text from noise, demonstrating emergent hierarchies.
arXiv Detail & Related papers (2026-01-29T14:17:37Z) - Continuous Autoregressive Language Models [56.49239051750678]
We introduce Continuous Autoregressive Language Models (CALM). CALM uses a high-fidelity autoencoder to compress a chunk of K tokens into a single continuous vector. We develop a comprehensive likelihood-free framework that enables robust training, evaluation, and controllable sampling.
arXiv Detail & Related papers (2025-10-31T17:58:11Z) - Continuously Augmented Discrete Diffusion model for Categorical Generative Modeling [87.34677262370924]
Standard discrete diffusion models treat all unobserved states identically by mapping them to an absorbing [MASK] token. This creates an 'information void' where semantic information that could be inferred from unmasked tokens is lost between denoising steps. We introduce Continuously Augmented Discrete Diffusion, a framework that augments the discrete state space with a paired diffusion in a continuous latent space.
arXiv Detail & Related papers (2025-10-01T18:00:56Z) - Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenizations [83.93566096400723]
We find that instruction-tuned models retain up to 93.4% of their original performance when given a randomly sampled tokenization. Character-level segmentation improves string manipulation and code understanding tasks by up to +14%. Right-aligned digit grouping enhances large-number arithmetic by +33%.
arXiv Detail & Related papers (2025-06-23T18:02:26Z) - Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation [85.82112629564942]
We propose TokenBridge, which maintains the strong representation capacity of continuous tokens while preserving the modeling simplicity of discrete tokens. We introduce a dimension-wise quantization strategy that independently discretizes each feature dimension, paired with a lightweight autoregressive prediction mechanism. Our approach achieves reconstruction and generation quality on par with continuous methods while using standard categorical prediction.
arXiv Detail & Related papers (2025-03-20T17:59:59Z) - Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning. With a strong auto-regressive decoder, VAEs tend to ignore the latent variables. We propose a principled approach to enforce implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)
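As referenced in the first related-paper summary above (token-level early stopping for diffusion language models), the following is a hedged sketch of how per-token freezing during iterative refinement could be implemented. The `denoise_step` callable and the "argmax unchanged for `patience` consecutive steps" convergence test are assumptions made for illustration, not the cited paper's exact criterion.

```python
import numpy as np

def generate_with_token_freezing(denoise_step, x, num_steps=64, patience=3):
    """Iterative refinement in which each position is frozen once its prediction stabilizes.

    Assumptions (illustrative, not the cited paper's exact method):
      denoise_step(x, frozen) -> per-position logits of shape (seq_len, vocab_size)
      x                       -> current token ids, shape (seq_len,), e.g. all [MASK] initially
      A position counts as converged once its argmax has not changed for
      `patience` consecutive refinement steps.
    """
    seq_len = x.shape[0]
    stable = np.zeros(seq_len, dtype=int)    # consecutive steps without change, per position
    frozen = np.zeros(seq_len, dtype=bool)   # positions that are no longer updated
    for _ in range(num_steps):
        if frozen.all():
            break                            # every position has converged: stop early
        logits = denoise_step(x, frozen)
        proposal = logits.argmax(axis=-1)
        changed = proposal != x
        stable = np.where(changed, 0, stable + 1)
        frozen |= stable >= patience         # freeze positions that stayed stable long enough
        x = np.where(frozen, x, proposal)    # only unfrozen positions are updated
    return x
```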
This list is automatically generated from the titles and abstracts of the papers on this site.