Related papers: Exploring the Hidden Capacity of LLMs for One-Step Text Generation

Exploring the Hidden Capacity of LLMs for One-Step Text Generation

URL: http://arxiv.org/abs/2505.21189v2
Date: Sat, 01 Nov 2025 10:01:56 GMT
Title: Exploring the Hidden Capacity of LLMs for One-Step Text Generation
Authors: Gleb Mezentsev, Ivan Oseledets,
Abstract summary: We show that frozen large language models can generate hundreds of accurate tokens in just one token-parallel forward pass.<n>Although these representations are not unique for a given text, they form connected and local regions in embedding space.<n>We also empirically show that, although these representations are not unique for a given text, they form connected and local regions in embedding space.
Score: 3.5785385789441158
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A recent study showed that large language models (LLMs) can reconstruct surprisingly long texts - up to thousands of tokens - via autoregressive generation from just one trained input embedding. In this work, we explore whether autoregressive decoding is essential for such reconstruction. We show that frozen LLMs can generate hundreds of accurate tokens in just one token-parallel forward pass, when provided with only two learned embeddings. This reveals a surprising and underexplored multi-token generation capability of autoregressive LLMs. We examine these embeddings and characterize the information they encode. We also empirically show that, although these representations are not unique for a given text, they form connected and local regions in embedding space - suggesting the potential to train a practical encoder. The existence of such representations hints that multi-token generation may be natively accessible in off-the-shelf LLMs via a learned input encoder, eliminating heavy retraining and helping to overcome the fundamental bottleneck of autoregressive decoding while reusing already-trained models.

Related papers

Learning to Compress: Unlocking the Potential of Large Language Models for Text Representation [34.21806963402883]
We study the untapped potential of context compression as a pretext task for unsupervised adaptation of large language models (LLMs)<n> Experiments demonstrate that a well-designed compression objective can significantly enhance LLM-based text representations.<n>Further improvements through contrastive learning produce a strong representation model (LLM2Comp)
arXiv Detail & Related papers (2025-11-21T10:45:44Z)
Rep2Text: Decoding Full Text from a Single LLM Token Representation [38.62008454909388]
We propose a novel framework for decoding full text from last-token representations.<n>Rep2Text employs a trainable adapter that projects a target model's internal representations into the embedding space of a decoding language model.
arXiv Detail & Related papers (2025-11-09T23:18:36Z)
Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures [12.466522376751811]
Hyperdimensional Probe is a novel paradigm for decoding information from the Large Language Models vector space.<n>It combines ideas from symbolic representations and neural probing to project the model's residual stream into interpretable concepts.<n>Our work advances information decoding in LLM vector space, enabling extracting more informative, interpretable, and structured features from neural representations.
arXiv Detail & Related papers (2025-09-29T16:59:07Z)
Causal2Vec: Improving Decoder-only LLMs as Versatile Embedding Models [22.02568434890804]
Causal2Vec is a general-purpose embedding model tailored to enhance the performance of decoder-only large language models.<n>We first employ a lightweight BERT-style model to pre-encode the input text into a single Contextual token.<n>To mitigate the recency bias by last-token pooling, we introduced the last hidden states of Contextual and EOS tokens as the final text embedding.
arXiv Detail & Related papers (2025-07-31T10:01:11Z)
Memory Tokens: Large Language Models Can Generate Reversible Sentence Embeddings [0.0]
reversible sentence embeddings allow an LLM to reconstruct the original text exactly, without modifying the model's weights.<n>We evaluate this phenomenon across English and Spanish datasets, sequences of up to approximately 240 tokens, and model scales ranging from 100M to 8B parameters.
arXiv Detail & Related papers (2025-06-17T22:13:34Z)
GEM: Empowering LLM for both Embedding Generation and Language Understanding [11.081595808236239]
We propose Generative Embedding large language Model (GEM) to generate high-quality text embeddings.<n>Our method inserts new special token(s) into a text body, and generates summarization embedding of the text by manipulating the attention mask.<n>Our results indicate that our approach can empower LLMs with state-of-the-art text embedding capabilities while maintaining their original NLP performance.
arXiv Detail & Related papers (2025-06-04T18:02:07Z)
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens [66.02261367232256]
Multimodal Large Language Models (MLLMs) aim to unify visual comprehension and generation.<n>Existing approaches rely on spatial visual tokens, where image patches are encoded and arranged according to a spatial order.<n>In this paper, we build a proper visual language by reconstructing diffusion timesteps to learn discrete visual tokens.
arXiv Detail & Related papers (2025-04-20T16:14:28Z)
Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models [52.439289085318634]
We show how to identify training data known to proprietary large language models (LLMs) by using information-guided probes.<n>Our work builds on a key observation: text passages with high surprisal are good search material for memorization probes.
arXiv Detail & Related papers (2025-03-15T10:19:15Z)
Idiosyncrasies in Large Language Models [54.26923012617675]
We unveil and study idiosyncrasies in Large Language Models (LLMs)<n>We find that fine-tuning existing text embedding models on LLM-generated texts yields excellent classification accuracy.<n>We leverage LLM as judges to generate detailed, open-ended descriptions of each model's idiosyncrasies.
arXiv Detail & Related papers (2025-02-17T18:59:02Z)
Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning [53.57895922042783]
Large Language Models (LLMs) excel at reasoning and planning when trained on chainof-thought (CoT) data.<n>We propose a hybrid representation of the reasoning process, where we partially abstract away the initial reasoning steps using latent discrete tokens.
arXiv Detail & Related papers (2025-02-05T15:33:00Z)
Efficient Real-time Refinement of Language Model Text Generation [65.1937138219008]
Large language models (LLMs) have shown remarkable performance across a wide range of natural language tasks.<n>A critical challenge remains in that they sometimes generate factually incorrect answers.<n>We propose Streaming-VR, a novel approach designed to enhance the efficiency of verification and refinement of LLM outputs.
arXiv Detail & Related papers (2025-01-14T03:59:48Z)
Language Models can Self-Lengthen to Generate Long Texts [74.96074422345806]
This paper introduces an innovative iterative training framework called Self-Lengthen. It leverages only the intrinsic knowledge and skills of Large Language Models without the need for auxiliary data or proprietary models. Experiments on benchmarks and human evaluations show that Self-Lengthen outperforms existing methods in long-text generation.
arXiv Detail & Related papers (2024-10-31T13:47:10Z)
FIRP: Faster LLM inference via future intermediate representation prediction [54.897493351694195]
FIRP generates multiple tokens instead of one at each decoding step. We conduct extensive experiments, showing a speedup ratio of 1.9x-3x in several models and datasets.
arXiv Detail & Related papers (2024-10-27T15:53:49Z)
CUTE: Measuring LLMs' Understanding of Their Tokens [54.70665106141121]
Large Language Models (LLMs) show remarkable performance on a wide variety of tasks. This raises the question: To what extent can LLMs learn orthographic information? We propose a new benchmark, which features a collection of tasks designed to test the orthographic knowledge of LLMs.
arXiv Detail & Related papers (2024-09-23T18:27:03Z)
Identifying the Source of Generation for Large Language Models [21.919661430250798]
Large language models (LLMs) memorize text from several sources of documents. LLMs can not provide document information on the generated content. This work introduces token-level source identification in the decoding step.
arXiv Detail & Related papers (2024-07-05T08:52:15Z)
A Text is Worth Several Tokens: Text Embedding from LLMs Secretly Aligns Well with The Key Tokens [20.37803751979975]
When feeding a text into a large language model-based embedder, the obtained text embedding will be able to be aligned with the key tokens in the input text.<n>We show that this phenomenon is universal and is not affected by model architecture, training strategy, and embedding method.
arXiv Detail & Related papers (2024-06-25T08:55:12Z)
Peering into the Mind of Language Models: An Approach for Attribution in Contextual Question Answering [9.86691461253151]
We introduce a novel method for attribution in contextual question answering, leveraging the hidden state representations of large language models (LLMs) Our approach bypasses the need for extensive model retraining and retrieval model overhead, offering granular attributions and preserving the quality of generated answers. We present Verifiability-granular, an attribution dataset which has token level annotations for LLM generations in the contextual question answering setup.
arXiv Detail & Related papers (2024-05-28T09:12:44Z)
CodeIP: A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code [56.019447113206006]
Large Language Models (LLMs) have achieved remarkable progress in code generation.<n>CodeIP is a novel multi-bit watermarking technique that inserts additional information to preserve provenance details.<n>Experiments conducted on a real-world dataset across five programming languages demonstrate the effectiveness of CodeIP.
arXiv Detail & Related papers (2024-04-24T04:25:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.