ParaScopes: What do Language Models Activations Encode About Future Text?
- URL: http://arxiv.org/abs/2511.00180v1
- Date: Fri, 31 Oct 2025 18:36:10 GMT
- Title: ParaScopes: What do Language Models Activations Encode About Future Text?
- Authors: Nicky Pochinkov, Yulia Volkova, Anna Vasileva, Sai V R Chereddy,
- Abstract summary: Interpretability studies in language models often investigate forward-looking representations of activations. We develop a framework of Residual Stream Decoders as a method of probing model activations for paragraph-scale and document-scale plans.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Interpretability studies in language models often investigate forward-looking representations of activations. However, as language models become capable of ever longer time-horizon tasks, methods for understanding activations often remain limited to testing specific concepts or tokens. We develop a framework of Residual Stream Decoders as a method of probing model activations for paragraph-scale and document-scale plans. We test several methods and find that information equivalent to 5+ tokens of future context can be decoded in small models. These results lay the groundwork for better monitoring of language models and a better understanding of how they might encode longer-term planning information.
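As a hedged illustration of the Residual Stream Decoder idea (not the paper's actual code; all names and dimensions here are hypothetical), such a probe can be sketched as a ridge regression from a residual-stream activation to an embedding of the upcoming text, followed by nearest-neighbor retrieval over candidate continuations:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: residual-stream activations (width d_model), each
# paired with an embedding of the paragraph that follows that position.
d_model, d_embed, n = 64, 32, 500
W_true = rng.normal(size=(d_model, d_embed))
acts = rng.normal(size=(n, d_model))
future_embs = acts @ W_true + 0.1 * rng.normal(size=(n, d_embed))

# Ridge-regression decoder: maps an activation to a future-text embedding.
lam = 1e-2
W = np.linalg.solve(acts.T @ acts + lam * np.eye(d_model), acts.T @ future_embs)

# Decode a held-out activation, then retrieve the closest of 10 candidate
# continuations by cosine similarity (index 0 is the true one).
test_act = rng.normal(size=d_model)
pred = test_act @ W
candidates = np.vstack([test_act @ W_true + 0.1 * rng.normal(size=d_embed),
                        rng.normal(size=(9, d_embed))])
sims = (candidates @ pred) / (np.linalg.norm(candidates, axis=1) * np.linalg.norm(pred))
best = int(np.argmax(sims))
```

The retrieval step is what makes the decoded information measurable: if the probe carries paragraph-scale signal, the true continuation should rank first among the candidates.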
Related papers
- xVLM2Vec: Adapting LVLM-based embedding models to multilinguality using Self-Knowledge Distillation [2.9998889086656586]
We propose an adaptation methodology for Large Vision-Language Models trained on English language data to improve their performance. We introduce a benchmark to evaluate the effectiveness of multilingual and multimodal embedding models.
arXiv Detail & Related papers (2025-03-12T12:04:05Z) - Learning to Plan for Language Modeling from Unlabeled Data [23.042650737356496]
We train a module for planning the future writing process via a self-supervised learning objective.
Given the textual context, this planning module learns to predict future abstract writing actions, which correspond to centroids in a clustered text embedding space.
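As an illustrative sketch of the clustering step only (not the paper's implementation; the data and action names are invented), the abstract writing actions can be obtained by running k-means over next-sentence embeddings, so each future sentence is summarized by a discrete centroid id that a planning module would predict from context:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sentence embeddings: three separated groups stand in for
# distinct writing actions (e.g. define, elaborate, conclude).
k, d = 3, 8
centers = rng.normal(scale=5.0, size=(k, d))
embs = np.repeat(centers, 50, axis=0) + rng.normal(size=(3 * 50, d))

# Plain k-means; the resulting centroids are the abstract "writing actions".
# Initialize with one point from each group to keep the sketch deterministic.
centroids = embs[[0, 50, 100]].copy()
for _ in range(20):
    labels = np.argmin(((embs[:, None, :] - centroids) ** 2).sum(-1), axis=1)
    centroids = np.array([embs[labels == i].mean(axis=0) for i in range(k)])

# Each future sentence now maps to a discrete action id: the prediction
# target for a self-supervised planning module.
```

Predicting a centroid id rather than the exact next sentence is what makes the objective "abstract": many surface realizations share one action label.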
arXiv Detail & Related papers (2024-03-31T09:04:01Z) - Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is the ability to understand instructions written in natural language (prompts).
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z) - Evaluating Large Language Models on Controlled Generation Tasks [92.64781370921486]
We present an extensive analysis of various benchmarks including a sentence planning benchmark with different granularities.
After comparing large language models against state-of-the-art finetuned smaller models, we present a spectrum showing where large language models fall behind, are comparable to, or exceed the ability of smaller models.
arXiv Detail & Related papers (2023-10-23T03:48:24Z) - L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models [102.00201523306986]
We present L2CEval, a systematic evaluation of the language-to-code generation capabilities of large language models (LLMs).
We analyze the factors that potentially affect their performance, such as model size, pretraining data, instruction tuning, and different prompting methods.
In addition to assessing model performance, we measure confidence calibration for the models and conduct human evaluations of the output programs.
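Confidence calibration is commonly summarized with the expected calibration error (ECE); the following is a minimal, generic sketch of that metric (not L2CEval's actual evaluation code), binning predictions by confidence and averaging the gap between confidence and accuracy:

```python
import numpy as np

def expected_calibration_error(confs, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin."""
    confs = np.asarray(confs, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confs > lo) & (confs <= hi)
        if mask.any():
            ece += mask.mean() * abs(confs[mask].mean() - correct[mask].mean())
    return ece

# Perfectly calibrated toy model: confidence 0.75, correct 3 times out of 4.
ece_good = expected_calibration_error([0.75] * 4, [1, 1, 1, 0])

# Overconfident toy model: confidence 0.9, correct only 1 time out of 4.
ece_bad = expected_calibration_error([0.9] * 4, [1, 0, 0, 0])
```

A well-calibrated model drives this number toward zero; overconfident models show a large positive gap.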
arXiv Detail & Related papers (2023-09-29T17:57:00Z) - Wave to Syntax: Probing spoken language models for syntax [16.643072915927313]
We focus on the encoding of syntax in several self-supervised and visually grounded models of spoken language.
We show that syntax is captured most prominently in the middle layers of the networks, and more explicitly within models with more parameters.
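The layer-wise comparison behind such a finding can be sketched with a simple linear probe per layer (a toy illustration on synthetic data, not the paper's setup; here the "syntactic" signal is injected into one layer by construction):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical activations for 200 utterances at 5 layers; only the middle
# layer (index 2) carries the binary syntactic label linearly.
n, d, n_layers = 200, 12, 5
labels = rng.integers(0, 2, n)
layers = [rng.normal(size=(n, d)) for _ in range(n_layers)]
layers[2][:, 0] += 2.0 * labels  # inject the signal into layer 2 only

def probe_accuracy(acts, y):
    """Least-squares linear probe, scored on its training data."""
    w, *_ = np.linalg.lstsq(acts, y - 0.5, rcond=None)
    return float(((acts @ w > 0) == y.astype(bool)).mean())

accs = [probe_accuracy(a, labels) for a in layers]
best_layer = int(np.argmax(accs))
```

Comparing probe accuracy across layers is the standard way to localize where a property is most linearly accessible; in practice one would use held-out data and stronger probes.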
arXiv Detail & Related papers (2023-05-30T11:43:18Z) - Multi-lingual Evaluation of Code Generation Models [82.7357812992118]
We present new benchmarks for evaluating code generation models: MBXP, Multilingual HumanEval, and MathQA-X.
These datasets cover over 10 programming languages.
We are able to assess the performance of code generation models in a multi-lingual fashion.
arXiv Detail & Related papers (2022-10-26T17:17:06Z) - Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z) - Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm [0.0]
We discuss methods of prompt programming, emphasizing the usefulness of considering prompts through the lens of natural language.
We introduce the idea of a metaprompt that seeds the model to generate its own natural language prompts for a range of tasks.
arXiv Detail & Related papers (2021-02-15T05:27:55Z) - InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training [135.12061144759517]
We present an information-theoretic framework that formulates cross-lingual language model pre-training.
We propose a new pre-training task based on contrastive learning.
By leveraging both monolingual and parallel corpora, we jointly train the pretext to improve the cross-lingual transferability of pre-trained models.
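Contrastive pre-training objectives of this kind typically take the InfoNCE form, where each anchor's aligned pair (e.g. a parallel sentence) is the positive and the rest of the batch supplies negatives. A generic sketch (not InfoXLM's actual objective code):

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: row i of `positives` is the true pair for anchor i;
    all other rows in the batch act as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                # cosine similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))    # true pairs on the diagonal

rng = np.random.default_rng(2)
x = rng.normal(size=(8, 16))
aligned = info_nce(x, x + 0.01 * rng.normal(size=(8, 16)))  # matched pairs
shuffled = info_nce(x, rng.permutation(x))                   # broken pairs
```

Minimizing this loss pulls aligned pairs together and pushes in-batch negatives apart, which is what drives the cross-lingual transferability the entry describes.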
arXiv Detail & Related papers (2020-07-15T16:58:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed here and is not responsible for any consequences of its use.