Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures
- URL: http://arxiv.org/abs/2509.25045v1
- Date: Mon, 29 Sep 2025 16:59:07 GMT
- Title: Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures
- Authors: Marco Bronzini, Carlo Nicolini, Bruno Lepri, Jacopo Staiano, Andrea Passerini
- Abstract summary: Hyperdimensional Probe is a novel paradigm for decoding information from the Large Language Model (LLM) vector space. It combines ideas from symbolic representations and neural probing to project the model's residual stream into interpretable concepts. Our work advances information decoding in the LLM vector space, enabling the extraction of more informative, interpretable, and structured features from neural representations.
- Score: 12.466522376751811
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite their capabilities, Large Language Models (LLMs) remain opaque, with limited understanding of their internal representations. Current interpretability methods, such as direct logit attribution (DLA) and sparse autoencoders (SAEs), provide restricted insight due to limitations such as the model's output vocabulary or unclear feature names. This work introduces Hyperdimensional Probe, a novel paradigm for decoding information from the LLM vector space. It combines ideas from symbolic representations and neural probing to project the model's residual stream into interpretable concepts via Vector Symbolic Architectures (VSAs). This probe combines the strengths of SAEs and conventional probes while overcoming their key limitations. We validate our decoding paradigm with controlled input-completion tasks, probing the model's final state before next-token prediction on inputs spanning syntactic pattern recognition, key-value associations, and abstract inference. We further assess it in a question-answering setting, examining the state of the model both before and after text generation. Our experiments show that our probe reliably extracts meaningful concepts across varied LLMs, embedding sizes, and input domains, and also helps identify LLM failures. Our work advances information decoding in the LLM vector space, enabling the extraction of more informative, interpretable, and structured features from neural representations.
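The abstract names Vector Symbolic Architectures but does not spell out their operations. As a rough illustration, the sketch below implements the standard bipolar (MAP-style) VSA primitives: binding by element-wise multiplication, bundling by majority vote, and cleanup by nearest-neighbor search over a concept codebook, applied to a toy key-value association of the kind the probe is evaluated on. The dimensionality, the codebook, and the example itself are assumptions for illustration; the actual probe additionally requires a projection from the LLM residual stream into hypervector space, which is omitted here.

```python
# Minimal sketch of core VSA operations (bipolar / MAP-style), illustrating
# the kind of key-value decoding described in the abstract. All names and
# sizes are illustrative assumptions, not the paper's actual probe.
import numpy as np

rng = np.random.default_rng(0)
D = 4096  # hypervector dimensionality

def random_hv():
    """A random bipolar hypervector in {-1, +1}^D."""
    return rng.choice(np.array([-1, 1]), size=D)

# Hypothetical concept codebook: one random hypervector per symbol.
codebook = {name: random_hv()
            for name in ["capital", "country", "language",
                         "Paris", "France", "French"]}

def bind(a, b):
    """Binding: element-wise product. Self-inverse: bind(a, bind(a, b)) == b."""
    return a * b

def bundle(*hvs):
    """Bundling: element-wise majority vote (sign of the sum)."""
    return np.where(np.sum(hvs, axis=0) >= 0, 1, -1)

def cleanup(hv):
    """Nearest codebook symbol by dot-product similarity (all norms equal)."""
    return max(codebook, key=lambda k: hv @ codebook[k])

# Superpose three role-filler pairs into one vector, then decode a filler
# by unbinding its role and cleaning up the noisy result.
memory = bundle(bind(codebook["capital"], codebook["Paris"]),
                bind(codebook["country"], codebook["France"]),
                bind(codebook["language"], codebook["French"]))
print(cleanup(bind(memory, codebook["capital"])))  # -> "Paris" (w.h.p. at this D)
```

Because binding is self-inverse and bundling preserves similarity to its inputs, a single vector can store several role-filler pairs and still yield each filler back through unbind-and-cleanup, which is the basic property a VSA-based decoder relies on.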
Related papers
- Step-Level Sparse Autoencoder for Reasoning Process Interpretation [48.99201531966593]
Large Language Models (LLMs) have achieved strong complex reasoning capabilities through Chain-of-Thought (CoT) reasoning. We propose the step-level sparse autoencoder (SSAE), which serves as an analytical tool to disentangle different aspects of LLMs' reasoning steps into sparse features. Experiments on multiple base models and reasoning tasks show the effectiveness of the extracted features.
arXiv Detail & Related papers (2026-03-03T14:25:02Z)
- Learning to Compress: Unlocking the Potential of Large Language Models for Text Representation [34.21806963402883]
We study the untapped potential of context compression as a pretext task for unsupervised adaptation of large language models (LLMs). Experiments demonstrate that a well-designed compression objective can significantly enhance LLM-based text representations. Further improvements through contrastive learning produce a strong representation model (LLM2Comp).
arXiv Detail & Related papers (2025-11-21T10:45:44Z)
- Towards Multimodal Understanding via Stable Diffusion as a Task-Aware Feature Extractor [32.34399128209528]
We study whether pre-trained text-to-image diffusion models can serve as instruction-aware visual encoders. We find that diffusion features are rich in semantics and can encode strong image-text alignment. We then investigate how to align these features with large language models and uncover a leakage phenomenon.
arXiv Detail & Related papers (2025-07-09T17:59:47Z)
- When can isotropy help adapt LLMs' next word prediction to numerical domains? [53.98633183204453]
It is shown that the isotropic property of LLM embeddings in contextual embedding space preserves the underlying structure of representations. Experiments show that different characteristics of numerical data and model architectures have different impacts on isotropy.
arXiv Detail & Related papers (2025-05-22T05:10:34Z)
- Learning on LLM Output Signatures for gray-box Behavior Analysis [52.81120759532526]
Large Language Models (LLMs) have achieved widespread adoption, yet our understanding of their behavior remains limited. We develop a transformer-based approach that processes LLM Output Signatures for contamination and hallucination detection in gray-box settings. Our approach achieves superior performance on both tasks, significantly outperforming existing baselines.
arXiv Detail & Related papers (2025-03-18T09:04:37Z)
- I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data? [76.15163242945813]
Large language models (LLMs) have led many to conclude that they exhibit a form of intelligence. We introduce a novel generative model that generates tokens on the basis of human-interpretable concepts represented as latent discrete variables.
arXiv Detail & Related papers (2025-03-12T01:21:17Z)
- Steered Generation via Gradient Descent on Sparse Features [1.534667887016089]
We modify the internal structure of large language models (LLMs) by training sparse autoencoders to learn a sparse representation of the query embedding. We demonstrate that manipulating this sparse representation effectively transforms the output toward different stylistic and cognitive targets.
arXiv Detail & Related papers (2025-02-25T21:06:14Z)
- LatentQA: Teaching LLMs to Decode Activations Into Natural Language [72.87064562349742]
We introduce LatentQA, the task of answering open-ended questions about model activations in natural language. We propose Latent Interpretation Tuning (LIT), which finetunes a decoder LLM on a dataset of activations and associated question-answer pairs. Our decoder also specifies a differentiable loss that we use to control models, such as debiasing models on stereotyped sentences and controlling the sentiment of generations.
arXiv Detail & Related papers (2024-12-11T18:59:33Z)
- Vector-ICL: In-context Learning with Continuous Vector Representations [75.96920867382859]
Large language models (LLMs) have shown remarkable in-context learning capabilities on textual data. We explore whether these capabilities can be extended to continuous vectors from diverse domains, obtained from black-box pretrained encoders. In particular, we find that pretraining projectors with general language modeling objectives enables Vector-ICL.
arXiv Detail & Related papers (2024-10-08T02:25:38Z)
- Harnessing the Zero-Shot Power of Instruction-Tuned Large Language Model in End-to-End Speech Recognition [23.172469312225694]
We propose to utilize an instruction-tuned large language model (LLM) for guiding the text generation process in automatic speech recognition (ASR). The proposed model is built on the joint CTC and attention architecture, with the LLM serving as a front-end feature extractor for the decoder. Experimental results show that the proposed LLM-guided model achieves a relative gain of approximately 13% in word error rate across major benchmarks.
arXiv Detail & Related papers (2023-09-19T11:10:50Z)
- Evaluating and Explaining Large Language Models for Code Using Syntactic Structures [74.93762031957883]
This paper introduces ASTxplainer, an explainability method specific to Large Language Models for code.
At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes.
We perform an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects.
arXiv Detail & Related papers (2023-08-07T18:50:57Z)
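Sparse autoencoders recur throughout this list (as a baseline in the main abstract, and in the SSAE and steered-generation entries above). For orientation, here is a minimal sketch of a standard SAE over cached activations, with a ReLU encoder and an L1 sparsity penalty; the sizes, penalty weight, and random stand-in data are illustrative assumptions and are not taken from any of the papers above.

```python
# Minimal sparse autoencoder over model activations. Sizes, the penalty
# weight, and the random stand-in batch are illustrative assumptions.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)  # activations -> feature space
        self.decoder = nn.Linear(d_dict, d_model)  # features -> reconstruction

    def forward(self, x):
        feats = torch.relu(self.encoder(x))  # non-negative, pushed toward sparsity
        return self.decoder(feats), feats

d_model, d_dict = 768, 8 * 768   # dictionary much wider than the residual stream
sae = SparseAutoencoder(d_model, d_dict)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
l1_weight = 1e-3                 # strength of the sparsity pressure

acts = torch.randn(64, d_model)  # stand-in for cached residual-stream activations
opt.zero_grad()
recon, feats = sae(acts)
loss = ((recon - acts) ** 2).mean() + l1_weight * feats.abs().mean()
loss.backward()
opt.step()
```

The reconstruction term keeps features faithful to the activations while the L1 term keeps most features inactive, which is what makes individual features candidates for interpretation; the unclear naming of those features is the SAE limitation the main paper contrasts itself against.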