Sparse Neurons Carry Strong Signals of Question Ambiguity in LLMs
- URL: http://arxiv.org/abs/2509.13664v1
- Date: Wed, 17 Sep 2025 03:34:35 GMT
- Title: Sparse Neurons Carry Strong Signals of Question Ambiguity in LLMs
- Authors: Zhuoxuan Zhang, Jinhao Duan, Edward Kim, Kaidi Xu
- Abstract summary: We show that question ambiguity is linearly encoded in the internal representations of large language models (LLMs), and that LLMs form compact internal representations of question ambiguity, enabling interpretable and controllable behavior.
- Score: 23.900061215331338
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ambiguity is pervasive in real-world questions, yet large language models (LLMs) often respond with confident answers rather than seeking clarification. In this work, we show that question ambiguity is linearly encoded in the internal representations of LLMs and can be both detected and controlled at the neuron level. During the model's pre-filling stage, we identify that a small number of neurons, as few as one, encode question ambiguity information. Probes trained on these Ambiguity-Encoding Neurons (AENs) achieve strong performance on ambiguity detection and generalize across datasets, outperforming prompting-based and representation-based baselines. Layerwise analysis reveals that AENs emerge from shallow layers, suggesting early encoding of ambiguity signals in the model's processing pipeline. Finally, we show that through manipulating AENs, we can control LLM's behavior from direct answering to abstention. Our findings reveal that LLMs form compact internal representations of question ambiguity, enabling interpretable and controllable behavior.
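To make the probing pipeline concrete, below is a minimal sketch of how one might read out a few candidate neurons during the pre-filling pass and fit a linear probe for ambiguity detection. This is not the authors' code: the model name, layer index, neuron indices, and toy labelled questions are illustrative assumptions, and the tooling (transformers, scikit-learn) is an assumed choice.

```python
# Minimal sketch (not the paper's implementation): linear probe on a few
# candidate "Ambiguity-Encoding Neurons" read from the pre-filling pass.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from sklearn.linear_model import LogisticRegression

MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # assumption: any decoder-only LLM
LAYER = 8                                    # assumption: a shallow layer, per the layerwise finding
AEN_IDX = [1375]                             # assumption: hypothetical neuron indices

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

def aen_features(question: str) -> list[float]:
    """Run only the pre-filling pass and return the selected neuron
    activations at the final prompt token of the chosen layer."""
    inputs = tok(question, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    h = out.hidden_states[LAYER][0, -1]      # hidden state of last prompt token
    return h[AEN_IDX].float().cpu().tolist()

# Toy labels (1 = ambiguous, 0 = unambiguous); real experiments would use an
# ambiguity benchmark rather than two hand-written questions.
questions = ["Who won the game?", "Who won the 2018 FIFA World Cup final?"]
labels = [1, 0]

X = [aen_features(q) for q in questions]
probe = LogisticRegression().fit(X, labels)  # linear probe on the sparse neuron set
print(probe.predict(X))
```

The paper reports that probes trained on as few as one such neuron detect ambiguity well and generalize across datasets; a sketch like this only illustrates the readout-and-probe pipeline, not the authors' procedure for selecting the neurons.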
Related papers
- Farther the Shift, Sparser the Representation: Analyzing OOD Mechanisms in LLMs [100.02824137397464]
We investigate how Large Language Models adapt their internal representations when encountering inputs of increasing difficulty. We reveal a consistent and quantifiable phenomenon: as task difficulty increases, the last hidden states of LLMs become substantially sparser. This sparsity-difficulty relation is observable across diverse models and domains.
arXiv Detail & Related papers (2026-03-03T18:48:15Z) - Hyperdimensional Probe: Decoding LLM Representations via Vector Symbolic Architectures [12.466522376751811]
Hyperdimensional Probe is a novel paradigm for decoding information from the vector space of Large Language Models. It combines ideas from symbolic representations and neural probing to project the model's residual stream into interpretable concepts. Our work advances information decoding in LLM vector space, enabling the extraction of more informative, interpretable, and structured features from neural representations.
arXiv Detail & Related papers (2025-09-29T16:59:07Z) - When Truthful Representations Flip Under Deceptive Instructions? [28.51629358895544]
Large language models (LLMs) tend to follow maliciously crafted instructions to generate deceptive responses. Deceptive instructions alter the internal representations of LLMs compared to truthful ones. Our analysis pinpoints layer-wise and feature-level correlates of instructed dishonesty.
arXiv Detail & Related papers (2025-07-29T18:27:13Z) - Probing Neural Topology of Large Language Models [12.298921317333452]
We introduce graph probing, a method for uncovering the functional connectivity of large language models. By probing models across diverse LLM families and scales, we discover a universal predictability of next-token prediction performance. Strikingly, probing on topology outperforms probing on activation by up to 130.4%.
arXiv Detail & Related papers (2025-06-01T14:57:03Z) - Factual Self-Awareness in Language Models: Representation, Robustness, and Scaling [56.26834106704781]
Factual incorrectness in generated content is one of the primary concerns in the ubiquitous deployment of large language models (LLMs). We provide evidence supporting the presence of an internal compass in LLMs that dictates the correctness of factual recall at the time of generation. Scaling experiments across model sizes and training dynamics highlight that self-awareness emerges rapidly during training and peaks in intermediate layers.
arXiv Detail & Related papers (2025-05-27T16:24:02Z) - HalluShift: Measuring Distribution Shifts towards Hallucination Detection in LLMs [14.005452985740849]
Large Language Models (LLMs) have recently garnered widespread attention due to their adeptness at generating innovative responses to given prompts. In this work, we hypothesize that hallucinations stem from the internal dynamics of LLMs. We introduce an innovative approach, HalluShift, designed to analyze the distribution shifts in the internal state space.
arXiv Detail & Related papers (2025-04-13T08:35:22Z) - Emergent Symbol-like Number Variables in Artificial Neural Networks [34.388552536773034]
We show that we can interpret raw NN activity through the lens of simplified Symbolic Algorithms (SAs). We extend the DAS framework to a broader class of alignment functions that more flexibly capture NN activity in terms of interpretable variables from SAs. We show that recurrent models can develop graded, symbol-like number variables in their neural activity.
arXiv Detail & Related papers (2025-01-10T18:03:46Z) - Detecting LLM Hallucination Through Layer-wise Information Deficiency: Analysis of Ambiguous Prompts and Unanswerable Questions [60.31496362993982]
Large language models (LLMs) frequently generate confident yet inaccurate responses. We present a novel, test-time approach to detecting model hallucination through systematic analysis of information flow.
arXiv Detail & Related papers (2024-12-13T16:14:49Z) - Crafting Interpretable Embeddings by Asking LLMs Questions [89.49960984640363]
Large language models (LLMs) have rapidly improved text embeddings for a growing array of natural-language processing tasks.
We introduce question-answering embeddings (QA-Emb), embeddings where each feature represents an answer to a yes/no question asked to an LLM.
We use QA-Emb to flexibly generate interpretable models for predicting fMRI voxel responses to language stimuli.
arXiv Detail & Related papers (2024-05-26T22:30:29Z) - ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks [91.55895047448249]
This paper presents ReEval, an LLM-based framework using prompt chaining to perturb the original evidence for generating new test cases.
We implement ReEval using ChatGPT and evaluate the resulting variants of two popular open-domain QA datasets.
Our generated data is human-readable and useful to trigger hallucination in large language models.
arXiv Detail & Related papers (2023-10-19T06:37:32Z) - The Curious Case of Hallucinatory (Un)answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models [46.990141872509476]
We study the behavior of large language models (LLMs) when presented with (un)answerable queries.
Our results show strong indications that such models encode the answerability of an input query, with the representation of the first decoded token often being a strong indicator.
arXiv Detail & Related papers (2023-10-18T11:01:09Z) - Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models [124.90671698586249]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of downstream tasks. LLMs occasionally generate content that diverges from the user input, contradicts previously generated context, or misaligns with established world knowledge.
arXiv Detail & Related papers (2023-09-03T16:56:48Z)