Mechanistic Interpretability of Cognitive Complexity in LLMs via Linear Probing using Bloom's Taxonomy
- URL: http://arxiv.org/abs/2602.17229v1
- Date: Thu, 19 Feb 2026 10:19:04 GMT
- Title: Mechanistic Interpretability of Cognitive Complexity in LLMs via Linear Probing using Bloom's Taxonomy
- Authors: Bianca Raimondi, Maurizio Gabbrielli
- Abstract summary: This study investigates the internal neural representations of cognitive complexity using Bloom's taxonomy as a hierarchical lens. We probe whether different cognitive levels, ranging from basic recall (Remember) to abstract synthesis (Create), are linearly separable within the model's residual streams.
- Score: 0.8665758002017515
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The black-box nature of Large Language Models necessitates novel evaluation frameworks that transcend surface-level performance metrics. This study investigates the internal neural representations of cognitive complexity using Bloom's Taxonomy as a hierarchical lens. By analyzing high-dimensional activation vectors from different LLMs, we probe whether different cognitive levels, ranging from basic recall (Remember) to abstract synthesis (Create), are linearly separable within the model's residual streams. Our results demonstrate that linear classifiers achieve approximately 95% mean accuracy across all Bloom levels, providing strong evidence that cognitive level is encoded in a linearly accessible subspace of the model's representations. These findings provide evidence that the model resolves the cognitive difficulty of a prompt early in the forward pass, with representations becoming increasingly separable across layers.
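The layer-wise linear probing described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the activations are synthetic Gaussian clusters standing in for residual-stream vectors, the `separation` parameter is a hypothetical knob that mimics representations becoming more separable in later layers, and all function names are illustrative. The six labels follow the revised Bloom's Taxonomy, of which the abstract names only Remember and Create.

```python
import numpy as np

# Six levels of the revised Bloom's Taxonomy (the abstract names the first and last).
BLOOM_LEVELS = ["Remember", "Understand", "Apply", "Analyze", "Evaluate", "Create"]


def make_synthetic_activations(rng, n_per_class, dim, separation):
    """Synthetic stand-in for residual-stream vectors: one Gaussian cluster per level.

    ASSUMPTION: real activations would be read from a model's hidden states;
    here `separation` simply scales how far apart the class centers are.
    """
    centers = rng.normal(size=(len(BLOOM_LEVELS), dim)) * separation
    X = np.concatenate([c + rng.normal(size=(n_per_class, dim)) for c in centers])
    y = np.repeat(np.arange(len(BLOOM_LEVELS)), n_per_class)
    return X, y


def train_linear_probe(X, y, n_classes, lr=0.05, epochs=200):
    """Fit a multinomial logistic-regression probe with plain gradient descent."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - Y) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def probe_accuracy(W, b, X, y):
    """Fraction of examples whose highest-scoring class matches the label."""
    return float(((X @ W + b).argmax(axis=1) == y).mean())


def run_layerwise_probe(seed=0):
    """Train one probe per simulated 'layer' and return held-out accuracies."""
    rng = np.random.default_rng(seed)
    accuracies = []
    for separation in (0.0, 0.5, 2.0):  # hypothetical early, middle, late layers
        X, y = make_synthetic_activations(rng, n_per_class=100, dim=16,
                                          separation=separation)
        order = rng.permutation(len(y))
        split = int(0.8 * len(y))
        train, test = order[:split], order[split:]
        W, b = train_linear_probe(X[train], y[train], len(BLOOM_LEVELS))
        accuracies.append(probe_accuracy(W, b, X[test], y[test]))
    return accuracies


if __name__ == "__main__":
    for layer, acc in enumerate(run_layerwise_probe()):
        print(f"simulated layer {layer}: held-out probe accuracy = {acc:.2f}")
```

In this toy setup, accuracy near chance (about 1/6) at zero separation and high accuracy at large separation would correspond to the qualitative pattern the abstract reports. A real analysis would replace `make_synthetic_activations` with activations extracted from each layer of the model under study.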
Related papers
- Towards LLM-Empowered Knowledge Tracing via LLM-Student Hierarchical Behavior Alignment in Hyperbolic Space [24.868649493405528]
Knowledge Tracing (KT) diagnoses students' concept mastery through continuous learning state monitoring in education. Existing methods rely on ID-based sequences or shallow textual features. This paper proposes a Large Language Model Hyperbolic Aligned Knowledge Tracing framework.
arXiv Detail & Related papers (2026-02-26T11:17:31Z) - Detecting Hallucinations in Retrieval-Augmented Generation via Semantic-level Internal Reasoning Graph [12.233570103035312]
We propose a semantic-level internal reasoning graph-based method for detecting faithfulness hallucinations. Our method achieves better overall performance compared to state-of-the-art baselines on RAGTruth and Dolly-15k.
arXiv Detail & Related papers (2026-01-06T14:35:20Z) - A novel hallucination classification framework [0.0]
This work introduces a novel methodology for the automatic detection of hallucinations generated during large language model (LLM) inference. The proposed approach is based on a systematic taxonomy and controlled reproduction of diverse hallucination types through prompt engineering.
arXiv Detail & Related papers (2025-10-06T09:54:20Z) - Probing Neural Topology of Large Language Models [12.298921317333452]
We introduce graph probing, a method for uncovering the functional connectivity of large language models. By probing models across diverse LLM families and scales, we discover a universal predictability of next-token prediction performance. Strikingly, probing on topology outperforms probing on activation by up to 130.4%.
arXiv Detail & Related papers (2025-06-01T14:57:03Z) - Concept-Guided Interpretability via Neural Chunking [64.6429903327095]
We show that neural networks exhibit patterns in their raw population activity that mirror regularities in the training data. We propose three methods to extract recurring chunks on a neural population level. Our work points to a new direction for interpretability, one that harnesses both cognitive principles and the structure of naturalistic data.
arXiv Detail & Related papers (2025-05-16T13:49:43Z) - In-Context Linear Regression Demystified: Training Dynamics and Mechanistic Interpretability of Multi-Head Softmax Attention [52.159541540613915]
We study how multi-head softmax attention models are trained to perform in-context learning on linear data. Our results reveal that in-context learning ability emerges from the trained transformer as an aggregated effect of its architecture and the underlying data distribution.
arXiv Detail & Related papers (2025-03-17T02:00:49Z) - I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data? [76.15163242945813]
Large language models (LLMs) have led many to conclude that they exhibit a form of intelligence. We introduce a novel generative model that generates tokens on the basis of human-interpretable concepts represented as latent discrete variables.
arXiv Detail & Related papers (2025-03-12T01:21:17Z) - Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of deep learning's surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z) - Probing Ranking LLMs: A Mechanistic Analysis for Information Retrieval [20.353393773305672]
We employ a probing-based analysis to examine neuron activations in ranking LLMs. Our study spans a broad range of feature categories, including lexical signals, document structure, query-document interactions, and complex semantic representations. Our findings offer crucial insights for developing more transparent and reliable retrieval systems.
arXiv Detail & Related papers (2024-10-24T08:20:10Z) - PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics [51.17512229589]
PoLLMgraph is a model-based white-box detection and forecasting approach for large language models.
We show that hallucination can be effectively detected by analyzing the LLM's internal state transition dynamics.
Our work paves a new way for model-based white-box analysis of LLMs, motivating the research community to further explore, understand, and refine the intricate dynamics of LLM behaviors.
arXiv Detail & Related papers (2024-04-06T20:02:20Z) - Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations [62.65877150123775]
Causal abstraction is a promising theoretical framework for explainable artificial intelligence.
Existing causal abstraction methods require a brute-force search over alignments between the high-level model and the low-level one.
We present distributed alignment search (DAS), which overcomes these limitations.
arXiv Detail & Related papers (2023-03-05T00:57:49Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)