Probing Neural Topology of Large Language Models
- URL: http://arxiv.org/abs/2506.01042v2
- Date: Thu, 25 Sep 2025 20:23:14 GMT
- Title: Probing Neural Topology of Large Language Models
- Authors: Yu Zheng, Yuan Yuan, Yue Zhuo, Yong Li, Paolo Santi
- Abstract summary: We introduce graph probing, a method for uncovering the functional connectivity of large language models. By probing models across diverse LLM families and scales, we discover a universal predictability of next-token prediction performance. Strikingly, probing on topology outperforms probing on activation by up to 130.4%.
- Score: 12.298921317333452
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Probing large language models (LLMs) has yielded valuable insights into their internal mechanisms by linking neural activations to interpretable semantics. However, the complex mechanisms that link neurons' functional co-activation to emergent model capabilities remain largely unknown, hindering a deeper understanding and safer development of LLMs. In this work, we introduce graph probing, a method for uncovering the functional connectivity of LLM neurons and relating it to language generation performance. By probing models across diverse LLM families and scales, we discover a universal predictability of next-token prediction performance using only neural topology, which persists even when retaining just 1% of neuron connections. Strikingly, probing on topology outperforms probing on activation by up to 130.4%, suggesting that neural topology contains orders of magnitude richer information about LLM performance than neural activation, and that this information can be easily extracted with simple linear or MLP probes. To explain the dependence between neural topology and language performance, we identify default networks and hub neurons in LLMs and provide causal evidence through interventional experiments on multiple benchmarks, showing that LLMs actually exploit this topological information. Further analyses suggest that neural topology can be effectively leveraged to improve the efficiency, reliability, and safety of LLMs through proof-of-concept applications in model pruning, hallucination detection, and LLM fingerprinting. Code and data for the graph probing toolbox are available at https://github.com/DavyMorgan/llm-graph-probing.
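As a rough illustration of the graph-probing recipe described in the abstract, here is a minimal sketch: correlate per-neuron activations across tokens, keep only the strongest 1% of connections, and fit a simple linear probe from topological features to a per-text performance score. The function names (`coactivation_graph`, `topology_features`), feature choices, and synthetic data are illustrative assumptions, not the paper's pipeline; the actual toolbox is in the linked repository.

```python
import numpy as np
from sklearn.linear_model import Ridge

def coactivation_graph(acts: np.ndarray, keep_frac: float = 0.01) -> np.ndarray:
    """Build a sparse functional-connectivity graph from neuron activations.

    acts: (n_tokens, n_neurons) activations recorded while the model
    processes a text. Edges are pairwise Pearson correlations; only the
    top `keep_frac` strongest connections are retained.
    """
    corr = np.corrcoef(acts.T)              # (n_neurons, n_neurons)
    np.fill_diagonal(corr, 0.0)
    thresh = np.quantile(np.abs(corr), 1.0 - keep_frac)
    return np.where(np.abs(corr) >= thresh, corr, 0.0)

def topology_features(graph: np.ndarray) -> np.ndarray:
    """Simple topological descriptors per neuron: degree and strength."""
    degree = (graph != 0).sum(axis=1)
    strength = np.abs(graph).sum(axis=1)
    return np.concatenate([degree, strength]).astype(np.float64)

# Toy demonstration with synthetic data standing in for real LLM traces:
rng = np.random.default_rng(0)
n_texts, n_tokens, n_neurons = 200, 64, 50
X = np.stack([
    topology_features(coactivation_graph(rng.normal(size=(n_tokens, n_neurons))))
    for _ in range(n_texts)
])
y = rng.normal(size=n_texts)  # stand-in for per-text next-token loss

probe = Ridge(alpha=1.0).fit(X[:150], y[:150])
# With real activations and losses this R^2 measures probe quality;
# on synthetic noise it is meaningless and hovers around (or below) zero.
print("held-out R^2:", probe.score(X[150:], y[150:]))
```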
Related papers
- Do LLMs and VLMs Share Neurons for Inference? Evidence and Mechanisms of Cross-Modal Transfer [65.72553715508691]
We show that large vision-language models (LVLMs) lag behind strong text-only large language models (LLMs) on tasks that require multi-step inference and compositional decision-making. We propose Shared Neuron Low-Rank Fusion (SNRF), a parameter-efficient framework that transfers mature inference circuitry from LLMs to LVLMs. Our results demonstrate that shared neurons form an interpretable bridge between LLMs and LVLMs, enabling low-cost transfer of inference ability into multimodal models.
arXiv Detail & Related papers (2026-02-22T06:04:05Z) - NeuroMambaLLM: Dynamic Graph Learning of fMRI Functional Connectivity in Autistic Brains Using Mamba and Language Model Reasoning [0.0]
We propose NeuroMambaLLM, an end-to-end framework that integrates dynamic latent graph learning and selective state-space temporal modelling with Large Language Models (LLMs). The proposed method learns the functional connectivity dynamically from raw Blood-Oxygen-Level-Dependent (BOLD) time series, replacing fixed correlation graphs with adaptive latent connectivity while suppressing motion-related artifacts and capturing long-range temporal dependencies. This design enables the LLM to perform both diagnostic classification and language-based reasoning, allowing it to analyze dynamic fMRI patterns and generate clinically meaningful textual reports.
arXiv Detail & Related papers (2026-02-14T13:32:59Z) - Pruning Large Language Models by Identifying and Preserving Functional Networks [41.601762545495255]
Structured pruning is a technique for compressing large language models (LLMs) to reduce GPU memory consumption and accelerate inference. Most structured pruning methods overlook the interaction and collaboration among artificial neurons that are crucial for the functionalities of LLMs. Inspired by the inherent similarities between artificial neural networks and functional neural networks in the human brain, we propose to prune LLMs by identifying and preserving functional networks.
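A hedged sketch of the idea summarized above, with co-activation clustering standing in for the paper's functional-network identification: group neurons by their correlation structure, then keep the most central members of each group. The function name, spectral clustering choice, and keep ratio are placeholders, not the paper's method.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def prune_mask_by_functional_networks(acts, n_networks=8, keep_frac=0.7, seed=0):
    """Return a boolean keep-mask over neurons.

    acts: (n_samples, n_neurons) activations. Neurons are grouped into
    functional networks by clustering their co-activation structure;
    within each network the weakest participants (lowest mean
    within-cluster correlation) are pruned.
    """
    corr = np.abs(np.corrcoef(acts.T))      # non-negative affinity matrix
    labels = SpectralClustering(
        n_clusters=n_networks, affinity="precomputed", random_state=seed
    ).fit_predict(corr)

    keep = np.zeros(acts.shape[1], dtype=bool)
    for c in range(n_networks):
        idx = np.flatnonzero(labels == c)
        participation = corr[np.ix_(idx, idx)].mean(axis=1)  # centrality in the network
        n_keep = max(1, int(keep_frac * len(idx)))
        keep[idx[np.argsort(participation)[::-1][:n_keep]]] = True
    return keep

# Example: keep roughly 70% of 64 neurons based on 500 recorded samples.
mask = prune_mask_by_functional_networks(np.random.default_rng(1).normal(size=(500, 64)))
print(mask.sum(), "of", mask.size, "neurons kept")
```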
arXiv Detail & Related papers (2025-08-07T10:27:01Z) - NLP4Neuro: Sequence-to-sequence learning for neural population decoding [0.9086712846902969]
Delineating how animal behavior arises from neural activity is a foundational goal of neuroscience. Transformers, the backbone of modern large language models (LLMs), have become powerful tools for neural decoding from smaller neural populations.
arXiv Detail & Related papers (2025-07-03T03:14:55Z) - Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization [17.101290138120564]
Current methods rely on dictionary learning with sparse autoencoders (SAEs). Here, we tackle these limitations by directly decomposing activations with semi-nonnegative matrix factorization (SNMF). Experiments on Llama 3.1, Gemma 2, and GPT-2 show that SNMF-derived features outperform SAEs and a strong supervised baseline (difference-in-means) on causal steering.
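For concreteness, below is a minimal semi-NMF implementation using the multiplicative updates of Ding, Li & Jordan (2010), the standard formulation the summary's SNMF decomposition builds on; applying it to real MLP activations (and the paper's exact training details) is out of scope here.

```python
import numpy as np

def semi_nmf(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Semi-nonnegative matrix factorization: X ~= F @ G.T with G >= 0.

    X: (n_features, n_samples), e.g. MLP activations (neurons x tokens).
    F holds unconstrained feature directions; G holds nonnegative
    per-sample coefficients. Multiplicative updates follow
    Ding, Li & Jordan (2010).
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    G = np.abs(rng.normal(size=(m, rank)))
    pos = lambda A: (np.abs(A) + A) / 2     # elementwise positive part
    neg = lambda A: (np.abs(A) - A) / 2     # elementwise negative part
    for _ in range(n_iter):
        F = X @ G @ np.linalg.pinv(G.T @ G)  # closed-form unconstrained update
        XtF, FtF = X.T @ F, F.T @ F
        G *= np.sqrt((pos(XtF) + G @ neg(FtF)) /
                     (neg(XtF) + G @ pos(FtF) + eps))
    return F, G

# Toy use: factor 128 "neurons" x 1000 "tokens" into 10 parts.
X = np.random.default_rng(1).normal(size=(128, 1000))
F, G = semi_nmf(X, rank=10)
print("relative reconstruction error:",
      np.linalg.norm(X - F @ G.T) / np.linalg.norm(X))
```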
arXiv Detail & Related papers (2025-06-12T17:33:29Z) - Brain-Inspired Exploration of Functional Networks and Key Neurons in Large Language Models [53.91412558475662]
We use methods similar to those in the field of functional neuroimaging analysis to locate and identify functional networks in large language models (LLMs). Experimental results show that, similar to the human brain, LLMs contain functional networks that frequently recur during operation. Masking key functional networks significantly impairs the model's performance, while retaining just a subset is adequate to maintain effective operation.
arXiv Detail & Related papers (2025-02-13T04:42:39Z) - Explainability in Neural Networks for Natural Language Processing Tasks [5.812284760539713]
Local Interpretable Model-Agnostic Explanations (LIME) has emerged as an essential tool for providing insights into the behavior of complex systems. This study leverages LIME to interpret a multi-layer perceptron (MLP) neural network trained on a text classification task. Despite its effectiveness in offering localized explanations, LIME has limitations in capturing global patterns and feature interactions.
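A minimal usage sketch of the `lime` package on a stand-in scikit-learn text classifier; the tiny corpus, the MLP configuration, and the query sentence are illustrative, not the study's setup.

```python
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Tiny stand-in corpus; the study used a real text-classification dataset.
texts = ["great movie, loved it", "terrible plot, boring",
         "wonderful acting", "awful and dull",
         "enjoyable film", "worst ever"]
labels = [1, 0, 1, 0, 1, 0]

clf = make_pipeline(
    TfidfVectorizer(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
)
clf.fit(texts, labels)

explainer = LimeTextExplainer(class_names=["negative", "positive"])
exp = explainer.explain_instance("a boring but wonderful film",
                                 clf.predict_proba, num_features=4)
print(exp.as_list())  # (token, weight) pairs: the local explanation
```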
arXiv Detail & Related papers (2024-12-23T23:09:56Z) - Interpretable Language Modeling via Induction-head Ngram Models [74.26720927767398]
We propose Induction-head ngram models (Induction-Gram) to bolster modern ngram models with a hand-engineered "induction head".
This induction head uses a custom neural similarity metric to efficiently search the model's input context for potential next-word completions.
Experiments show that this simple method significantly improves next-word prediction over baseline interpretable models.
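A toy version of the mechanism just described, with exact suffix matching standing in for the paper's neural similarity metric, and a simple frequency fallback when no suffix recurs:

```python
from collections import Counter

def induction_predict(context: list[str], max_suffix: int = 3) -> str:
    """Predict the next token by searching the context for earlier
    occurrences of its longest matching suffix (an 'induction head'
    implemented with exact match instead of a learned similarity).
    Falls back to the most frequent context token when nothing recurs.
    """
    for k in range(min(max_suffix, len(context) - 1), 0, -1):
        suffix = context[-k:]
        # Scan earlier positions for the same k-gram and collect followers.
        followers = [context[i + k] for i in range(len(context) - k)
                     if context[i:i + k] == suffix]
        if followers:
            return Counter(followers).most_common(1)[0][0]
    return Counter(context).most_common(1)[0][0]

tokens = "the cat sat on the mat and the cat sat on".split()
print(induction_predict(tokens))  # -> 'the' ('cat sat on' appeared earlier)
```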
arXiv Detail & Related papers (2024-10-31T12:33:26Z) - Improving Neuron-level Interpretability with White-box Language Models [11.898535906016907]
We introduce a white-box transformer-like architecture named Coding RAte TransformEr (CRATE). Our comprehensive experiments showcase significant improvements (up to 103% relative improvement) in neuron-level interpretability. CRATE's increased interpretability comes from its enhanced ability to consistently and distinctively activate on relevant tokens.
arXiv Detail & Related papers (2024-10-21T19:12:33Z) - Analysis of Argument Structure Constructions in a Deep Recurrent Language Model [0.0]
We explore the representation and processing of Argument Structure Constructions (ASCs) in a recurrent neural language model.
Our results show that sentence representations form distinct clusters corresponding to the four ASCs across all hidden layers.
This indicates that even a relatively simple, brain-constrained recurrent neural network can effectively differentiate between various construction types.
arXiv Detail & Related papers (2024-08-06T09:27:41Z) - Nonlinear classification of neural manifolds with contextual information [6.292933471495322]
We introduce a theoretical framework that leverages latent directions in input space, which can be related to contextual information. We derive an exact formula for the context-dependent manifold capacity that depends on manifold geometry and context correlations. Our framework's increased expressivity captures representation reformatting in deep networks at early stages of the layer hierarchy, previously inaccessible to analysis.
arXiv Detail & Related papers (2024-05-10T23:37:31Z) - Characterizing Truthfulness in Large Language Model Generations with
Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs). We suggest investigating internal activations and quantifying an LLM's truthfulness using the local intrinsic dimension (LID) of model activations.
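The LID quantity mentioned above can be estimated with the standard Levina-Bickel maximum-likelihood estimator from nearest-neighbor distances; the sketch below assumes that estimator, which may differ in detail from the paper's exact choice.

```python
import numpy as np

def lid_mle(x: np.ndarray, reference: np.ndarray, k: int = 20) -> float:
    """Levina-Bickel MLE of local intrinsic dimension at point `x`.

    x: (d,) query activation vector; reference: (n, d) activations to
    measure neighborhood distances against. Returns
    [ (1/(k-1)) * sum_{j<k} log(T_k / T_j) ]^{-1},
    where T_j is the distance to the j-th nearest neighbor.
    """
    dists = np.sort(np.linalg.norm(reference - x, axis=1))
    dists = dists[dists > 0][:k]   # drop the query point itself if present
    return 1.0 / np.mean(np.log(dists[-1] / dists[:-1]))

# Toy check: points on a 2-D plane embedded in 10-D should give LID ~ 2.
rng = np.random.default_rng(0)
plane = rng.normal(size=(2000, 2)) @ rng.normal(size=(2, 10))
print(lid_mle(plane[0], plane[1:], k=30))
```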
arXiv Detail & Related papers (2024-02-28T04:56:21Z) - In-Context Language Learning: Architectures and Algorithms [73.93205821154605]
We study ICL through the lens of a new family of model problems we term in-context language learning (ICLL).
We evaluate a diverse set of neural sequence models on regular ICLL tasks.
arXiv Detail & Related papers (2024-01-23T18:59:21Z) - How Graph Neural Networks Learn: Lessons from Training Dynamics [80.41778059014393]
We study the training dynamics in function space of graph neural networks (GNNs).
We find that the gradient descent optimization of GNNs implicitly leverages the graph structure to update the learned function.
This finding offers new interpretable insights into when and why the learned GNN functions generalize.
arXiv Detail & Related papers (2023-10-08T10:19:56Z) - N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language Models [0.0]
N2G is a tool that takes a neuron and its dataset examples, and automatically distills the neuron's behaviour on those examples into an interpretable graph.
We use truncation and saliency methods to only present the important tokens, and augment the dataset examples with more diverse samples to better capture the extent of neuron behaviour.
These graphs can be visualised to aid manual interpretation by researchers, but can also output token activations on text to compare to the neuron's ground truth activations for automatic validation.
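A crude sketch of the distillation step, with a thresholded context window standing in for N2G's truncation and saliency machinery; the trie representation and the stand-in neuron below are illustrative only.

```python
from collections import defaultdict

def distill_neuron_graph(examples, neuron_fn, act_threshold=0.5, context=2):
    """N2G-style distillation sketch: for each example where the neuron
    fires, keep the activating token plus a short truncated context and
    record it as a path in a pattern graph (here, a dict keyed by paths)."""
    trie = defaultdict(list)
    for tokens in examples:
        acts = neuron_fn(tokens)           # one activation per token
        for i, a in enumerate(acts):
            if a >= act_threshold:
                path = tuple(tokens[max(0, i - context): i + 1])
                trie[path].append(a)
    # Summarize each pattern by its mean activation.
    return {p: sum(a) / len(a) for p, a in trie.items()}

# Stand-in 'neuron' that fires on the token 'not' (illustrative only):
neuron = lambda toks: [1.0 if t == "not" else 0.0 for t in toks]
examples = [["this", "is", "not", "good"], ["not", "bad", "at", "all"]]
print(distill_neuron_graph(examples, neuron))
```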
arXiv Detail & Related papers (2023-04-22T19:06:13Z) - The Causal Neural Connection: Expressiveness, Learnability, and Inference [125.57815987218756]
An object called structural causal model (SCM) represents a collection of mechanisms and sources of random variation of the system under investigation.
In this paper, we show that the causal hierarchy theorem (Thm. 1, Bareinboim et al., 2020) still holds for neural models.
We introduce a special type of SCM called a neural causal model (NCM), and formalize a new type of inductive bias to encode structural constraints necessary for performing causal inferences.
arXiv Detail & Related papers (2021-07-02T01:55:18Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - The Neural Coding Framework for Learning Generative Models [91.0357317238509]
We propose a novel neural generative model inspired by the theory of predictive processing in the brain.
In a similar way, artificial neurons in our generative model predict what neighboring neurons will do, and adjust their parameters based on how well the predictions matched reality.
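A minimal predictive-coding update consistent with that description, though not the paper's exact model: a latent code predicts the layer below, and both the code and the weights are adjusted locally from the prediction error.

```python
import numpy as np

def predictive_coding_step(x, z, W, lr_z=0.1, lr_w=0.01):
    """One local predictive-coding update for a single layer.

    The latent code z predicts the observed layer x through weights W;
    both z and W are nudged to reduce the prediction error, using only
    locally available signals (no global backprop).
    """
    x_hat = W @ z                    # top-down prediction of the layer below
    err = x - x_hat                  # prediction error: reality vs prediction
    z = z + lr_z * (W.T @ err)       # latent update driven by the error
    W = W + lr_w * np.outer(err, z)  # Hebbian-like local weight update
    return z, W, float(np.mean(err ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=8)
z, W = rng.normal(size=4), rng.normal(size=(8, 4))
for _ in range(50):
    z, W, mse = predictive_coding_step(x, z, W)
print("prediction MSE after settling:", mse)
```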
arXiv Detail & Related papers (2020-12-07T01:20:38Z) - Analyzing Individual Neurons in Pre-trained Language Models [41.07850306314594]
We find that small subsets of neurons can predict linguistic tasks, with lower-level tasks localized in fewer neurons than the higher-level task of predicting syntax. For example, we found neurons in XLNet to be more localized and disjoint when predicting properties compared to BERT and others, where they are more distributed and coupled.
arXiv Detail & Related papers (2020-10-06T13:17:38Z) - How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks [80.55378250013496]
We study how neural networks trained by gradient descent extrapolate what they learn outside the support of the training distribution.
Graph Neural Networks (GNNs) have shown some success in more complex tasks.
arXiv Detail & Related papers (2020-09-24T17:48:59Z) - Closed Loop Neural-Symbolic Learning via Integrating Neural Perception, Grammar Parsing, and Symbolic Reasoning [134.77207192945053]
Prior methods learn the neural-symbolic models using reinforcement learning approaches.
We introduce the grammar model as a symbolic prior to bridge neural perception and symbolic reasoning.
We propose a novel back-search algorithm which mimics the top-down human-like learning procedure to propagate the error.
arXiv Detail & Related papers (2020-06-11T17:42:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.