The Shape of Adversarial Influence: Characterizing LLM Latent Spaces with Persistent Homology
- URL: http://arxiv.org/abs/2505.20435v2
- Date: Thu, 09 Oct 2025 16:00:15 GMT
- Title: The Shape of Adversarial Influence: Characterizing LLM Latent Spaces with Persistent Homology
- Authors: Aideen Fay, Inés García-Redondo, Qiquan Wang, Haim Dubossarsky, Anthea Monod
- Abstract summary: This study focuses on how adversarial inputs systematically affect the internal representation spaces of Large Language Models. By quantifying the shape of activations and neuronal information flow, our architecture-agnostic framework reveals fundamental invariants of representational change.
- Score: 4.280045926995889
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing interpretability methods for Large Language Models (LLMs) often fall short by focusing on linear directions or isolated features, overlooking the high-dimensional, nonlinear, and relational geometry within model representations. This study focuses on how adversarial inputs systematically affect the internal representation spaces of LLMs, a topic which remains poorly understood. We propose persistent homology (PH), a tool from topological data analysis, as a principled framework to characterize the multi-scale dynamics within LLM activations. Using PH, we systematically analyze six state-of-the-art models under two distinct adversarial conditions, indirect prompt injection and backdoor fine-tuning, and identify a consistent topological signature of adversarial influence. Across architectures and model sizes, adversarial inputs induce "topological compression", where the latent space becomes structurally simpler, collapsing from varied, compact, small-scale features into fewer, dominant, and more dispersed large-scale ones. This topological signature is statistically robust across layers, highly discriminative, and provides interpretable insights into how adversarial effects emerge and propagate. By quantifying the shape of activations and neuronal information flow, our architecture-agnostic framework reveals fundamental invariants of representational change, offering a complementary perspective to existing interpretability methods.
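- Illustrative sketch (not from the paper): the abstract describes persistent homology as a way to track features of a point cloud across scales. The snippet below is a minimal sketch of the simplest case, 0-dimensional persistence (connected components) in a Vietoris-Rips filtration, computed with a union-find pass over sorted pairwise distances. The function name `h0_persistence` and the toy points are hypothetical; the authors' actual pipeline, the homology dimensions they use, and their summary statistics are not specified in this listing.

```python
import itertools
import math


def h0_persistence(points):
    """Return death scales of connected components in a Vietoris-Rips filtration.

    Every point is born as its own component at scale 0. A component dies at
    the edge length where it first merges into another component. The one
    component that never dies is omitted from the result.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length (the filtration order).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge at this scale, so one of them dies here.
            parent[root_i] = root_j
            deaths.append(length)
    return deaths


# Hypothetical toy input: three points on a line, standing in for activations.
toy_points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
toy_deaths = h0_persistence(toy_points)
print(toy_deaths)
```

For the toy input, the two nearby points merge at scale 1.0 and the distant point joins at scale 4.0. Many small death values indicate compact small-scale structure, while a few large ones indicate dispersed large-scale structure; this is only a loose analogue of the "topological compression" contrast described in the abstract.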
Related papers
- Physics-based phenomenological characterization of cross-modal bias in multimodal models [11.525886296936413]
Multimodal large language models (MLLMs) are breaking new ground in multimodal understanding, reasoning, and generation. We argue that inconspicuous distortions arising from complex multimodal interaction dynamics can lead to systematic bias.
arXiv Detail & Related papers (2026-02-24T07:21:08Z) - Emergent Structured Representations Support Flexible In-Context Inference in Large Language Models [77.98801218316505]
Large language models (LLMs) exhibit emergent behaviors suggestive of human-like reasoning. We investigate the internal processing of LLMs during in-context concept inference.
arXiv Detail & Related papers (2026-02-08T03:14:39Z) - Bridging Structure and Appearance: Topological Features for Robust Self-Supervised Segmentation [8.584363058858935]
Self-supervised semantic segmentation methods often fail when faced with appearance ambiguities. We argue that this is due to an over-reliance on unstable, appearance-based features such as shadows, glare, and local textures. We propose GASeg, a novel framework that bridges appearance and geometry by leveraging stable topological information.
arXiv Detail & Related papers (2025-12-30T05:34:28Z) - Schoenfeld's Anatomy of Mathematical Reasoning by Language Models [56.656180566692946]
We adopt Schoenfeld's Episode Theory as an inductive, intermediate-scale lens and introduce ThinkARM (Anatomy of Reasoning in Models). ThinkARM explicitly abstracts reasoning traces into functional reasoning steps such as Analysis, Explore, Implement, Verify, etc. We show that episode-level representations make reasoning steps explicit, enabling systematic analysis of how reasoning is structured, stabilized, and altered in modern language models.
arXiv Detail & Related papers (2025-12-23T02:44:25Z) - GeoGNN: Quantifying and Mitigating Semantic Drift in Text-Attributed Graphs [59.61242815508687]
Graph neural networks (GNNs) on text-attributed graphs (TAGs) encode node texts using pretrained language models (PLMs) and propagate these embeddings through linear neighborhood aggregation. This work introduces a local PCA-based metric that measures the degree of semantic drift and provides the first quantitative framework to analyze how different aggregation mechanisms affect manifold structure.
arXiv Detail & Related papers (2025-11-12T06:48:43Z) - A Comparative Analysis of Contextual Representation Flow in State-Space and Transformer Architectures [27.45316137669387]
State Space Models (SSMs) have emerged as efficient alternatives to Transformer-Based Models (TBMs) for long-sequence processing. We present the first unified, token- and layer-level analysis of representation propagation in SSMs and TBMs. We find a key divergence: TBMs rapidly homogenize token representations, with diversity reemerging only in later layers, while SSMs preserve token uniqueness early but converge to homogenization deeper.
arXiv Detail & Related papers (2025-10-08T04:46:11Z) - Structural DID with ML: Theory, Simulation, and a Roadmap for Applied Research [3.0031348283981987]
Causal inference in observational panel data has become a central concern in economics, policy analysis, and the broader social sciences. This paper proposes an innovative framework called S-DID that integrates structural identification with high-dimensional estimation.
arXiv Detail & Related papers (2025-07-21T03:57:42Z) - Latent Space Topology Evolution in Multilayer Perceptrons [0.26107298043931204]
This paper introduces a framework for interpreting the internal representations of Multilayer Perceptrons (MLPs). We construct a simplicial tower, a sequence of simplicial complexes connected by simplicial maps, that captures how data evolves across network layers.
arXiv Detail & Related papers (2025-06-02T11:51:53Z) - Spectral Insights into Data-Oblivious Critical Layers in Large Language Models [7.486925126518052]
We introduce a data-oblivious approach to identify intrinsic critical layers in pre-fine-tuned language models. We show that layers with significant shifts in representation space are also those most affected during fine-tuning.
arXiv Detail & Related papers (2025-05-31T04:21:39Z) - Empirical Investigation of Latent Representational Dynamics in Large Language Models: A Manifold Evolution Perspective [4.935224714809964]
This paper introduces the Dynamical Manifold Evolution Theory (DMET), a conceptual framework that models large language model (LLM) generation as a continuous trajectory evolving on a low-dimensional semantic manifold. The theory characterizes latent dynamics through three interpretable metrics: state continuity ($C$), attractor compactness ($Q$), and topological persistence ($P$), which jointly capture the smoothness, stability, and structure of representation evolution.
arXiv Detail & Related papers (2025-05-24T14:17:50Z) - PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing [49.243031514520794]
Large Language Models (LLMs) excel at capturing long-range signals due to their text-centric design. PhysLLM achieves state-of-the-art accuracy and robustness, demonstrating superior generalization across lighting variations and motion scenarios.
arXiv Detail & Related papers (2025-05-06T15:18:38Z) - Hallucination Detection in LLMs with Topological Divergence on Attention Graphs [64.74977204942199]
Hallucination, i.e., generating factually incorrect content, remains a critical challenge for large language models. We introduce TOHA, a TOpology-based HAllucination detector in the RAG setting.
arXiv Detail & Related papers (2025-04-14T10:06:27Z) - Sensitivity Meets Sparsity: The Impact of Extremely Sparse Parameter Patterns on Theory-of-Mind of Large Language Models [55.46269953415811]
We identify ToM-sensitive parameters and show that perturbing as little as 0.001% of these parameters significantly degrades ToM performance. Our results have implications for enhancing model alignment, mitigating biases, and improving AI systems designed for human interaction.
arXiv Detail & Related papers (2025-04-05T17:45:42Z) - Model Hemorrhage and the Robustness Limits of Large Language Models [119.46442117681147]
Large language models (LLMs) demonstrate strong performance across natural language processing tasks, yet undergo significant performance degradation when modified for deployment. We define this phenomenon as model hemorrhage: performance decline caused by parameter alterations and architectural changes.
arXiv Detail & Related papers (2025-03-31T10:16:03Z) - MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention [57.044719143401664]
Histopathology and transcriptomics are fundamental modalities in oncology, encapsulating the morphological and molecular aspects of the disease. We present MIRROR, a novel multi-modal representation learning method designed to foster both modality alignment and retention. Extensive evaluations on TCGA cohorts for cancer subtyping and survival analysis highlight MIRROR's superior performance.
arXiv Detail & Related papers (2025-03-01T07:02:30Z) - Unraveling the Localized Latents: Learning Stratified Manifold Structures in LLM Embedding Space with Sparse Mixture-of-Experts [3.9426000822656224]
We conjecture that in large language models, the embeddings live in a local manifold structure with different dimensions depending on the perplexities and domains of the input data. By incorporating an attention-based soft-gating network, we verify that our model learns specialized sub-manifolds for an ensemble of input data sources.
arXiv Detail & Related papers (2025-02-19T09:33:16Z) - Interpreting token compositionality in LLMs: A robustness analysis [10.777646083061395]
Constituent-Aware Pooling (CAP) is a methodology designed to analyse how large language models process linguistic structures. CAP intervenes in model activations through constituent-based pooling at various model levels. Our findings highlight fundamental limitations in current transformer architectures regarding compositional semantics processing and model interpretability.
arXiv Detail & Related papers (2024-10-16T18:10:50Z) - Persistent Topological Features in Large Language Models [0.6597195879147556]
We introduce persistence similarity, a new metric that quantifies the persistence and transformation of topological features.
Unlike traditional similarity measures, our approach captures the entire evolutionary trajectory of these features.
As a practical application, we leverage persistence similarity to identify and prune redundant layers.
arXiv Detail & Related papers (2024-10-14T19:46:23Z) - SpaRC and SpaRP: Spatial Reasoning Characterization and Path Generation for Understanding Spatial Reasoning Capability of Large Language Models [70.01883340129204]
Spatial reasoning is a crucial component of both biological and artificial intelligence.
We present a comprehensive study of the capability of current state-of-the-art large language models (LLMs) on spatial reasoning.
arXiv Detail & Related papers (2024-06-07T01:06:34Z) - A Theoretical Analysis of Self-Supervised Learning for Vision Transformers [66.08606211686339]
Masked autoencoders (MAE) and contrastive learning (CL) capture different types of representations. We study the training dynamics of one-layer softmax-based vision transformers (ViTs) on both MAE and CL objectives.
arXiv Detail & Related papers (2024-03-04T17:24:03Z) - Sparsity-Guided Holistic Explanation for LLMs with Interpretable
Inference-Time Intervention [53.896974148579346]
Large Language Models (LLMs) have achieved unprecedented breakthroughs in various natural language processing domains.
The enigmatic "black-box" nature of LLMs remains a significant challenge for interpretability, hampering transparent and accountable applications.
We propose a novel methodology anchored in sparsity-guided techniques, aiming to provide a holistic interpretation of LLMs.
arXiv Detail & Related papers (2023-12-22T19:55:58Z) - Finding Alignments Between Interpretable Causal Variables and
Distributed Neural Representations [62.65877150123775]
Causal abstraction is a promising theoretical framework for explainable artificial intelligence.
Existing causal abstraction methods require a brute-force search over alignments between the high-level model and the low-level one.
We present distributed alignment search (DAS), which overcomes these limitations.
arXiv Detail & Related papers (2023-03-05T00:57:49Z) - Towards Robust and Adaptive Motion Forecasting: A Causal Representation
Perspective [72.55093886515824]
We introduce a causal formalism of motion forecasting, which casts the problem as a dynamic process with three groups of latent variables.
We devise a modular architecture that factorizes the representations of invariant mechanisms and style confounders to approximate a causal graph.
Experiment results on synthetic and real datasets show that our three proposed components significantly improve the robustness and reusability of the learned motion representations.
arXiv Detail & Related papers (2021-11-29T18:59:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.