LLM Interpretability with Identifiable Temporal-Instantaneous Representation
- URL: http://arxiv.org/abs/2509.23323v1
- Date: Sat, 27 Sep 2025 14:14:41 GMT
- Title: LLM Interpretability with Identifiable Temporal-Instantaneous Representation
- Authors: Xiangchen Song, Jiaqi Sun, Zijian Li, Yujia Zheng, Kun Zhang
- Abstract summary: We introduce an identifiable temporal causal representation learning framework specifically designed for Large Language Models. Our approach provides theoretical guarantees and demonstrates efficacy on synthetic datasets scaled to match real-world complexity.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite Large Language Models' remarkable capabilities, understanding their internal representations remains challenging. Mechanistic interpretability tools such as sparse autoencoders (SAEs) were developed to extract interpretable features from LLMs, but they lack temporal dependency modeling, instantaneous relation representation, and, more importantly, theoretical guarantees, undermining both the theoretical foundations and the practical confidence necessary for subsequent analyses. While causal representation learning (CRL) offers theoretically grounded approaches for uncovering latent concepts, existing methods cannot scale to LLMs' rich conceptual space due to inefficient computation. To bridge the gap, we introduce an identifiable temporal causal representation learning framework specifically designed for LLMs' high-dimensional concept space, capturing both time-delayed and instantaneous causal relations. Our approach provides theoretical guarantees and demonstrates efficacy on synthetic datasets scaled to match real-world complexity. By extending SAE techniques with our temporal causal framework, we successfully discover meaningful concept relationships in LLM activations. Our findings show that modeling both temporal and instantaneous conceptual relationships advances the interpretability of LLMs.
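To make the proposed extension concrete, below is a minimal sketch (our illustrative assumption, not the authors' released implementation) of a sparse autoencoder augmented with a learned transition over its latent concept codes, so that time-delayed relations between concepts enter the training objective; all names and dimensions are hypothetical.

```python
# Minimal sketch of an SAE with a temporal transition over concept codes.
# Hypothetical architecture for illustration only; the paper's framework
# also identifies instantaneous relations, which this sketch omits.
import torch
import torch.nn as nn

class TemporalSAE(nn.Module):
    def __init__(self, d_model: int = 768, d_concepts: int = 4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_concepts)
        self.decoder = nn.Linear(d_concepts, d_model)
        # Learned transition over concept codes captures time-delayed relations.
        self.transition = nn.Linear(d_concepts, d_concepts)

    def forward(self, acts: torch.Tensor) -> torch.Tensor:
        # acts: (batch, seq_len, d_model) residual-stream activations.
        z = torch.relu(self.encoder(acts))               # sparse concept codes
        recon = self.decoder(z)                          # per-token reconstruction
        z_next = torch.relu(self.transition(z[:, :-1]))  # predict next-step codes
        recon_loss = (recon - acts).pow(2).mean()
        temporal_loss = (z_next - z[:, 1:]).pow(2).mean()
        sparsity = z.abs().mean()                        # L1 penalty keeps codes sparse
        return recon_loss + temporal_loss + 1e-3 * sparsity

loss = TemporalSAE()(torch.randn(2, 16, 768))  # smoke test on fake activations
```

Modeling instantaneous relations would additionally require structure among concepts within a single time step (e.g., a learned DAG over the codes), which the sketch leaves out.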
Related papers
- Accordion-Thinking: Self-Regulated Step Summaries for Efficient and Readable LLM Reasoning
We introduce an end-to-end framework where LLMs learn to self-regulate the granularity of their reasoning steps through dynamic summarization. We apply reinforcement learning to incentivize this capability further, uncovering a critical insight: the accuracy gap between the highly efficient Fold mode and the exhaustive Unfold mode progressively narrows. Our Accordion-Thinker demonstrates that, with learned self-compression, LLMs can tackle complex reasoning tasks with minimal token overhead.
arXiv Detail & Related papers (2026-02-03T08:34:20Z) - Concept Component Analysis: A Principled Approach for Concept Extraction in LLMs
Mechanistic interpretability seeks to mitigate the opacity of large language models by extracting human-interpretable structure from their internal representations. Sparse autoencoders (SAEs) have emerged as a popular approach for extracting interpretable and monosemantic concepts. We show that SAEs suffer from a fundamental theoretical ambiguity: a well-defined correspondence between LLM representations and human-interpretable concepts is lacking.
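The flavor of that ambiguity can be seen in a toy calculation (ours, not the paper's): for a linear autoencoder, rotating the concept basis leaves every reconstruction unchanged, so reconstruction quality alone cannot pin down the concepts.

```python
# Toy demonstration that a linear autoencoder's concepts are only
# identified up to an invertible transform: rotated concept bases
# reconstruct identically. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 16))              # fake activation vectors
W_enc = rng.normal(size=(16, 32))           # encoder weights
W_dec = rng.normal(size=(32, 16))           # decoder weights

R = np.linalg.qr(rng.normal(size=(32, 32)))[0]   # random orthogonal rotation
recon_a = x @ W_enc @ W_dec                      # original concept basis
recon_b = x @ (W_enc @ R) @ (R.T @ W_dec)        # rotated basis, same output
print(np.allclose(recon_a, recon_b))             # True: indistinguishable
```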
arXiv Detail & Related papers (2026-01-28T09:27:05Z) - NeSTR: A Neuro-Symbolic Abductive Framework for Temporal Reasoning in Large Language Models
Neuro-Symbolic Temporal Reasoning (NeSTR) is a novel framework that integrates structured symbolic representations with hybrid reflective reasoning. NeSTR preserves explicit temporal relations through symbolic encoding, enforces logical consistency via verification, and corrects flawed inferences using abductive reflection.
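As a loose illustration of the consistency-verification step (our own sketch; NeSTR's symbolic machinery is richer), "before" relations extracted from text can be checked for ordering cycles, which would flag an inconsistent inference for revision:

```python
# Check whether a set of "x before y" facts admits a valid ordering,
# i.e., contains no cycle. A simplified stand-in for symbolic
# consistency verification; not NeSTR's actual implementation.
from collections import defaultdict

def consistent(before_facts):
    graph = defaultdict(list)
    nodes = set()
    for x, y in before_facts:
        graph[x].append(y)
        nodes.update((x, y))
    seen, on_stack = set(), set()
    def has_cycle(u):
        seen.add(u)
        on_stack.add(u)
        for v in graph[u]:
            if v in on_stack or (v not in seen and has_cycle(v)):
                return True
        on_stack.discard(u)
        return False
    return not any(has_cycle(n) for n in nodes if n not in seen)

print(consistent([("breakfast", "lunch"), ("lunch", "dinner")]))  # True
print(consistent([("a", "b"), ("b", "a")]))                       # False: cycle
```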
arXiv Detail & Related papers (2025-12-08T06:58:23Z) - Temporal Referential Consistency: Do LLMs Favor Sequences Over Absolute Time References?
Large language models (LLMs) are increasingly used as an alternative to traditional knowledge sources. To serve that role, LLMs must be factually accurate and demonstrate consistency across temporal dimensions. Despite this critical requirement, efforts to ensure temporal consistency in LLMs remain scarce.
arXiv Detail & Related papers (2025-10-17T10:33:48Z) - Sound and Complete Neurosymbolic Reasoning with LLM-Grounded Interpretations
Large language models (LLMs) have demonstrated impressive capabilities in natural language understanding and generation. We present a method for directly integrating an LLM into the interpretation function of the formal semantics for a paraconsistent logic.
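For intuition, a paraconsistent interpretation can assign each atom independent evidence for and against, so contradictory evidence stays localized rather than entailing everything. The sketch below uses four-valued (Belnap-style) semantics with a hypothetical oracle standing in for LLM-grounded valuations; it is not the paper's formal construction.

```python
# Four-valued (Belnap-style) semantics: each atom carries independent
# evidence for (t) and against (f). Hypothetical oracle values stand in
# for LLM-grounded atom interpretations.
from dataclasses import dataclass

@dataclass(frozen=True)
class V:
    t: bool  # evidence for
    f: bool  # evidence against

def neg(a: V) -> V:
    return V(a.f, a.t)

def conj(a: V, b: V) -> V:
    return V(a.t and b.t, a.f or b.f)

oracle = {"penguins_fly": V(t=True, f=True)}  # conflicting evidence: "both"
p = oracle["penguins_fly"]
# A contradiction is tolerated locally instead of making everything true:
print(conj(p, neg(p)))  # V(t=True, f=True)
```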
arXiv Detail & Related papers (2025-07-13T19:05:43Z) - Mapping the Minds of LLMs: A Graph-Based Analysis of Reasoning LLM
Large Language Models (LLMs) display sophisticated reasoning abilities via extended Chain-of-Thought (CoT) generation. However, reasoning LLMs (RLMs) often demonstrate counterintuitive and unstable behaviors, such as performance degradation under few-shot prompting. We introduce a unified graph-based analytical framework for better modeling the reasoning processes of RLMs.
arXiv Detail & Related papers (2025-05-20T03:54:57Z) - The Curse of CoT: On the Limitations of Chain-of-Thought in In-Context Learning
Chain-of-Thought (CoT) prompting has been widely recognized for its ability to enhance reasoning capabilities in large language models (LLMs). We demonstrate that CoT and its reasoning variants consistently underperform direct answering across varying model scales and benchmark complexities. Our analysis uncovers a fundamental hybrid mechanism of explicit-implicit reasoning driving CoT's performance in pattern-based in-context learning (ICL).
arXiv Detail & Related papers (2025-04-07T13:51:06Z) - Provably Transformers Harness Multi-Concept Word Semantics for Efficient In-Context Learning
Transformer-based large language models (LLMs) have displayed remarkable creative prowess and emergent capabilities. This work provides a fine-grained mathematical analysis showing how transformers leverage the multi-concept semantics of words to enable powerful in-context learning (ICL) and excellent out-of-distribution ICL abilities.
arXiv Detail & Related papers (2024-11-04T15:54:32Z) - Cognitive LLMs: Towards Integrating Cognitive Architectures and Large Language Models for Manufacturing Decision-making
LLM-ACTR is a novel neuro-symbolic architecture that provides human-aligned and versatile decision-making.
Our framework extracts and embeds knowledge of ACT-R's internal decision-making process as latent neural representations.
Our experiments on novel Design for Manufacturing tasks show both improved task performance and improved grounded decision-making capability.
arXiv Detail & Related papers (2024-08-17T11:49:53Z) - Unveiling LLMs: The Evolution of Latent Representations in a Dynamic Knowledge Graph
This work unveils the factual information a Large Language Model represents internally for sentence-level claim verification.
We propose an end-to-end framework to decode factual knowledge embedded in token representations from a vector space to a set of ground predicates.
Our framework employs activation patching, a vector-level technique that alters a token representation during inference, to extract encoded knowledge.
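Activation patching is commonly implemented with forward hooks; the sketch below (a generic PyTorch illustration with placeholder model and layer names, not the paper's exact setup) overwrites one token's hidden state with a cached vector during inference so the effect on the output can be measured:

```python
# Generic activation-patching sketch: a forward hook overwrites the
# hidden state at one token position with a cached replacement vector.
# Model/layer paths below are placeholders, not the paper's setup.
import torch

def patch_token(layer: torch.nn.Module, position: int, replacement: torch.Tensor):
    """Register a hook that overwrites one token's activation; returns the handle."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden.clone()
        hidden[:, position] = replacement  # inject the cached vector
        return ((hidden,) + output[1:]) if isinstance(output, tuple) else hidden
    return layer.register_forward_hook(hook)

# Usage sketch (hypothetical names): cache an activation from a source
# prompt, patch it into a target prompt's forward pass, compare logits.
# handle = patch_token(model.transformer.h[6], position=3, replacement=cached_vec)
# patched_logits = model(target_ids).logits
# handle.remove()  # restore normal behavior
```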
arXiv Detail & Related papers (2024-04-04T17:45:59Z) - Sparsity-Guided Holistic Explanation for LLMs with Interpretable Inference-Time Intervention
Large Language Models (LLMs) have achieved unprecedented breakthroughs in various natural language processing domains.
The enigmatic "black-box" nature of LLMs remains a significant challenge for interpretability, hampering transparent and accountable applications.
We propose a novel methodology anchored in sparsity-guided techniques, aiming to provide a holistic interpretation of LLMs.
arXiv Detail & Related papers (2023-12-22T19:55:58Z)