Priors in Time: Missing Inductive Biases for Language Model Interpretability
- URL: http://arxiv.org/abs/2511.01836v2
- Date: Sun, 09 Nov 2025 08:26:10 GMT
- Title: Priors in Time: Missing Inductive Biases for Language Model Interpretability
- Authors: Ekdeep Singh Lubana, Can Rager, Sai Sumedh R. Hindupur, Valerie Costa, Greta Tuckute, Oam Patel, Sonia Krishna Murthy, Thomas Fel, Daniel Wurgaft, Eric J. Bigelow, Johnny Lin, Demba Ba, Martin Wattenberg, Fernanda Viegas, Melanie Weber, Aaron Mueller
- Abstract summary: We show that Sparse Autoencoders impose priors that assume independence of concepts across time, implying stationarity. We introduce a new interpretability objective -- Temporal Feature Analysis -- which possesses a temporal inductive bias to decompose representations at a given time into two parts. Our results underscore the need for inductive biases that match the data in designing robust interpretability tools.
- Score: 58.07412640266836
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recovering meaningful concepts from language model activations is a central aim of interpretability. While existing feature extraction methods aim to identify concepts that are independent directions, it is unclear if this assumption can capture the rich temporal structure of language. Specifically, via a Bayesian lens, we demonstrate that Sparse Autoencoders (SAEs) impose priors that assume independence of concepts across time, implying stationarity. Meanwhile, language model representations exhibit rich temporal dynamics, including systematic growth in conceptual dimensionality, context-dependent correlations, and pronounced non-stationarity, in direct conflict with the priors of SAEs. Taking inspiration from computational neuroscience, we introduce a new interpretability objective -- Temporal Feature Analysis -- which possesses a temporal inductive bias to decompose representations at a given time into two parts: a predictable component, which can be inferred from the context, and a residual component, which captures novel information unexplained by the context. Temporal Feature Analyzers correctly parse garden path sentences, identify event boundaries, and more broadly delineate abstract, slow-moving information from novel, fast-moving information, while existing SAEs show significant pitfalls in all the above tasks. Overall, our results underscore the need for inductive biases that match the data in designing robust interpretability tools.
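The decomposition the abstract describes can be illustrated with a minimal sketch. Here a one-step least-squares linear predictor stands in for the temporal analyzer, and random-walk vectors stand in for language model activations; both are illustrative assumptions, not the paper's actual method or data. The point is only the split itself: a predictable component inferred from context, plus a residual carrying the novel information.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for language model activations: T timesteps, d dimensions.
# A random walk gives the kind of non-stationary drift the paper highlights.
T, d = 200, 16
X = np.cumsum(rng.normal(size=(T, d)), axis=0)

# One simple way to realize a "predictable from context" component:
# fit a least-squares linear map W from the previous activation to the
# current one.
past, present = X[:-1], X[1:]
W, *_ = np.linalg.lstsq(past, present, rcond=None)

predictable = past @ W            # component inferred from the context
residual = present - predictable  # novel information the context misses

# By construction the two parts reconstruct the representation exactly.
assert np.allclose(predictable + residual, present)

# When the dynamics are largely predictable, the residual carries far
# less variance than the raw signal.
print(residual.var(), present.var())
```

On this toy random walk the residual variance collapses to roughly that of the step noise, while the raw signal's variance keeps growing with time, which is the sense in which the decomposition separates slow-moving context from fast-moving novelty.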
Related papers
- Temporal Concept Dynamics in Diffusion Models via Prompt-Conditioned Interventions [70.87254264798341]
PCI is a training-free and model-agnostic framework for analyzing concept dynamics through diffusion time. It reveals diverse temporal behaviors across diffusion models, in which certain phases of the trajectory are more favorable to specific concepts even within the same concept type.
arXiv Detail & Related papers (2025-12-09T11:05:08Z) - When, How Long and How Much? Interpretable Neural Networks for Time Series Regression by Learning to Mask and Aggregate [16.533105886716804]
Time series extrinsic regression (TSER) refers to the task of predicting a continuous target variable from an input time series. The new approach learns a compact set of human-understandable concepts without requiring any annotations.
arXiv Detail & Related papers (2025-12-03T09:01:41Z) - Towards Transparent Stance Detection: A Zero-Shot Approach Using Implicit and Explicit Interpretability [12.794773087413256]
Zero-Shot Stance Detection (ZSSD) identifies the attitude of a post toward unseen targets. IRIS frames stance detection as an information retrieval ranking task. Explicit rationales based on communicative features help decode the emotional and cognitive dimensions of stance.
arXiv Detail & Related papers (2025-11-05T16:54:10Z) - Towards Foundation Model on Temporal Knowledge Graph Reasoning [17.165969719351125]
Temporal Knowledge Graphs (TKGs) store temporal facts in the quadruple format (s, p, o, t). The new model employs sinusoidal positional encodings to capture fine-grained temporal patterns. PostRA demonstrates strong zero-shot performance on unseen temporal knowledge graphs.
arXiv Detail & Related papers (2025-06-04T09:19:49Z) - VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models [27.280311932711847]
We present VITATECS, a diagnostic VIdeo-Text dAtaset for the evaluation of TEmporal Concept underStanding.
We first introduce a fine-grained taxonomy of temporal concepts in natural language in order to diagnose the capability of VidLMs to comprehend different temporal aspects.
We generate counterfactual video descriptions that differ from the original one only in the specified temporal aspect.
arXiv Detail & Related papers (2023-11-29T07:15:34Z) - An Overview Of Temporal Commonsense Reasoning and Acquisition [20.108317515225504]
Temporal commonsense reasoning refers to the ability to understand the typical temporal context of phrases, actions, and events.
Recent research on the performance of large language models suggests that they often take shortcuts in their reasoning and fall prey to simple linguistic traps.
arXiv Detail & Related papers (2023-07-28T01:30:15Z) - Unlocking Temporal Question Answering for Large Language Models with Tailor-Made Reasoning Logic [84.59255070520673]
Large language models (LLMs) face a challenge when engaging in temporal reasoning.
We propose TempLogic, a novel framework designed specifically for temporal question-answering tasks.
arXiv Detail & Related papers (2023-05-24T10:57:53Z) - A Threefold Review on Deep Semantic Segmentation: Efficiency-oriented, Temporal and Depth-aware design [77.34726150561087]
We conduct a survey on the most relevant and recent advances in Deep Semantic Segmentation in the context of vision for autonomous vehicles.
Our main objective is to provide a comprehensive discussion on the main methods, advantages, limitations, results and challenges faced from each perspective.
arXiv Detail & Related papers (2023-03-08T01:29:55Z) - Causal Reasoning Meets Visual Representation Learning: A Prospective Study [117.08431221482638]
A lack of interpretability, robustness, and out-of-distribution generalization is an emerging challenge for existing visual models.
Inspired by the strong inference ability of human-level agents, recent years have witnessed great effort in developing causal reasoning paradigms.
This paper aims to provide a comprehensive overview of this emerging field, attract attention, encourage discussion, and bring to the forefront the urgency of developing novel causal reasoning methods.
arXiv Detail & Related papers (2022-04-26T02:22:28Z) - Interpretable Time-series Representation Learning With Multi-Level Disentanglement [56.38489708031278]
Disentangle Time Series (DTS) is a novel disentanglement enhancement framework for sequential data.
DTS generates hierarchical semantic concepts as the interpretable and disentangled representation of time-series.
DTS achieves superior performance in downstream applications, with high interpretability of semantic concepts.
arXiv Detail & Related papers (2021-05-17T22:02:24Z) - Temporal Common Sense Acquisition with Minimal Supervision [77.8308414884754]
This work proposes a novel sequence modeling approach that exploits explicit and implicit mentions of temporal common sense.
Our method is shown to give quality predictions of various dimensions of temporal common sense.
It also produces representations of events for relevant tasks such as duration comparison, parent-child relations, event coreference and temporal QA.
arXiv Detail & Related papers (2020-05-08T22:20:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.